Exploratory and Big Data Analysis

Exploratory Data Analysis (EDA) based on multivariate techniques has been extensively employed in many research fields, including social sciences, education, medicine, chemistry and related fields. EDA based on multivariate models, also known as Multivariate EDA (MEDA), relies on a set of visualizations that simplify the understanding of complex data. Interacting with these visualizations guides the analyst in uncovering patterns in the data and gaining knowledge from them. MEDA is closely related to data mining with multivariate models, with the added advantage that multivariate models are interpretable and can be used to interact with the data in order to investigate the underlying phenomena of interest.

The MEDA tools are extremely powerful when applied to data sets of conventional size, as illustrated in hundreds of applications in a wide range of areas. However, they are hard to extend to the Big Data paradigm. The MEDA Toolbox in Matlab, a software initiative I lead, has been one of the first attempts to perform such an extension. The MEDA Toolbox is open-source software available in a GitHub repository. It combines clustering and kernel computations to extend MEDA visualization tools to unlimited numbers of observations or variables. This toolbox has been employed successfully in several research and development projects, showing its potential to handle very complex data of disparate nature: medical, chemical, biological, computer traffic and security data, etc.
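To give a flavour of how kernel computations can decouple a multivariate model from the number of observations, here is a minimal sketch in Python. It is not the MEDA Toolbox implementation and uses none of its functions. It assumes that "kernel" refers to a cross-product matrix such as X'X, which can be accumulated batch by batch and then used to fit a PCA model. The function names `batched_cross_product` and `pca_from_cross_product` are hypothetical and introduced only for illustration.

```python
import numpy as np


def batched_cross_product(batches):
    """Accumulate X'X, column sums and row count over batches of rows.

    Only an (M x M) matrix is kept in memory, where M is the number of
    variables, so the number of observations can grow without bound.
    """
    xtx, col_sum, n = None, None, 0
    for batch in batches:
        batch = np.asarray(batch, dtype=float)
        if xtx is None:
            m = batch.shape[1]
            xtx = np.zeros((m, m))
            col_sum = np.zeros(m)
        xtx += batch.T @ batch
        col_sum += batch.sum(axis=0)
        n += batch.shape[0]
    return xtx, col_sum, n


def pca_from_cross_product(xtx, col_sum, n, n_components=2):
    """Fit PCA loadings from the accumulated cross-product matrix.

    Mean-centering is applied afterwards through the identity
    Xc'Xc = X'X - n * mean * mean'.
    """
    mean = col_sum / n
    xtx_centered = xtx - n * np.outer(mean, mean)
    eigvals, eigvecs = np.linalg.eigh(xtx_centered)
    # eigh returns ascending eigenvalues; keep the largest ones first.
    order = np.argsort(eigvals)[::-1][:n_components]
    return eigvecs[:, order], eigvals[order], mean


# Illustrative use on synthetic data split into batches.
rng = np.random.default_rng(0)
data = rng.normal(size=(1000, 5))
xtx, col_sum, n = batched_cross_product(np.array_split(data, 10))
loadings, eigvals, mean = pca_from_cross_product(xtx, col_sum, n)
# Scores for any new batch can be computed without revisiting old data.
scores = (data[:20] - mean) @ loadings
```

In this sketch the cost of the eigendecomposition depends only on the number of variables. Handling an unlimited number of variables, or summarizing observations for plotting through clustering, would need additional steps that are not shown here.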

Together with Prof. Rasmus Bro, I organize the Ph.D. course "Multivariate Exploratory Data Analysis: Understanding by looking at data" at the University of Granada. The course gathers students from very disparate areas (ICTs, astronomy, geology, biology, health, etc.) who want to learn the basics of the MEDA approach.

Related references: