Exploratory data analysis (ML)

field

Why?

Find something interesting in the data, check your understanding of the domain or problem space, and generate new questions based on the data.

How?

Use descriptive statistics and data visualisation techniques to summarize the data along different axes (features or columns). There is no pre-defined sequence of actions. Instead, you decide what to explore next based on the outcome of the previous step. Stop the exploration once you are confident that the data holds no more secrets for you.
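As a rough illustration of summarizing data per column, here is a minimal sketch that uses only the Python standard library. The function names (summarize_column, summarize) and the toy rows are hypothetical and chosen for this example. In a real project you would more likely use a dataframe library and plots.

```python
from collections import Counter
from statistics import mean, median


def summarize_column(values):
    """Return basic descriptive statistics for one column (a list of raw values)."""
    present = [v for v in values if v is not None]
    summary = {"count": len(present), "missing": len(values) - len(present)}
    is_numeric = bool(present) and all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in present
    )
    if is_numeric:
        summary.update(
            kind="numeric",
            min=min(present),
            max=max(present),
            mean=mean(present),
            median=median(present),
        )
    else:
        # Treat everything else as categorical and report the most frequent value.
        if present:
            top_value, top_count = Counter(present).most_common(1)[0]
        else:
            top_value, top_count = None, 0
        summary.update(
            kind="categorical",
            unique=len(set(present)),
            top=top_value,
            top_count=top_count,
        )
    return summary


def summarize(rows):
    """Summarize every column found in a list of dict-shaped rows."""
    columns = sorted({key for row in rows for key in row})
    return {col: summarize_column([row.get(col) for row in rows]) for col in columns}


# Hypothetical toy data; None marks a missing value.
rows = [
    {"age": 34, "city": "Utrecht"},
    {"age": 29, "city": "Eindhoven"},
    {"age": None, "city": "Utrecht"},
    {"age": 41, "city": None},
]

for column, stats in summarize(rows).items():
    print(column, stats)
```

Each summary can prompt the next step. A high count of missing values, for example, may be a question to take to the domain expert before exploring further.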

Ingredients

  • A raw dataset (can also be images or documents)
  • A domain expert to answer questions about the data
  • Creativity to come up with useful data visualisations and exploration paths
  • Tools to explore and visualize data (literate programming tools are useful for interleaving explanations with code)

In practice

Exploratory data analysis (EDA) is a necessary step at the beginning of every data analysis or machine learning project. In addition to exploring user requirements, you need to explore the data to familiarize yourself with the domain and the problem space.

Phase(s) of use

Exploratory data analysis (ML) can be used in the following project phase(s):

  • Machine learning