PyData Tel Aviv 2024

The TL;DR of EDA
11-04, 10:45–11:15 (Asia/Jerusalem), Green Track

If you're drowning in data but short on time, this talk is for you. We'll explore EDA methods for the 'lazy engineer,' showing how to reach insights faster by automating your data exploration: generating automated reports with libraries like ydata-profiling, using ML clustering algorithms to go from raw data to distinct clusters, and finally enriching those insights with LLMs. Join me for a practical, end-to-end guide to streamlining EDA, agnostic to data type, and boosting productivity through smart automation.


As the amount of data grows at an unprecedented rate, engineers face the critical challenge of processing and analyzing it efficiently. Yet manual, time-consuming methods of getting familiar with data are still common, despite the urgent need for shortcuts. In this talk, we review several automated EDA methods that use Python libraries to explore data, group it into distinct clusters, and reduce time-to-insight.
Our solution can be applied in various use cases, including data pre-processing for labeling, segmentation problems, and more.
This talk is an essential guide for lazy (and/or busy) engineers who want to streamline data exploration and reduce their workload.
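
As a rough illustration of the first two stages described above (an automated summary report, then clustering), here is a minimal sketch in plain Python. It is not the speaker's implementation: the helper names `profile` and `kmeans` and the toy data are hypothetical, and the standard library stands in for what a profiling library and a clustering library would do in practice. The LLM step is left out.

```python
import statistics
from math import dist


def profile(rows, columns):
    """Build a tiny per-column summary report (hypothetical helper)."""
    report = {}
    for i, name in enumerate(columns):
        values = [row[i] for row in rows if row[i] is not None]
        report[name] = {
            "count": len(values),
            "missing": len(rows) - len(values),
            "mean": statistics.fmean(values),
            "min": min(values),
            "max": max(values),
        }
    return report


def kmeans(points, k, iterations=20):
    """Plain Lloyd's k-means (hypothetical helper).

    Assumption: the first k points serve as initial centroids, which keeps
    the sketch deterministic but is not a recommended initialization.
    """
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point takes the label of its nearest centroid.
        labels = [
            min(range(k), key=lambda c: dist(p, centroids[c])) for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(statistics.fmean(dim) for dim in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster in place
        if new_centroids == centroids:
            break  # assignments are stable, so the algorithm has converged
        centroids = new_centroids
    return labels, centroids


# Toy data (made up for illustration): two visibly separate groups.
columns = ["x", "y"]
rows = [
    (1.0, 1.2), (0.8, 1.0), (1.2, 0.9),
    (8.0, 8.2), (8.3, 7.9), (7.9, 8.1),
]

report = profile(rows, columns)
labels, centroids = kmeans(rows, k=2)

for name, stats in report.items():
    print(name, stats)
print("cluster labels:", labels)
```

On real data, the summary step would typically be replaced by a full profiling report and the clustering step by a library implementation with a sounder initialization. The structure stays the same: summarize first, then group.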

Head of Data and Data scientists at Parazero, IoT and signal processing expert.
Community lead and mentor at WiDS.
MSc in mechanical engineering, researching fluid dynamics models.