PyData Tel Aviv 2024

Processing Biggish Data with DuckDB and Python
11-04, 13:45–14:15 (Asia/Jerusalem), Blue Track

In this session, I will define the term "biggish" data. Once we understand what this new term means and why it matters, I will discuss and demonstrate how using DuckDB from Python opens up a whole new set of possibilities when working with data. There are many ways to use DuckDB in Python, and I want to share some of them with you.


Have you ever wondered: "What if there were a tool built to handle biggish data?" You do not have terabytes of data, but plain Python with Pandas didn't quite cut it, and now you are using tools like Apache Spark and Trino, with all their complexity. And what the hell is biggish data anyway?
It is time to declare: not all Big Data is created equal, and therefore not all Big Data needs the same tools to process and query it.
First, I will explain what biggish data is. Then I will show how, by incorporating DuckDB into your Python project, you can leave behind all those complex data analysis and processing tools when dealing with biggish data. Let's delve together into the many possibilities DuckDB offers from Python when processing data, and I hope you will see how this opens a whole new set of options for you.
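
To give a flavor of what the talk covers, here is a minimal sketch of the kind of usage I mean: DuckDB querying a Pandas DataFrame directly with SQL from Python. The DataFrame and column names are made up for illustration; the same SQL would work unchanged against a Parquet file larger than memory.

```python
import duckdb
import pandas as pd

# A small in-memory DataFrame standing in for "biggish" data
# (in practice this could be a multi-gigabyte Parquet file).
df = pd.DataFrame({
    "user_id": [1, 2, 1, 3],
    "amount": [10.0, 25.5, 7.25, 3.0],
})

# DuckDB can query the Pandas DataFrame by its variable name,
# running the aggregation in its own vectorized engine instead of in Pandas.
result = duckdb.sql("""
    SELECT user_id, SUM(amount) AS total
    FROM df
    GROUP BY user_id
    ORDER BY total DESC
""").df()

print(result)

# The same query can read files directly, e.g.:
# duckdb.sql("SELECT user_id, SUM(amount) FROM 'events.parquet' GROUP BY user_id")
```
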

Yoav Nordmann is a Backend & Data Architect and Tech Lead with over 20 years of experience. At Tikal he holds the position of Group Leader, mentoring fellow workers. He is passionate about new and emerging technologies and knowledge sharing, and is a fierce advocate for open source. Having been in the industry for so long gives him a sense of perspective on different languages, architectures, and hypes.