PyData Tel Aviv 2024

Identifying Repetitive Songs using LZ Compression
11-04, 13:45–14:15 (Asia/Jerusalem), Green Track

Pop music is often accused of having poorly written lyrics. To put this claim to the test, we set out to automatically find the most repetitive songs published in Hebrew.
By combining scraping, data analysis, visualization and web development, all in Python, we identify the most repetitive song of all time and analyze pop-chart trends over the years, by genre and by artist.


Getting the song lyrics required some scraping, and we have plenty of tips to share on that.
We use our own implementation of LZ compression to measure how repetitive a song is, based on similar work that was done for English pop charts. We explain why this is a good idea, and what alternatives we considered.
We analyze some interesting trends in the radio pop charts using pandas and visualize them using plotly. The visualizations are also available through an interactive web interface.
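
To illustrate how a compression-based repetitiveness score can work, here is a minimal sketch. It is not the implementation from the talk: it assumes a greedy LZ77-style parse with an unbounded search window, and the names `lz77_tokens`, `lz77_decode`, `repetitiveness`, and the cost constant `REF_COST` are hypothetical choices made for this example. The idea is that a song whose lyrics can be rewritten mostly as back-references to earlier text compresses well, and so scores as more repetitive.

```python
# Illustrative sketch only; the cost model and names are assumptions.
REF_COST = 3   # assumed cost of one back-reference, in "characters"
MIN_MATCH = 4  # only emit a back-reference when it actually saves space


def lz77_tokens(text):
    """Greedy LZ77-style parse into literals and (offset, length) pairs."""
    tokens = []
    i, n = 0, len(text)
    while i < n:
        best_len, best_off = 0, 0
        # Try every earlier start position and keep the longest match.
        # Matches may overlap the current position, as in classic LZ77.
        for j in range(i):
            k = 0
            while i + k < n and text[j + k] == text[i + k]:
                k += 1
            if k > best_len:
                best_len, best_off = k, i - j
        if best_len >= MIN_MATCH:
            tokens.append((best_off, best_len))
            i += best_len
        else:
            tokens.append(text[i])
            i += 1
    return tokens


def lz77_decode(tokens):
    """Rebuild the text from tokens (used to sanity-check the parse)."""
    out = []
    for token in tokens:
        if isinstance(token, tuple):
            offset, length = token
            for _ in range(length):
                out.append(out[-offset])
        else:
            out.append(token)
    return "".join(out)


def repetitiveness(text):
    """Fraction of the text saved by compression: 0.0 means no repetition."""
    if not text:
        return 0.0
    tokens = lz77_tokens(text)
    cost = sum(REF_COST if isinstance(t, tuple) else 1 for t in tokens)
    return 1 - cost / len(text)


chorus = "la la la la la la la la"
verse = "the quick brown fox jumps over the lazy dog"
print(repetitiveness(chorus), repetitiveness(verse))
```

The brute-force match search is quadratic or worse in the length of the text, which is acceptable for a single song but would need a smarter index for a large corpus. Scores also depend on the assumed cost of a back-reference, so they are only comparable between songs scored with the same settings.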

I hold an M.Sc. in Computer Science from Tel Aviv University, and I work at Google on products that aim to make phone calls more tolerable.
I've been excited about data and algorithms since I was young, and what I like even more is trying to get others to be as excited about them as I am.
I always try to pick projects that will interest the general public, or at least make them laugh.