FireDucks: Pandas but 100x faster
Introduction
I work as a hedge fund professional, so I deal with financial data all the time. So far, the Pandas library has been an indispensable tool in my workflow and my most-used Python library.
Then along came Polars (written in Rust, btw!), which shook up the Python ecosystem with its speed and efficiency. You can check some of Polars' benchmarks here.
I have roughly 30 thousand lines of Pandas code, so you can understand why I've been hesitant to rewrite it in Polars, despite my enthusiasm for speed and optimization. The sheer scale of the task has made me put it off again and again, as I weigh the benefits of a faster library against the significant effort of refactoring my existing code.
There has always been this thought in the back of my mind:
Pandas is written in Python, Cython, and C, which means the performance-critical engine is King C... there's got to be a way to optimize Pandas and leverage the C engine!
FireDucks
Here comes FireDucks, the answer to my prayers: a speed-demon Pandas library! It was launched in October 2023 by a team of programmers from NEC Corporation, who have 30+ years of experience developing supercomputers. Read the announcement here.
Take a quick look at the benchmark page here! I'll let the numbers speak for themselves.
This is the craziest benchmark: FireDucks even beats DuckDB! Also check where Pandas and Polars rank.
Average 50x faster than Pandas.
It's even faster than Polars!
Benchmark
Alrighty, those benchmark numbers from FireDucks look amazing, but a good rule of thumb is to never take numbers at face value... don't trust, verify! Hence I'm running my own set of benchmarks on my machine.
#1 Create 150 MB dummy data
- Result:
Note: fd = fireducks, pd = pandas.
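For readers who want to reproduce this step, here is a minimal sketch of how a dummy CSV could be generated. The helper name `make_dummy_frame`, the `col_i` column names, the all-float layout, and the temp-file location are assumptions for illustration, not the setup used for the numbers above.

```python
import tempfile
from pathlib import Path

import numpy as np
import pandas as pd


def make_dummy_frame(n_rows, n_cols=10, seed=0):
    """Build a frame of random floats (hypothetical layout: col_0 .. col_{n_cols-1})."""
    rng = np.random.default_rng(seed)
    values = rng.random((n_rows, n_cols))
    return pd.DataFrame(values, columns=[f"col_{i}" for i in range(n_cols)])


# Small demo run; the row count here is deliberately tiny.
df = make_dummy_frame(n_rows=1_000)
csv_path = Path(tempfile.gettempdir()) / "dummy_data.csv"
df.to_csv(csv_path, index=False)

# Report the size on disk so n_rows can be tuned toward a target size.
size_mb = csv_path.stat().st_size / 1e6
print(f"{len(df)} rows -> {size_mb:.2f} MB")
```

Raising `n_rows` until the printed size reaches about 150 MB would give a file comparable in size to the one in this benchmark.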
#2 Read the 150 MB dummy data
- Result:
Note: fd = fireducks, pd = pandas.
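A read benchmark along these lines can be sketched as follows. The helper `time_read_csv` and the tiny stand-in file are assumptions for illustration. As far as I understand the FireDucks documentation, it evaluates lazily and provides `_evaluate()` to force execution when timing, so the sketch calls it only when the returned frame has that method (plain pandas frames do not).

```python
import tempfile
import time
from pathlib import Path

import pandas as pd


def time_read_csv(reader, path):
    """Return (seconds, frame) for a single call of reader(path)."""
    start = time.perf_counter()
    frame = reader(path)
    # Assumption: a lazy frame exposes _evaluate(); skipped for plain pandas.
    if hasattr(frame, "_evaluate"):
        frame._evaluate()
    elapsed = time.perf_counter() - start
    return elapsed, frame


# Tiny stand-in file so the sketch runs anywhere; point csv_path at the real file instead.
csv_path = Path(tempfile.gettempdir()) / "dummy_read_demo.csv"
pd.DataFrame({"a": range(1_000), "b": range(1_000)}).to_csv(csv_path, index=False)

pd_seconds, pd_frame = time_read_csv(pd.read_csv, csv_path)
print(f"pd.read_csv: {pd_seconds:.4f}s for {len(pd_frame)} rows")
```

With FireDucks installed, passing `fd.read_csv` (after `import fireducks.pandas as fd`) to the same helper would time the other side of the comparison.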
#3 Calculate Mean
#4 Calculate Sum
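The mean and sum comparisons can be sketched the same way. This is a hedged example, not the exact benchmark code: `best_time`, the array shape, and the repeat count are assumptions, and the FireDucks half only runs if the package is installed. The speedup figure is simply the pandas time divided by the FireDucks time.

```python
import time

import numpy as np
import pandas as pd

try:
    import fireducks.pandas as fd  # optional; only needed for the comparison
except ImportError:
    fd = None


def best_time(func, repeats=5):
    """Return the fastest wall-clock time in seconds over several runs of func()."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        result = func()
        # Assumption: lazy FireDucks results expose _evaluate(); pandas results do not.
        if hasattr(result, "_evaluate"):
            result._evaluate()
        timings.append(time.perf_counter() - start)
    return min(timings)


data = np.random.default_rng(0).random((100_000, 10))

pdf = pd.DataFrame(data)
pd_mean = best_time(lambda: pdf.mean())
pd_sum = best_time(lambda: pdf.sum())
print(f"pandas  mean: {pd_mean:.6f}s  sum: {pd_sum:.6f}s")

if fd is not None:
    fdf = fd.DataFrame(data)
    fd_mean = best_time(lambda: fdf.mean())
    fd_sum = best_time(lambda: fdf.sum())
    print(f"speedup mean: {pd_mean / fd_mean:.1f}x  sum: {pd_sum / fd_sum:.1f}x")
```

Taking the minimum over a few repeats reduces noise from other processes. Without forcing evaluation, a lazy library could appear faster than it really is, so that guard matters for a fair comparison.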
Yes, the last two benchmarks come out 130x and 200x faster than Pandas... are you not amused by these performance gains?! So yeah, the title of this post is not clickbait, it's real. There's another key point I need to highlight, the most important one:
Using FireDucks requires ZERO Pandas code change
You can just plug FireDucks into your existing Pandas code and expect massive speed improvements... impressive indeed!
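In practice, "plugging in" comes down to which module gets bound to the name `pd`. A minimal sketch, assuming FireDucks' documented `fireducks.pandas` module, with a fallback to plain pandas so the script still runs where FireDucks is not installed; the tickers and PnL values are made-up illustration data.

```python
try:
    import fireducks.pandas as pd  # drop-in replacement when available
except ImportError:
    import pandas as pd  # fallback: same code, stock pandas

# Everything below is ordinary pandas code and does not change.
df = pd.DataFrame({"ticker": ["AAA", "BBB", "AAA"], "pnl": [1.5, -0.5, 2.0]})
total = df.groupby("ticker")["pnl"].sum()
print(total)
```

To leave even the import line untouched, the FireDucks docs describe an import hook: run the script as `python3 -m fireducks.pandas your_script.py`, or use `%load_ext fireducks.pandas` in a notebook before `import pandas as pd`.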
Conclusion
Frankly, I'm lost for words! What else would Pandas users want?
- Massive speedup: check.
- 100% compatibility with existing Pandas code: check.
- Zero code change: check.
- Effortless / super easy to use: check.
A note for the people bashing Python for being slow: yes, pure Python is super slow, I agree. But it has been proven time and again that it can be optimized, and once it's been properly optimized (FireDucks, Codon, Cython, etc.) it can be speedy as well, since the heavy lifting moves into compiled native code!
Be smart, folks! No one sane would use "pure Python" for a serious workload... leverage the vast ecosystem!