Levelized cost of resources in benchmarks
Situation update
Howdy everyone! It's been a while since my last blog post because I've been busy dealing with the volatile equity markets caused by the reciprocal tariff war the US launched against the rest of the world. Equity markets were cratering like crazy, and the rebound from the bottom was one of the easiest 50-75% gains in a single month you'll ever encounter.
Yeah, I was so focused on wearing my Finance hat that I totally forgot to put my Writer hat back on. Since then I've scaled out of (sold) most of the positions I bought near the bottom of the market, and now I'm waiting for the next down-cycle, which might take a while.
This waiting time means I'll have spare time to write, yayy!!
Introduction
Alright, in this post I want to write about the importance of normalizing resource cost when doing benchmarks. To make it easier to understand, imagine you're test-driving two cars: the first one has a 2,500cc engine and takes 10 seconds to go from 0 to 150 kph.
The second car takes less time, 8 seconds from 0 to 150 kph, but it needs a 7,500cc engine to achieve that. So to cut the time by 20% (a 1.25x speedup), it needs a 200% bigger engine (three times the displacement).
If we don't normalize the resource cost, it looks as if the second car wins hands down. But after normalizing, it becomes clear that tripling the resources just to shave 20% off the time is far too costly.
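To put numbers on the analogy, here is a small Python sketch that treats engine displacement as the "resource" (an assumption of the analogy, since displacement is not the same thing as power) and divides the speedup by the resource ratio. The names CARS and normalized_speedup are just illustrative.

```python
# Figures from the car analogy; engine displacement stands in for "resources".
CARS = {
    "car 1": {"engine_cc": 2500, "seconds_0_to_150": 10.0},
    "car 2": {"engine_cc": 7500, "seconds_0_to_150": 8.0},
}


def normalized_speedup(baseline, challenger):
    """Return (speedup, resource_ratio, speedup divided by resource_ratio)."""
    speedup = baseline["seconds_0_to_150"] / challenger["seconds_0_to_150"]
    resource_ratio = challenger["engine_cc"] / baseline["engine_cc"]
    return speedup, resource_ratio, speedup / resource_ratio


speedup, ratio, score = normalized_speedup(CARS["car 1"], CARS["car 2"])
print(f"speedup: {speedup:.2f}x")        # 1.25x
print(f"engine size: {ratio:.1f}x")      # 3.0x
print(f"normalized score: {score:.2f}")  # 0.42
```

A normalized score below 1 means the second car gains less speed than the extra resources it consumes.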
The same concept applies when analyzing the levelized cost of electricity (LCOE) of various energy generation methods: solar photovoltaic, wind farms, hydro, fossil fuels (coal, gas), nuclear, etc. You would be surprised by the numbers once LCOE is taken into account: nuclear power plants are the cheapest!
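For readers unfamiliar with the term: in its common textbook form, LCOE divides a plant's discounted lifetime costs by its discounted lifetime electricity output. Here I_t is the investment expenditure, M_t the operations and maintenance cost, F_t the fuel cost and E_t the electricity generated in year t, while r is the discount rate and n the plant's lifetime in years. The benchmark idea in this post has the same shape: cost per unit of useful output.

```latex
\mathrm{LCOE} = \frac{\sum_{t=1}^{n} \frac{I_t + M_t + F_t}{(1+r)^t}}{\sum_{t=1}^{n} \frac{E_t}{(1+r)^t}}
```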
grep and ripgrep
The tools I'm using to demonstrate the importance of levelizing / normalizing resource cost are grep and ripgrep: both are command-line tools that search files for lines matching a pattern.
I was inspired by a Primeagen livestream where he briefly mentioned something along the lines of: "ripgrep is faster than grep, but ripgrep also uses a lot more CPU". I was intrigued by this, so I decided to investigate further.
Below are some screenshots of the benchmarks I ran; pay attention to the CPU usage:
- Single-threaded (ST) grep vs multithreaded (MT) ripgrep
- Both single-threaded: ripgrep requires the -j1 flag
- Both single-threaded: for ripgrep, adding the -uuu flag (which turns off its default skipping of ignored, hidden and binary files) yields the same 963 matches as grep
Okay, you get the idea. Once resource usage is levelized / normalized (ripgrep set to 1 thread), my tests show a general picture: yes, ripgrep is fast at default settings, but it also uses a lot more CPU. When we normalize resource usage, it loses its speed advantage.
Ripgrep GitHub benchmark
Now let's visit the ripgrep GitHub page; there are some interesting benchmark numbers we can analyze:
Check this GitHub page to see the raw commands used in ripgrep's benchmark.
Did you notice something? None of the commands use the -j1 flag, which makes ripgrep single-threaded. Hence the benchmark is not normalized at all: grep uses 1 thread while ripgrep uses multiple threads. This is unfair.
In the last screenshot, I ran ripgrep both single-threaded and multithreaded.
It's a bit messy, so let me write out the relevant output:
- ST grep : 99% cpu 0.170 total
- ST grep : 99% cpu 0.172 total
- ST ripgrep: 93% cpu 0.217 total
- MT ripgrep: 789% cpu 0.051 total
In ST (single-threaded) mode, ripgrep is slower (0.217 vs 0.172). In MT (multithreaded) mode, ripgrep is blazingly fast: about 3.4x faster than single-threaded grep (0.051 vs 0.172). However, it also keeps almost 8 CPU cores busy (789% CPU) to achieve that performance, vs only 1 for grep.
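One way to levelize these numbers is to multiply the CPU percentage by the wall time, which approximates the total CPU time each run actually consumed. Below is a minimal Python sketch using only the four figures listed above; taking the second grep run as the baseline is my own arbitrary choice, and RUNS / summarize are illustrative names.

```python
# Figures listed above: (label, cpu_percent, wall_seconds) from the shell's `time` output.
RUNS = [
    ("ST grep #1", 99, 0.170),
    ("ST grep #2", 99, 0.172),
    ("ST ripgrep", 93, 0.217),
    ("MT ripgrep", 789, 0.051),
]


def summarize(runs, baseline_index=1):
    """Return (label, cpu_seconds, speedup, cpu_cost) per run, relative to the baseline run."""
    _, base_pct, base_wall = runs[baseline_index]
    base_cpu_seconds = base_pct / 100 * base_wall
    rows = []
    for label, cpu_percent, wall in runs:
        # CPU% times wall time approximates the total CPU time the run consumed.
        cpu_seconds = cpu_percent / 100 * wall
        rows.append((label, cpu_seconds, base_wall / wall, cpu_seconds / base_cpu_seconds))
    return rows


for label, cpu_seconds, speedup, cpu_cost in summarize(RUNS):
    print(f"{label:12} cpu {cpu_seconds:.3f}s  speedup {speedup:.2f}x  cpu cost {cpu_cost:.2f}x")
```

With these figures, multithreaded ripgrep finishes about 3.37x faster in wall time but burns about 2.36x the CPU time of grep, while single-threaded ripgrep is slower (about 0.79x) and still uses about 1.19x the CPU time.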
Is that performance worth it? I'll leave the answer to you. Looks like what Primeagen said was true after all!
Conclusion
This post is not an attack on ripgrep or BurntSushi, not at all. The purpose of this article is to show how important it is to levelize / normalize the resource usage of the tools or programs we use or benchmark.
I encourage those writing benchmarks to start providing resource usage data, so we readers can tell whether the benchmark is normalized or not. Are you okay with roughly 3x faster performance at the cost of an app that is 8x more resource hungry?
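For anyone who wants to publish resource usage alongside timings, here is a minimal Python sketch (Unix-only, since it relies on the standard resource module) that runs a command once and reports wall time, CPU time and CPU percentage, similar to what the shell's time prints. SEARCH_DIR and the pattern are hypothetical placeholders; a real benchmark would also repeat each run several times.

```python
import resource
import subprocess
import sys
import time


def measure(cmd):
    """Run cmd once and return (wall_seconds, cpu_seconds, cpu_percent) for that child."""
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.perf_counter()
    subprocess.run(cmd, stdout=subprocess.DEVNULL, check=False)
    wall = time.perf_counter() - start
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    # User + system CPU time spent by the child, i.e. the "levelized" cost.
    cpu = (after.ru_utime - before.ru_utime) + (after.ru_stime - before.ru_stime)
    return wall, cpu, (100 * cpu / wall if wall > 0 else 0.0)


if __name__ == "__main__":
    SEARCH_DIR = "path/to/your/files"  # hypothetical placeholder, point it at real data
    PATTERN = "PATTERN"                # hypothetical placeholder
    commands = [
        ["grep", "-r", PATTERN, SEARCH_DIR],
        ["rg", "-uuu", PATTERN, SEARCH_DIR],
        ["rg", "-j1", "-uuu", PATTERN, SEARCH_DIR],
    ]
    for cmd in commands:
        try:
            wall, cpu, pct = measure(cmd)
        except FileNotFoundError:
            print(f"skipping {cmd[0]}: not installed", file=sys.stderr)
            continue
        print(f"{' '.join(cmd):40} wall {wall:.3f}s  cpu {cpu:.3f}s  ({pct:.0f}% cpu)")
```

Reporting CPU seconds next to wall time lets readers see at a glance whether a speedup came from a smarter algorithm or simply from using more cores.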
Someone might argue, "having those threads sit idle / unused is not good either, so let's use them!" I agree completely with that argument. However, when you want to run grep / ripgrep, do you close your other apps first (e.g., Unreal Engine, Godot, VS Code, Chrome with 20 tabs open, 5 of them on YouTube) and then run grep / ripgrep?
Or would you just keep all the apps open and run grep / ripgrep? Exactly my point! Unless all you do is run benchmarks all day, it's very important to measure the levelized / normalized resource usage of an app.
For example, in my daily usage I keep 3 workspaces open:
- Workspace #1: standard office apps and browsers with 20-30 tabs open for market research, at least one of them streaming Bloomberg's live market coverage non-stop. Sometimes there are a couple of YouTube videos too.
- Workspace #2: trading app (quite heavy coz I'm running algo trading).
- Workspace #3: programming (mostly just a terminal and Neovim, but sometimes I run 5-6 Python instances to scrape data or do some number crunching).
Are you saying I need to close all of these apps and run ripgrep if I want to do pattern matching on files / directories? No thank you!