n-body simulation in Python, C, Zig and Rust
Introduction
This article is one of the follow-up posts to the research paper "Energy Efficiency across Programming Languages". In this post, I aim to redo some of the benchmarks from the research paper with a twist:
- The benchmarks will be implemented in multiple languages: C, Zig, Rust, and Python.
- I will demonstrate how easy it is to speed up Python via Codon.
- Written with minimal use of external lib. The standard library should be enough to reproduce the tests.
- The tests will be done both in single and multi-threaded.
The Goal
One of the primary goals of this article is to provide a counter-argument to the narrative that "Python is SLOW!!" by presenting a new perspective: "It's easy peasy lemon squeezy to speed up Python to near C-level speed".
Too much talking already! let's dive into code!
Python
Section 1: Defining the Body Class
This code defines a class Body with six attributes: x, y, z, vx, vy, and vz, which represent the position and velocity of a body in 3D space. The init method initializes these attributes to zero.
import random
import time
import sys
SOFTENING: float = 1e-9
class Body:
x: float
y: float
z: float
vx: float
vy: float
vz: float
def __init__(self):
self.x = 0.0
self.y = 0.0
self.z = 0.0
self.vx = 0.0
self.vy = 0.0
self.vz = 0.0
Section 2: Defining the randomize_bodies Function
This function takes a list of Body objects and randomizes their positions and velocities.
def randomize_bodies(bodies: list[Body]) -> None:
for body in bodies:
body.x = 2.0 * random.random() - 1.0
body.y = 2.0 * random.random() - 1.0
body.z = 2.0 * random.random() - 1.0
body.vx = 2.0 * random.random() - 1.0
body.vy = 2.0 * random.random() - 1.0
body.vz = 2.0 * random.random() - 1.0
Section 3: Defining the body_force Function
This function calculates the gravitational force between each pair of bodies and updates their velocities accordingly.
def body_force(bodies: list[Body], dt: float) -> None:
for i in range(len(bodies)):
Fx: float = 0.0
Fy: float = 0.0
Fz: float = 0.0
for j in range(len(bodies)):
dx: float = bodies[j].x - bodies[i].x
dy: float = bodies[j].y - bodies[i].y
dz: float = bodies[j].z - bodies[i].z
distSqr: float = dx*dx + dy*dy + dz*dz + SOFTENING
invDist: float = 1.0 / (distSqr ** 0.5)
invDist3: float = invDist * invDist * invDist
Fx += dx * invDist3
Fy += dy * invDist3
Fz += dz * invDist3
bodies[i].vx += dt*Fx
bodies[i].vy += dt*Fy
bodies[i].vz += dt*Fz
Section 4: the Main Function
This function is the entry point of the program. It creates a list of Body objects, randomizes their positions and velocities, and then simulates the gravitational interactions between them for a specified number of iterations.
At the end we call the main function.
def main() -> None:
n_bodies: int = 30000
if len(sys.argv) > 1:
n_bodies = int(sys.argv[1])
dt: float = 0.01 # time step
n_iters: int = 10 # simulation iterations
bodies: list[Body] = [Body() for _ in range(n_bodies)]
randomize_bodies(bodies)
total_time: float = 0.0
for iter in range(1, n_iters+1):
start_time: float = time.time()
body_force(bodies, dt)
for body in bodies:
body.x += body.vx*dt
body.y += body.vy*dt
body.z += body.vz*dt
end_time: float = time.time()
t_elapsed: float = end_time - start_time
if iter > 1:
total_time += t_elapsed
print(f"Iteration {iter}: {t_elapsed:.3f} seconds")
avg_time: float = total_time / (n_iters-1)
print(f"""Average rate for iterations 2 through {n_iters}: {1.0 / avg_time:.3f} steps per second.""")
interaction = 1e-9 * n_bodies * n_bodies / avg_time
print(f"""{n_bodies} Bodies: average {interaction:.3f} Billion Interactions / second""")
main()
Initially, I included code examples for all four languages (C, Zig, Rust, and Python) in this post but it ended up making the post too long. The main differences lie in minor variations between each language's implementation.
Benchmark Results
Now let's discuss the benchmark results, which is the most interesting part:
SINGLE-THREADED
Language | steps/sec | Billion inter/sec | Notes |
---|---|---|---|
Python | 0.00383 | 0.00345 | Baseline performance |
PyCodon | 0.235 | 0.211 | 60.3x ; 60.2x speedup |
C | 0.268 | 0.241 | 68.9x ; 68.9x speedup |
Rust | 0.248 | 0.224 | 63.7x ; 64x speedup |
Zig | 0.421 | 0.379 | 109x ; 109x speedup |
Conclusion
The results show that Python, when optimized with Codon, can achieve performance close to C-level speed. This is quite impressive, and I've found that Codon is comparable to Cython and Numba in terms of efficiency.
If you're new to Codon, you can read more about it here.
MULTI-THREADED
Language | steps/sec | Billion inter/sec | Notes |
---|---|---|---|
Python | 0.00383 | 0.00345 | Baseline performance |
C - Omp | 2.017 | 1.815 | 525x ; 525x speedup |
C - pthread | 3.007 | 2.706 | 784x ; 784x speedup |
Rust -std | 2.256 | 2.031 | 588x ; 588x speedup |
Rust -Rayon | 2.012 | 1.811 | 524x ; 524x speedup |
Zig | 3.146 | 2.831 | 820x ; 820x speedup |
Update
Update-1: the C multithreading implementation here was using OMP (higher level API). In a followup post here I redid C multithreading using pthread which is lower level and it's +50% more performant than OMP !!
Update-2: Rewrote all Rust multithreading implementation using both the std::thread and Rayon. Now Rust's numbers makes sense and decent compared to the other languages. Here I wrote a follow-up article to properly discuss the optimization and the differences between std::thread vs Rayon implementation.
The multi-threaded results takes the cake! And the table provide a picture one of the reasons why the news on Python going to release the GIL (Global Interpreter Lock) excites a lot of people!
Unlocking the GIL will open up Python to native multi threading capabilities...so lets Release the Kraken, I say!!
Rust Performance
Note: While Rust is indeed "blazingly fast" compared to higher-level languages, it often trades blows with other low-level languages and may not always be the most performant. As we can see from the table, even using std::thread (supposed to be the low-level implementation of multithreading in Rust), it only manages to reach C's OMP performance (a higher-level API).