*ฅ^•ﻌ•^ฅ* ✨✨  HWisnu's blog  ✨✨ о ฅ^•ﻌ•^ฅ

n-body simulation in Python, C, Zig and Rust

Introduction

This article is one of the follow-up posts to the research paper "Energy Efficiency across Programming Languages". In this post, I aim to redo some of the benchmarks from the research paper with a twist:

The Goal

One of the primary goals of this article is to provide a counter-argument to the narrative that "Python is SLOW!!" by presenting a new perspective: "It's easy peasy lemon squeezy to speed up Python to near C-level speed".

Too much talking already! let's dive into code!

Python

Section 1: Defining the Body Class

This code defines a class Body with six attributes: x, y, z, vx, vy, and vz, which represent the position and velocity of a body in 3D space. The init method initializes these attributes to zero.

import random
import time
import sys

SOFTENING: float = 1e-9


class Body:
    x: float
    y: float
    z: float
    vx: float
    vy: float
    vz: float

    def __init__(self):
        self.x = 0.0
        self.y = 0.0
        self.z = 0.0
        self.vx = 0.0
        self.vy = 0.0
        self.vz = 0.0

Section 2: Defining the randomize_bodies Function

This function takes a list of Body objects and randomizes their positions and velocities.

def randomize_bodies(bodies: list[Body]) -> None:
    for body in bodies:
        body.x = 2.0 * random.random() - 1.0
        body.y = 2.0 * random.random() - 1.0
        body.z = 2.0 * random.random() - 1.0
        body.vx = 2.0 * random.random() - 1.0
        body.vy = 2.0 * random.random() - 1.0
        body.vz = 2.0 * random.random() - 1.0

Section 3: Defining the body_force Function

This function calculates the gravitational force between each pair of bodies and updates their velocities accordingly.

def body_force(bodies: list[Body], dt: float) -> None:
    for i in range(len(bodies)):
        Fx: float = 0.0
        Fy: float = 0.0
        Fz: float = 0.0
        for j in range(len(bodies)):
            dx: float = bodies[j].x - bodies[i].x
            dy: float = bodies[j].y - bodies[i].y
            dz: float = bodies[j].z - bodies[i].z
            distSqr: float = dx*dx + dy*dy + dz*dz + SOFTENING
            invDist: float = 1.0 / (distSqr ** 0.5)
            invDist3: float = invDist * invDist * invDist
            Fx += dx * invDist3
            Fy += dy * invDist3
            Fz += dz * invDist3
        bodies[i].vx += dt*Fx
        bodies[i].vy += dt*Fy
        bodies[i].vz += dt*Fz

Section 4: the Main Function

This function is the entry point of the program. It creates a list of Body objects, randomizes their positions and velocities, and then simulates the gravitational interactions between them for a specified number of iterations.

At the end we call the main function.

def main() -> None:
    n_bodies: int = 30000
    if len(sys.argv) > 1:
        n_bodies = int(sys.argv[1])

    dt: float = 0.01  # time step
    n_iters: int = 10  # simulation iterations

    bodies: list[Body] = [Body() for _ in range(n_bodies)]

    randomize_bodies(bodies)

    total_time: float = 0.0

    for iter in range(1, n_iters+1):
        start_time: float = time.time()

        body_force(bodies, dt)

        for body in bodies:
            body.x += body.vx*dt
            body.y += body.vy*dt
            body.z += body.vz*dt

        end_time: float = time.time()
        t_elapsed: float = end_time - start_time
        if iter > 1:
            total_time += t_elapsed

        print(f"Iteration {iter}: {t_elapsed:.3f} seconds")

    avg_time: float = total_time / (n_iters-1)

    print(f"""Average rate for iterations 2 through {n_iters}: {1.0 / avg_time:.3f} steps per second.""")

    interaction = 1e-9 * n_bodies * n_bodies / avg_time
    print(f"""{n_bodies} Bodies: average {interaction:.3f} Billion Interactions / second""")


main()

Initially, I included code examples for all four languages (C, Zig, Rust, and Python) in this post but it ended up making the post too long. The main differences lie in minor variations between each language's implementation.

Benchmark Results

Now let's discuss the benchmark results, which is the most interesting part:

SINGLE-THREADED

Language steps/sec Billion inter/sec Notes
Python 0.00383 0.00345 Baseline performance
PyCodon 0.235 0.211 60.3x ; 60.2x speedup
C 0.268 0.241 68.9x ; 68.9x speedup
Rust 0.248 0.224 63.7x ; 64x speedup
Zig 0.421 0.379 109x ; 109x speedup

Conclusion

The results show that Python, when optimized with Codon, can achieve performance close to C-level speed. This is quite impressive, and I've found that Codon is comparable to Cython and Numba in terms of efficiency.

If you're new to Codon, you can read more about it here.

MULTI-THREADED

Language steps/sec Billion inter/sec Notes
Python 0.00383 0.00345 Baseline performance
C - Omp 2.017 1.815 525x ; 525x speedup
C - pthread 3.007 2.706 784x ; 784x speedup
Rust -std 2.256 2.031 588x ; 588x speedup
Rust -Rayon 2.012 1.811 524x ; 524x speedup
Zig 3.146 2.831 820x ; 820x speedup

Update

Update-1: the C multithreading implementation here was using OMP (higher level API). In a followup post here I redid C multithreading using pthread which is lower level and it's +50% more performant than OMP !!

Update-2: Rewrote all Rust multithreading implementation using both the std::thread and Rayon. Now Rust's numbers makes sense and decent compared to the other languages. Here I wrote a follow-up article to properly discuss the optimization and the differences between std::thread vs Rayon implementation.

The multi-threaded results takes the cake! And the table provide a picture one of the reasons why the news on Python going to release the GIL (Global Interpreter Lock) excites a lot of people!

Unlocking the GIL will open up Python to native multi threading capabilities...so lets Release the Kraken, I say!!


Rust Performance

Note: While Rust is indeed "blazingly fast" compared to higher-level languages, it often trades blows with other low-level languages and may not always be the most performant. As we can see from the table, even using std::thread (supposed to be the low-level implementation of multithreading in Rust), it only manages to reach C's OMP performance (a higher-level API).

It's essential to remember that Rust's emphasis on safety and security comes with a cost. The additional checks and requirements can impact performance, as seen in the benchmark scenario. But also sometimes in different scenarios Rust is shown to be faster (note the use of unsafe blocks) vs C/C++ or Zig, so yeah nothing is set in stone.

#benchmark #c #codon #high level #low level #multi threading #python #rust #single threading #threads #zig