*ฅ^•ﻌ•^ฅ* ✨✨  HWisnu's blog  ✨✨ о ฅ^•ﻌ•^ฅ

Array of Structs and Struct of Arrays

Introduction

Data structures play a crucial role in determining the performance and efficiency of algorithms. The Array of Structs (AoS) and Struct of Arrays (SoA) are two popular data structures used to represent complex data. Although they can store similar data, they differ in memory layout, access patterns, and performance.

Here's an excellent video on Data Oriented Design by Andrew Kelley the creator of Zig programming language.

Array of Structs (AoS)

An AoS is a data structure composed of an array of structures, where each structure represents a single entity with multiple fields or members. The structures are stored contiguously in memory, one after the other.

typedef struct {
    int id;
    float x;
    float y;
} PointAOS;

PointAOS is a struct containing three fields: id (an integer), x (a float), and y (a float).

void initialize_AOS(PointAOS *points, int size)
{
    // Initialize the array
    for (int i=0; i<size; i++) {
        points[i].id = i;
        points[i].x = i * 1.0f;
        points[i].y = i * 2.0f;
    }
}

Initializes each element of the array with id set to the index, x set to i * 1.0f, and y set to i * 2.0f.

void process_AOS(PointAOS *points, int size)
{
    // Accessing elements
    for (int i=0; i<size; i++) {
        // printf("Point %d: (%.1f, %.1f)\n", points[i].id, points[i].x, points[i].y);
        float temp = points[i].x + points[i].y;

        if (i == size - 1) {
            printf("%.1f ", temp);
        }
    }
    printf("\nArray of Structs\n");
}

Iterates through the array, calculates the sum of x and y for each point, and prints the result for the last element.

#define ARRAY_SIZE 10e6

int main()
{
    // Create an array of structs
    PointAOS points[ARRAY_SIZE];

    initialize_AOS(points, ARRAY_SIZE);
    process_AOS(points, ARRAY_SIZE);

    return 0;
}

An array of 10 PointAOS structs is created on the stack, then initialized and processed using the respective functions.

Key highlights:

Struct of Arrays (SoA)

An SoA is a data structure consisting of a single structure that contains multiple arrays, each representing a single field or member of an entity. These arrays are stored contiguously in memory, but separately from one another.

typedef struct {
    int* id;
    float* x;
    float* y;
} PointSOA;

PointSOA is a struct containing three pointers: id (a pointer to an array of integers), x (a pointer to an array of floats), and y (a pointer to an array of floats).

void initialize_SOA(PointSOA *points, int size)
{
    // Initialize the struct
    points->id = (int*) malloc(sizeof(int) * size);
    points->x = (float*) malloc(sizeof(float) * size);
    points->y = (float*) malloc(sizeof(float) * size);

    // Initialize the arrays
    for (int i=0; i<size; i++) {
        points->id[i] = i;
        points->x[i] = i * 1.5f;
        points->y[i] = i * 2.5f;
    }
}

Allocates memory for each array using malloc then initializes each element of the arrays with id set to the index, x set to i * 1.5f, and y set to i * 2.5f.

void process_SOA(PointSOA *points, int size)
{
    // Accessing elements
    for (int i=0; i<size; i++) {
        // printf("Point %d: (%.1f, %.1f)\n", points->id[i], points->x[i], points->y[i]);
        float temp = points->x[i] + points->y[i];

        if (i == size - 1) {
            printf("%.1f ", temp);
        }
    }
    printf("\nStruct of Arrays\n");
}

iterates through the arrays, calculates the sum of x and y for each point, and prints the result for the last element.

#define ARRAY_SIZE 10e6

int main()
{
    // Create a struct of arrays
    PointSOA points;

    initialize_SOA(&points, ARRAY_SIZE);
    process_SOA(&points, ARRAY_SIZE);

    free(points.id);
    free(points.x);
    free(points.y);

    return 0;
}

Memory allocated for the arrays is freed to prevent memory leaks.

Key highlights

Key differences:

AoS:

SoA:

Benchmark

One key of my blog is I love putting in some benchmark numbers! I used this hyperfine command: hyperfine --warmup 3 --min-runs 10 --max-runs 10 './bench_AOS_SOA' --export-markdown results_AoS-SoA.md

I did the command above 3x for each AoS and SoA and then took the average:

SoA is almost 30% faster based on this simple example, the difference will get significantly magnified with more complex scenario, sometimes SoA (optimized) is 10x faster compared to AoS as shown in this great article

Below is the visualization of how different the memory layout of AoS (top) vs SoA(bottom) and you'll quickly understand why SoA provides superior performance: memory_layout

Conclusion

Array of Structs (AoS) can be less cache-friendly because accessing multiple fields of a single element requires loading multiple cache lines. In contrast, Struct of Arrays (SoA) is more cache-friendly as it processes a single field across multiple elements, reducing cache line loads.

Below is the final form of the two data structures, I made them as header files so it looks different with the codes provided previously: aos-soa

The examples presented in this article are intentionally simplified to illustrate the fundamental differences between Array of Structs (AoS) and Struct of Arrays (SoA). However, in future articles as we delve into more advanced topics like SIMD (Single Instruction, Multiple Data) and vectorization, the advantages of SoA will become even more pronounced.


Additional references:

#array #c #low level #programming #struct