Building a Virtual Machine, JVM-inspired — Heap (Part 3)

10 min readJan 6, 2025

In this article, we’ll implement a heap memory system to manage shared state in our virtual machine. We’ll explore how this implementation compares to the JVM’s heap and discuss the technical details of our memory management approach.

Check the previous article to catch up on how we implemented multi-threading.

Memory design

Our VM’s memory is organized into two main regions, thread local memory and shared heap.

Here’s a visual representation of our VM’s memory layout:

Thread-local memory

Each thread has its own memory space containing:

Local variables
Program counter
Thread metadata

Shared heap

In a multi-threaded environment, threads often need to share data. While each thread has its own local stack for method-specific variables, the heap provides a common area where shared data can live.

Implementation goals

We’re adding a new instruction called setshared that creates and modifies variables in the heap memory. The instruction takes two parameters:

Variable name (string)
Value to assign (integer)

Here’s an example program that demonstrates thread interaction through shared memory:

const char* program[] = {
    "setshared counter 0", // Line 0
    "thread 6",            // Line 1 - Start increment thread
    "thread 12",           // Line 2 - Start decrement thread
    "sleep 500",           // Line 3 - Main thread waits for threads to finish
    "print counter",       // Line 4 - Print final value
    "exit",                // Line 5

    // Attempt to change counter directly from Thread-0
    "sleep 1",              // Line 6
    "setshared counter 1",  // Line 7
    "print counter",        // Line 8
    "setshared counter 1",  // Line 9
    "print counter",        // Line 10
    "exit",                 // Line 11

    // Attempt to change counter directly from Thread-1
    "sleep 1",             // Line 12
    "setshared counter 2", // Line 13
    "print counter",       // Line 14
    "setshared counter 2", // Line 15
    "print counter",       // Line 16
    "exit",                // Line 17
    NULL
 };

Design decisions — Thread safety

Our implementation makes these design choices regarding thread safety:

We use a mutex when creating new variables in the heap. This ensures the heap’s structure remains consistent when multiple threads try to create variables simultaneously.
We deliberately don’t use mutexes for reading or updating existing heap variables. This mirrors the JVM’s approach where reading is inherently thread-safe, writing is not automatically protected and developers must use explicit synchronization (like Java’s synchronized keyword) when thread safety is required (we will implement synchronization in the next article).

Implementation

Heap structure

We are going to use the existing Variable structure to store the heap objects (for simplicity).

// tiny_vm.h
//...

// Variable storage
typedef struct {
    char* name;
    t_int value;
} Variable;

// VM state
struct VM {
    // ... other VM fields ...

    // Heap memory for shared variables
    Variable* heap;
    int heap_size;
    int heap_capacity;
    pthread_mutex_t heap_lock;  // Protect shared memory access
};

// Instruction types
typedef enum {
    // ...
    SETSHARED,  // setshared <variable> <value>
} InstructionType;
//...

Let’s implement the initialization and destruction of the heap.

// tiny_vm.c
// ...
VM* create_vm() {
    VM* vm = malloc(sizeof(VM));
    // ... other initialization code

    // Initialize heap
    vm->heap_capacity = 10;
    vm->heap_size = 0;
    vm->heap = malloc(sizeof(Variable) * vm->heap_capacity);
    pthread_mutex_init(&vm->heap_lock, NULL);
    return vm;
}
void destroy_vm(VM* vm) {
    // ...
    // Cleanup heap
    for (int i = 0; i < vm->heap_size; i++) {
        free(vm->heap[i].name);
    }
    free(vm->heap);
    pthread_mutex_destroy(&vm->heap_lock);
    // ...
}

Print with timestamp and flush

This is a minor deail, but let’s add print function that prints the current timestamp and flushes the standard output, so we can see the printf immediatelly.

// tiny_vm.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include "tiny_vm.h"

void print(const char *format, ...) {
    struct timespec ts;
    clock_gettime(CLOCK_REALTIME, &ts);

    time_t t = ts.tv_sec;
    struct tm tm;
    localtime_r(&t, &tm);

    printf("[%04d-%02d-%02d %02d:%02d:%02d.%06ld] ",
        tm.tm_year + 1900, tm.tm_mon + 1, tm.tm_mday,
        tm.tm_hour, tm.tm_min, tm.tm_sec, ts.tv_nsec / 1000);

    va_list args;
    va_start(args, format);
    vprintf(format, args);
    va_end(args);
    printf("\n");
    fflush(stdout);
}

Variable access and creation

The core of our heap implementation is the shared variable management. When a thread needs to access a variable, we follow this sequence:

First, check the thread’s local scope
If not found locally, look in the heap
If not found in heap, create new heap variable (with mutex protection)

// tiny_vm.c
// ...

// Get a shared variable from heap
Variable* get_shared_variable(ThreadContext* thread, const char* name) {

    // Look for existing variable
    for (int i = 0; i < thread->vm->heap_size; i++) {
        if (strcmp(thread->vm->heap[i].name, name) == 0) {
            print("[Thread %d] Found shared variable %s", thread->thread_id, name);
            return &thread->vm->heap[i];
        }
    }

    // Create a new variable if not found
    pthread_mutex_lock(&thread->vm->heap_lock);
    if (thread->vm->heap_size < thread->vm->heap_capacity) {
        Variable* var = &thread->vm->heap[thread->vm->heap_size++];
        var->name = strdup(name);
        var->value = 0;  // Initialize as int by default
        pthread_mutex_unlock(&thread->vm->heap_lock);
        print("[Thread %d] Created shared variable %s", thread->thread_id, name);
        return var;
    }

    return NULL;
}

// get a variable, no matter where it lives
t_int get_value(ThreadContext* thread, const char* name) {
    for (int i = 0; i < thread->local_scope->var_count; i++) {
        if (strcmp(thread->local_scope->variables[i].name, name) == 0) {
            return thread->local_scope->variables[i].value;
        }
    }
    Variable* shared = get_shared_variable(thread, name);
    if (shared) {
        return shared->value;
    }
    return -1;
}

Comparison with JVM’s Heap Implementation

We use a mutext when creating a shared variable on the heap, to ensure heap management consistency. We did it because we need to ensure consistency when modifying the heap management data, but this would lead to performance issues (contention) becuase all the running threads would have to wait for every object creation. JVM is solving this in a more sophisitaced way.

Thread-Local Allocation Buffers (TLABs) — TLABs are small, thread-specific areas in the heap where each thread allocates objects. Each thread allocates memory in its own TLAB, avoiding contention with other threads. If a thread’s TLAB is exhausted, the JVM allocates a new one for that thread. For example, Thread-1 allocates objects in its TLAB without any reference from Thread-2, ensuring fast, lock-free allocations.

Example of how are the objects in TLAB shared between threads:

Thread A creates an object in its TLAB, and then passes a reference of that object to Thread B.
Now, both Thread A and Thread B have a reference to the same object.
If both threads modify the object’s state concurrently, this can lead to a race condition, unless synchronization is applied. For instance, if the object’s state is updated without locks, one thread’s changes might overwrite the other’s.

Atomic operations and Compare-and-Swap (CAS) allow updates to share memory to happen atomically without interruption. When a thread allocates large objects or when TLABs are full, atomic operations, like CAS, ensure that memory updates are done safely, preventing race conditions.

Example of Compare-and-Swap in C++:

#include <iostream>
#include <atomic>
#include <thread>

std::atomic<int> sharedValue(0);  // Shared variable

void casTest(int expected, const int newValue) {
    // Atomically compare and swap
    bool result = sharedValue.compare_exchange_strong(expected, newValue);

    if (result) {
        std::cout << "Thread " << std::this_thread::get_id()
                  << " swapped value to " << newValue << "\n";
    } else {
        std::cout << "Thread " << std::this_thread::get_id()
                  << " failed to swap, current value is " << sharedValue.load() << "\n";
    }
}

void threadFunction() {
    int expected = 0;
    for (int i = 0; i < 5; ++i) {
        int newValue = expected + 1;
        casTest(expected, newValue);
        expected = sharedValue.load();
    }
}

int main() {
    // Start multiple threads
    std::vector<std::thread> threads;
    for (int i = 0; i < 3; ++i) {
        threads.push_back(std::thread(threadFunction));
    }

    for (auto& t : threads) {
        t.join();
    }

    return 0;
}

Memory barriers are low-level instructions that ensure memory operations (reads/write) happen in the correct order. They prevent out-of-order execution in multi-threaded environments, ensuring that chagnes to shared memory are visible to all threads in the proper sequence.

Example of memory barriers (fences) in C++:

#include "memory.h"
#include <iostream>
#include <atomic>
#include <thread>

std::atomic<int> x(0);
std::atomic<int> y(0);

void thread1() {
    x.store(1, std::memory_order_relaxed); // Store x
    std::atomic_thread_fence(std::memory_order_release);  // Release memory barrier
    y.store(1, std::memory_order_relaxed); // Store y
}

void thread2() {
    while (y.load(std::memory_order_relaxed) == 0);  // Wait until y is 1
    std::atomic_thread_fence(std::memory_order_acquire); // Acquire memory barrier
    if (x.load(std::memory_order_relaxed) == 1) {
        std::cout << "Thread 2 observed x as 1\n";
    }
}

int main() {
    std::thread t1(thread1);
    std::thread t2(thread2);

    t1.join();
    t2.join();

    return 0;
}

Instructions execution

Let’s add the parsing and execution logic for the new setshared instruction.

// tiny_vm.c
// ...

Instruction parse_instruction(const char* line) {
    Instruction instr;
    memset(&instr, 0, sizeof(Instruction));

    char cmd[32];
    sscanf(line, "%s", cmd);

    // ... other commands
    else if (strcmp(cmd, "setshared") == 0) {
        instr.type = SETSHARED;
        sscanf(line, "%s %s %s", cmd, instr.args[0], instr.args[1]);
    }

    return instr;
}

void execute_instruction(ThreadContext* thread, Instruction* instr) {
    LocalScope* local_scope = thread->local_scope;

    switch (instr->type) {
        // ... other cases
        case SETSHARED: {
            Variable* var = get_shared_variable(thread, instr->args[0]);
            if (var) {
                var->value = atoi(instr->args[1]);
                print("[Thread %d] Set-shared %s = %d", thread->thread_id, var->name, var->value);
            }
            break;
        }
    }
}

Simulating race condition

We are going to create three threads to test the variables on the heap and simulate a race condition.

The Thread-0 is the thread created before the Line 0 is executed, and is responsible to create another threads, then sleep for 500ms and then print the value of counter variable from the heap.

Both Thread-1 and Thread-2 will modify and print the counter in the heap twice, to increase the chances to see a situation where the Thread-1 should only work with counter=1, but instead it gets counter=2 becuase Thread-2 will modify the counter concurently.

// main.c
#include <stdio.h>
#include "tiny_vm.h"

int main() {
    // Our "Java" program with threads
    const char* program[] = {
        "setshared counter 0", // Line 0
        "thread 6",            // Line 1 - Start increment thread
        "thread 12",           // Line 2 - Start decrement thread
        "sleep 500",           // Line 3 - Main thread waits for threads to finish
        "print counter",       // Line 4 - Print final value
        "exit",                // Line 5

        // Attempt to change counter directly from Thread-0
        "sleep 1",              // Line 6
        "setshared counter 1",  // Line 7
        "print counter",        // Line 8
        "setshared counter 1",  // Line 9
        "print counter",        // Line 10
        "exit",                 // Line 11

        // Attempt to change counter directly from Thread-1
        "sleep 1",             // Line 12
        "setshared counter 2", // Line 13
        "print counter",       // Line 14
        "setshared counter 2", // Line 15
        "print counter",       // Line 16
        "exit",                // Line 17
        NULL
     };

    print("Starting TinyJVM with threads...");

    VM* vm = create_vm();
    start_vm(vm, program);
    destroy_vm(vm);

    print("TinyJVM finished.");
    return 0;
}

Output analysis

When we run the code, everything might run consistently, but only if we are lucky. So if we are not lucky, the threds will read and write the shared variable in the heap and we see the inconsistent state of counter variable as described above.

I have run the program couple of times to simulate the race condition:

./build/tiny_vm
[2024-12-30 11:41:11.000594439] Starting TinyJVM with threads...
[2024-12-30 11:41:11.000595671] [Thread 0] Thread started
[2024-12-30 11:41:11.000595686] [Thread 0] Created shared variable counter
[2024-12-30 11:41:11.000595690] [Thread 0] Set-shared counter = 0
[2024-12-30 11:41:11.000595721] [Thread 1] Thread started
[2024-12-30 11:41:11.000595735] [Thread 2] Thread started
[2024-12-30 11:41:11.000595738] [Thread 1] Found shared variable counter
[2024-12-30 11:41:11.000595742] [Thread 1] Set-shared counter = 1
[2024-12-30 11:41:11.000595745] [Thread 1] Found shared variable counter
[2024-12-30 11:41:11.000595746] [Thread 2] Found shared variable counter
[2024-12-30 11:41:11.000595747] [Thread 1] PRINT counter = 1
[2024-12-30 11:41:11.000595763] [Thread 1] Found shared variable counter
[2024-12-30 11:41:11.000595767] [Thread 1] Set-shared counter = 1
[2024-12-30 11:41:11.000595774] [Thread 1] Found shared variable counter
[2024-12-30 11:41:11.000595778] [Thread 1] PRINT counter = 1
[2024-12-30 11:41:11.000595783] [Thread 1] Thread finished

[2024-12-30 11:41:11.000595754] [Thread 2] Set-shared counter = 2
[2024-12-30 11:41:11.000595946] [Thread 2] Found shared variable counter
[2024-12-30 11:41:11.000595949] [Thread 2] PRINT counter = 1

[2024-12-30 11:41:11.000595951] [Thread 2] Found shared variable counter
[2024-12-30 11:41:11.000595953] [Thread 2] Set-shared counter = 2
[2024-12-30 11:41:11.000595955] [Thread 2] Found shared variable counter
[2024-12-30 11:41:11.000595957] [Thread 2] PRINT counter = 2
[2024-12-30 11:41:11.000595959] [Thread 2] Thread finished
[2024-12-30 11:41:11.000596363] [Thread 0] Found shared variable counter
[2024-12-30 11:41:11.000596366] [Thread 0] PRINT counter = 2
[2024-12-30 11:41:11.000596369] [Thread 0] Thread finished
[2024-12-30 11:41:11.000596397] TinyJVM finished.

Here is the section of the console log showing the inconsistency:

[2024-12-30 11:41:11.000595754] [Thread 2] Set-shared counter = 2
[2024-12-30 11:41:11.000595946] [Thread 2] Found shared variable counter
[2024-12-30 11:41:11.000595949] [Thread 2] PRINT counter = 1

The Thread-2 sets the counter to be 2, but when it tries to print it out, it gets 1, which we would only want to be returned in the Thread-1.

The complete source code for this article is available in the tiny-vm_02_global_scope directory of the TinyVM repository.

Next steps

We will need to add a proper synchronization using mutexes when reading/writing shared variables if we want thread safety (similar to using synchronized in Java).

Introduction
Part 1 — Foundations
Part 2 — Multithreading
Part 3 — Heap (you are here)
Part 4 — Synchronized
Part 5 — Refactoring
Part 6 — Functions
Part 7 — Compilation
Part 8 — Byte-code execution
Part 9 — Function call stack (not started)
Part 10 — Garbage collector (not started)