Building a Virtual Machine, JVM-inspired — Refactoring (Part 5)

9 min read · Jan 9, 2025

Introduction

As our JVM-inspired virtual machine implementation has grown to include threads, memory management, and synchronization primitives, we’ve reached a point where refactoring has become essential. This article explains why modular design is crucial for virtual machines and how we’re restructuring our code.

Why refactor?

The larger codebase has brought several challenges:

Our files have become too large and intertwined, making it difficult to understand how different components interact.
Making changes has become increasingly risky as modifications in one area can have unexpected effects elsewhere.

Learning from the JVM

The JVM’s architecture is divided into several key subsystems:

Class loader: loads, links, and initializes classes
Runtime data areas (memory areas): the method area, heap, JVM stacks, PC registers, and native method stacks
Execution engine: executes the byte-code and contains the interpreter, the JIT compiler, and the garbage collector

While our implementation is simpler, we can learn from this separation of concerns.

Implementation goals

Module organization

We’re reorganizing the code into logical modules, each with a clear responsibility:

core/ — Core VM initialization and lifecycle management
instruction/ — Instruction parsing and execution
memory/ — Memory management for both heap and local variables
synchronization/ — Thread synchronization primitives
thread/ — Thread management and execution
utils/ — Shared utilities like logging

This separation of concerns makes the codebase more maintainable and easier to understand.

We will create the following structure:

tiny_vm_05_refactoring/
├── CMakeLists.txt
└── src/
    ├── main.c
    ├── types.h
    ├── core/
    │   ├── vm.c
    │   └── vm.h
    ├── instruction/
    │   ├── instruction.c
    │   └── instruction.h
    ├── memory/
    │   ├── memory.c
    │   └── memory.h
    ├── synchronization/
    │   ├── synchronization.c
    │   └── synchronization.h
    ├── thread/
    │   ├── thread.c
    │   └── thread.h
    └── utils/
        ├── logger.c
        └── logger.h

Build system

We are also replacing our Makefile with a CMakeLists.txt. We’re moving from Make to CMake for several reasons:

More declarative build configuration
Better IDE integration (especially with CLion)
Improved dependency management
Cross-platform compatibility
Easier configuration of build variants (debug/release)

The refactoring

We will create a new folder (that is, a new project) called tiny_vm_05_refactoring, refactor our code there, and build on it in the next iterations.

We can use CLion (https://www.jetbrains.com/clion) as an editor; it makes working with and debugging the code easier.

CMakeLists

Let’s simplify the way we build our code first.

# CMakeLists.txt
cmake_minimum_required(VERSION 3.30)
project(tiny_vm_05_refactoring C)

set(CMAKE_C_STANDARD 11)

add_executable(
        tiny_vm_05_refactoring
        src/main.c
        src/utils/logger.c
        src/utils/logger.h
        src/core/vm.c
        src/core/vm.h
        src/thread/thread.c
        src/thread/thread.h
        src/synchronization/synchronization.c
        src/synchronization/synchronization.h
        src/memory/memory.c
        src/memory/memory.h
        src/instruction/instruction.c
        src/instruction/instruction.h
        src/types.h
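)

# The VM uses pthreads throughout, so link the thread library explicitly
find_package(Threads REQUIRED)
target_link_libraries(tiny_vm_05_refactoring PRIVATE Threads::Threads)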

We are going to create multiple executables in future articles.

Types

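We will put the types that are shared by several modules into types.h. Because the structs refer to each other (a thread points at its VM, and the VM owns its threads), keeping them in one header avoids circular includes between the module headers.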

// src/types.h
#ifndef TINY_VM_TYPES_H
#define TINY_VM_TYPES_H

#include <pthread.h>
#include <stdint.h>  // int32_t

typedef int32_t t_int;

// Variable storage
typedef struct Variable {
    char* name;
    t_int value;
} Variable;

// Execution frame (stack frame)
typedef struct LocalScope {
    Variable* variables;
    int var_count;
    int var_capacity;
} LocalScope;

// Thread context
typedef struct ThreadContext {
    LocalScope* local_scope;
    const char** program;
    int pc;
    pthread_t thread;
    int thread_id;
    int is_running;
    struct VM* vm;
    char* function_name;
} ThreadContext;

// Synchronization
typedef struct SynchronizationLock {
    char* name;
    pthread_mutex_t mutex;
    int locked;  // For debugging
} SynchronizationLock;

// VM state
typedef struct VM {
    // Thread management
    ThreadContext* threads;
    int thread_count;
    int thread_capacity;
    pthread_mutex_t thread_mgmt_lock;
    int next_thread_id;

    // Heap memory for shared variables
    Variable* heap;
    int heap_size;
    int heap_capacity;
    pthread_mutex_t heap_mgmt_lock;

    // Mutex management
    SynchronizationLock* locks;
    int lock_count;
    int lock_capacity;
    pthread_mutex_t lock_mgmt_lock;
} VM;

#endif
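
The reason ThreadContext can hold a struct VM* before VM is defined is that C only needs to know a struct tag exists in order to declare a pointer to it. The sketch below shows the same pattern in isolation. The names Worker, Machine, and machine_add_worker are hypothetical and exist only for this illustration; they are not part of TinyVM.

```c
#include <assert.h>
#include <stddef.h>

// Forward declaration: Worker may point at a Machine before Machine is defined.
struct Machine;

typedef struct Worker {
    int id;
    struct Machine* owner;   // back-pointer, in the spirit of ThreadContext.vm
} Worker;

typedef struct Machine {
    Worker workers[4];       // fixed capacity keeps the sketch short
    int worker_count;
} Machine;

// Registers a new worker and wires up its back-pointer.
// Returns NULL when all four slots are taken.
Worker* machine_add_worker(Machine* machine) {
    assert(machine != NULL);
    if (machine->worker_count >= 4) {
        return NULL;
    }
    Worker* worker = &machine->workers[machine->worker_count];
    worker->id = machine->worker_count++;
    worker->owner = machine;
    return worker;
}
```

Because both structs live in one translation unit here, no header needs to include another, which is the same effect types.h has for the modules.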

The main

Here is a simplified main.c file that serves as an entry point.

// src/main.c
#include "utils/logger.h"
#include "core/vm.h"

#include <stdio.h>

int main(void) {
    const char* program[] = {
        "set a 10",
        "print a",
        NULL
    };

    print("Starting TinyVM...");

    VM* vm = create_vm();
    start_vm(vm, program);
    destroy_vm(vm);

    print("TinyVM finished.");
    return 0;
}

Logging (print with flush, etc.)

We will define a reusable “logger” that provides a print function. It prefixes each message with a timestamp, appends a newline, and flushes stdout so we see the output immediately.

// src/utils/logger.h
#ifndef TINY_VM_LOGGER_H
#define TINY_VM_LOGGER_H

// Logging utilities
void print(const char *format, ...);

#endif

// src/utils/logger.c
#include "logger.h"

#include <stdio.h>
#include <stdlib.h>
#include <stdarg.h>
#include <unistd.h>
#include <time.h>

void print(const char *format, ...) {
    struct timespec ts;
    clock_gettime(CLOCK_REALTIME, &ts);

    time_t t = ts.tv_sec;
    struct tm tm;
    localtime_r(&t, &tm);

    printf("[%04d-%02d-%02d %02d:%02d:%02d.%06ld] ",
        tm.tm_year + 1900, tm.tm_mon + 1, tm.tm_mday,
        tm.tm_hour, tm.tm_min, tm.tm_sec, ts.tv_nsec / 1000);

    va_list args;
    va_start(args, format);
    vprintf(format, args);
    va_end(args);
    printf("\n");
    fflush(stdout);
}

This print function is only for debugging purposes.
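
One thing to be aware of: print writes the timestamp, the message, and the newline with separate printf calls, so output from two threads can in principle interleave within a single line. A possible variation, sketched below, builds the whole line in a buffer first and writes it with one call. The helpers format_log_line and emit_log_line are hypothetical names for this illustration, and the timestamp is left out to keep the example short.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

// Builds "[Thread N] <message>\n" in buf, truncating the message if needed.
// Returns the length of the resulting string (excluding the terminator).
int format_log_line(char* buf, size_t size, int thread_id, const char* format, ...) {
    assert(buf != NULL && size >= 2);   // room for at least "\n" and the terminator

    int used = snprintf(buf, size, "[Thread %d] ", thread_id);
    if (used < 0) used = 0;
    if ((size_t)used > size - 2) used = (int)(size - 2);   // keep room for "\n"

    va_list args;
    va_start(args, format);
    int n = vsnprintf(buf + used, size - (size_t)used, format, args);
    va_end(args);

    if (n > 0) used += n;
    if ((size_t)used > size - 2) used = (int)(size - 2);   // message was truncated

    buf[used] = '\n';
    buf[used + 1] = '\0';
    return used + 1;
}

// Writes the finished line with a single stdio call, then flushes.
void emit_log_line(const char* line) {
    assert(line != NULL);
    fputs(line, stdout);
    fflush(stdout);
}
```

A caller would format into a local buffer, for example char line[256], and then pass it to emit_log_line. The fixed buffer size is the trade-off: long messages are cut off rather than split across writes.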

Virtual machine

We will put the VM code into the core folder. We will be updating these files in future articles.

// src/core/vm.h
#ifndef TINY_VM_CORE_VM_H
#define TINY_VM_CORE_VM_H

#include "../types.h"

// Core VM functions
VM* create_vm(void);
void start_vm(VM* vm, const char** program);
void destroy_vm(VM* vm);

#endif

// src/core/vm.c
#include "vm.h"
#include "../thread/thread.h"
#include "../memory/memory.h"

#include <stdlib.h>

VM* create_vm() {
    VM* vm = malloc(sizeof(VM));

    // Thread management
    vm->thread_capacity = 10;
    vm->thread_count = 0;
    vm->threads = malloc(sizeof(ThreadContext) * vm->thread_capacity);
    vm->next_thread_id = 0;
    pthread_mutex_init(&vm->thread_mgmt_lock, NULL);

    // Initialize heap
    vm->heap_capacity = 10;
    vm->heap_size = 0;
    vm->heap = malloc(sizeof(Variable) * vm->heap_capacity);
    pthread_mutex_init(&vm->heap_mgmt_lock, NULL);

    // Initialize synchronization/mutex management
    vm->lock_capacity = 10;
    vm->lock_count = 0;
    vm->locks = malloc(sizeof(SynchronizationLock) * vm->lock_capacity);
    pthread_mutex_init(&vm->lock_mgmt_lock, NULL);

    return vm;
}

void start_vm(VM* vm, const char** program) {
    // Create main thread starting at line 0
    create_thread(vm, program, 0);

    // Wait for all threads to finish
    for (int i = 0; i < vm->thread_count; i++) {
        pthread_join(vm->threads[i].thread, NULL);
    }
}

void destroy_vm(VM* vm) {
    for (int i = 0; i < vm->thread_count; i++) {
        if (vm->threads[i].local_scope) {
            destroy_local_scope(vm->threads[i].local_scope);
        }
    }

    // Cleanup heap
    for (int i = 0; i < vm->heap_size; i++) {
        free(vm->heap[i].name);
    }
    free(vm->heap);
    pthread_mutex_destroy(&vm->heap_mgmt_lock);

    // Cleanup mutexes
    for (int i = 0; i < vm->lock_count; i++) {
        pthread_mutex_destroy(&vm->locks[i].mutex);
        free(vm->locks[i].name);
    }
    pthread_mutex_destroy(&vm->lock_mgmt_lock);
    free(vm->locks);

    free(vm->threads);
    pthread_mutex_destroy(&vm->thread_mgmt_lock);
    free(vm);
}
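
create_vm assumes every malloc succeeds, which is fine for a learning project. If we wanted the constructor to fail cleanly instead, one option is to check each allocation and undo the partial work. The sketch below uses a hypothetical, trimmed-down MiniVM struct rather than the real VM, so the names mini_vm_create and mini_vm_destroy are illustrative only.

```c
#include <assert.h>
#include <stdlib.h>

// Hypothetical stand-in for the article's VM struct, reduced to one array.
typedef struct MiniVM {
    int* slots;
    int slot_count;
    int slot_capacity;
} MiniVM;

// Returns NULL if any allocation fails, leaving nothing leaked.
MiniVM* mini_vm_create(int capacity) {
    assert(capacity > 0);

    MiniVM* vm = calloc(1, sizeof(MiniVM));   // zeroed, so slot_count starts at 0
    if (vm == NULL) {
        return NULL;
    }

    vm->slots = calloc((size_t)capacity, sizeof(int));
    if (vm->slots == NULL) {
        free(vm);                             // undo the partial construction
        return NULL;
    }
    vm->slot_capacity = capacity;
    return vm;
}

// Safe to call with NULL, mirroring free().
void mini_vm_destroy(MiniVM* vm) {
    if (vm == NULL) {
        return;
    }
    free(vm->slots);
    free(vm);
}
```

The same shape would extend to the real VM by checking the threads, heap, and locks allocations in turn and releasing whatever was already allocated on failure.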

Instruction parsing and execution

The next logical segment of the VM is instruction parsing and execution. We will be updating these files in future articles.

// src/instruction/instruction.h
#ifndef TINY_VM_INSTRUCTION_H
#define TINY_VM_INSTRUCTION_H

#include "../types.h"

typedef enum {
    PRINT,     // print <variable>
    SET,       // set <variable> <value>
    ADD,       // add <target> <var1> <var2>
    SLEEP,     // sleep <milliseconds>
    THREAD,    // thread <start_line>
    EXIT,      // exit
    SETSHARED, // setshared <variable> <value>
    LOCK,      // lock <mutex_name>
    UNLOCK     // unlock <mutex_name>
} InstructionType;

typedef struct {
    InstructionType type;
    char args[3][32];
} Instruction;

Instruction parse_instruction(const char* line);

void execute_instruction(ThreadContext* thread, Instruction* instr);

#endif

// src/instruction/instruction.c
#include "instruction.h"

#include "../thread/thread.h"
#include "../memory/memory.h"
#include "../synchronization/synchronization.h"
#include "../utils/logger.h"

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

Instruction parse_instruction(const char* line) {
    Instruction instr;
    memset(&instr, 0, sizeof(Instruction));

    // Every buffer holds 32 bytes, so limit each token to 31 characters
    // plus the terminator to avoid overflowing on long input.
    char cmd[32] = {0};
    sscanf(line, "%31s", cmd);

    if (strcmp(cmd, "print") == 0) {
        instr.type = PRINT;
        sscanf(line, "%31s %31s", cmd, instr.args[0]);
    }
    else if (strcmp(cmd, "set") == 0) {
        instr.type = SET;
        sscanf(line, "%31s %31s %31s", cmd, instr.args[0], instr.args[1]);
    }
    else if (strcmp(cmd, "add") == 0) {
        instr.type = ADD;
        sscanf(line, "%31s %31s %31s %31s", cmd, instr.args[0], instr.args[1], instr.args[2]);
    }
    else if (strcmp(cmd, "sleep") == 0) {
        instr.type = SLEEP;
        sscanf(line, "%31s %31s", cmd, instr.args[0]);
    }
    else if (strcmp(cmd, "thread") == 0) {
        instr.type = THREAD;
        sscanf(line, "%31s %31s", cmd, instr.args[0]);
    }
    else if (strcmp(cmd, "exit") == 0) {
        instr.type = EXIT;
    }
    else if (strcmp(cmd, "setshared") == 0) {
        instr.type = SETSHARED;
        sscanf(line, "%31s %31s %31s", cmd, instr.args[0], instr.args[1]);
    }
    else if (strcmp(cmd, "lock") == 0) {
        instr.type = LOCK;
        sscanf(line, "%31s %31s", cmd, instr.args[0]);
    }
    else if (strcmp(cmd, "unlock") == 0) {
        instr.type = UNLOCK;
        sscanf(line, "%31s %31s", cmd, instr.args[0]);
    }
    // An unrecognized command keeps the zeroed type, which is PRINT.

    return instr;
}

void execute_instruction(ThreadContext* thread, Instruction* instr) {
    switch (instr->type) {
        case PRINT: {
            const t_int value = get_value(thread, instr->args[0]);
            print("[Thread %d] Variable %s = %d", thread->thread_id, instr->args[0], value);
            break;
        }
        case SET: {
            Variable* var = get_variable(thread, instr->args[0]);
            if (var) {
                var->value = atoi(instr->args[1]);
            }
            break;
        }
        case ADD: {
            t_int val1 = get_value(thread, instr->args[1]);
            t_int val2 = get_value(thread, instr->args[2]);
            Variable* target = get_variable(thread, instr->args[0]);
            if (target) {
                target->value = val1 + val2;
            }
            break;
        }
        case SLEEP: {
            usleep(atoi(instr->args[0]) * 1000);
            break;
        }
        case THREAD: {
            const int start_line = atoi(instr->args[0]);
            create_thread(thread->vm, thread->program, start_line);
            break;
        }
        case EXIT: {
            thread->is_running = 0;
            break;
        }
        case SETSHARED: {
            Variable* var = get_shared_variable(thread, instr->args[0]);
            if (var) {
                var->value = atoi(instr->args[1]);
                print("[Thread %d] Set-shared %s = %d", thread->thread_id, var->name, var->value);
            }
            break;
        }
        case LOCK: {
            SynchronizationLock* mutex = get_sync_lock(thread->vm, instr->args[0]);
            if (mutex) {
                print("[Thread %d] Waiting for lock '%s' at address %p",
                    thread->thread_id,
                    mutex->name,
                    (void*)&mutex->mutex
                );
                pthread_mutex_lock(&mutex->mutex);
                mutex->locked = 1;
                print("[Thread %d] Acquired lock '%s'", thread->thread_id, mutex->name);
            }
            break;
        }
        case UNLOCK: {
            SynchronizationLock* mutex = get_sync_lock(thread->vm, instr->args[0]);
            if (mutex && mutex->locked) {
                pthread_mutex_unlock(&mutex->mutex);
                mutex->locked = 0;
                print("[Thread %d] Released lock '%s' at address %p",
                    thread->thread_id,
                    mutex->name,
                    (void*)&mutex->mutex
                );
            }
            break;
        }
    }
}
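
As the instruction set grows, the if/else chain in parse_instruction can become hard to keep in sync with the enum. One alternative is a lookup table plus an explicit "unknown" value, so that a typo in a program is not silently treated as a print. The sketch below is only an illustration: the Opcode enum, OPCODE_TABLE, and parse_opcode are hypothetical names, and it resolves the command word only, not the arguments.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

// Hypothetical opcode set mirroring the article's instructions, plus OP_UNKNOWN.
typedef enum {
    OP_PRINT, OP_SET, OP_ADD, OP_SLEEP, OP_THREAD,
    OP_EXIT, OP_SETSHARED, OP_LOCK, OP_UNLOCK,
    OP_UNKNOWN
} Opcode;

typedef struct {
    const char* name;
    Opcode opcode;
} OpcodeEntry;

// Adding an instruction means adding one row here.
static const OpcodeEntry OPCODE_TABLE[] = {
    {"print", OP_PRINT},   {"set", OP_SET},
    {"add", OP_ADD},       {"sleep", OP_SLEEP},
    {"thread", OP_THREAD}, {"exit", OP_EXIT},
    {"setshared", OP_SETSHARED},
    {"lock", OP_LOCK},     {"unlock", OP_UNLOCK},
};

// Reads the first word of the line and maps it to an opcode.
Opcode parse_opcode(const char* line) {
    assert(line != NULL);

    char cmd[32] = {0};
    if (sscanf(line, "%31s", cmd) != 1) {
        return OP_UNKNOWN;   // empty or whitespace-only line
    }

    size_t count = sizeof(OPCODE_TABLE) / sizeof(OPCODE_TABLE[0]);
    for (size_t i = 0; i < count; i++) {
        if (strcmp(cmd, OPCODE_TABLE[i].name) == 0) {
            return OPCODE_TABLE[i].opcode;
        }
    }
    return OP_UNKNOWN;
}
```

With this shape, the executor can report OP_UNKNOWN as an error instead of running a default instruction.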

Memory

The local scope and heap memory models go into the memory folder.

// src/memory/memory.h
#ifndef TINY_VM_MEMORY_H
#define TINY_VM_MEMORY_H

#include "../types.h"

LocalScope* create_local_scope(void);
void destroy_local_scope(LocalScope* local_scope);

t_int get_value(ThreadContext* thread, const char* name);

Variable* get_variable(ThreadContext* thread, const char* name);
Variable* get_shared_variable(ThreadContext* thread, const char* name);

#endif

// src/memory/memory.c
#include "memory.h"
#include "../thread/thread.h"
#include "../utils/logger.h"

#include <stdlib.h>
#include <string.h>

LocalScope* create_local_scope() {
    LocalScope* local_scope = malloc(sizeof(LocalScope));
    local_scope->var_capacity = 10;
    local_scope->var_count = 0;
    local_scope->variables = malloc(sizeof(Variable) * local_scope->var_capacity);
    return local_scope;
}

void destroy_local_scope(LocalScope* local_scope) {
    for (int i = 0; i < local_scope->var_count; i++) {
        free(local_scope->variables[i].name);
    }
    free(local_scope->variables);
    free(local_scope);
}

Variable* get_variable(ThreadContext* thread, const char* name) {
    LocalScope* local_scope = thread->local_scope;
    for (int i = 0; i < local_scope->var_count; i++) {
        if (strcmp(local_scope->variables[i].name, name) == 0) {
            return &local_scope->variables[i];
        }
    }

    if (local_scope->var_count < local_scope->var_capacity) {
        Variable* var = &local_scope->variables[local_scope->var_count++];
        var->name = strdup(name);
        var->value = 0;
        return var;
    }
    return NULL;
}

// Get a shared variable from the heap, creating it if it does not exist yet
Variable* get_shared_variable(ThreadContext* thread, const char* name) {
    VM* vm = thread->vm;
    Variable* var = NULL;
    int created = 0;

    // Hold the lock for both lookup and creation so two threads
    // cannot create the same variable twice
    pthread_mutex_lock(&vm->heap_mgmt_lock);

    // Look for existing variable
    for (int i = 0; i < vm->heap_size; i++) {
        if (strcmp(vm->heap[i].name, name) == 0) {
            var = &vm->heap[i];
            break;
        }
    }

    // Create a new variable if not found and there is room
    if (var == NULL && vm->heap_size < vm->heap_capacity) {
        var = &vm->heap[vm->heap_size++];
        var->name = strdup(name);
        var->value = 0;  // Initialize as int by default
        created = 1;
    }

    // Always release the lock, including when the heap is full
    pthread_mutex_unlock(&vm->heap_mgmt_lock);

    if (var) {
        print("[Thread %d] %s shared variable %s", thread->thread_id,
              created ? "Created" : "Found", name);
    }
    return var;
}

t_int get_value(ThreadContext* thread, const char* name) {
    for (int i = 0; i < thread->local_scope->var_count; i++) {
        if (strcmp(thread->local_scope->variables[i].name, name) == 0) {
            return thread->local_scope->variables[i].value;
        }
    }
    Variable* shared = get_shared_variable(thread, name);
    if (shared) {
        return shared->value;
    }
    return 0;
}
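
Both the local scope and the heap have a fixed capacity of 10, and the lookup functions return NULL once it is reached. If we wanted a scope that grows on demand, a common approach is to double the array with realloc. The sketch below is one way this could look; GrowableScope, Slot, scope_index_of, and scope_free are hypothetical names, not TinyVM code. It returns an index rather than a pointer because realloc may move the array, which would invalidate pointers handed out earlier.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    char* name;
    int32_t value;
} Slot;

typedef struct {
    Slot* slots;
    int count;
    int capacity;
} GrowableScope;

// Portable replacement for strdup, which is POSIX rather than standard C11.
static char* copy_string(const char* text) {
    size_t size = strlen(text) + 1;
    char* copy = malloc(size);
    if (copy != NULL) {
        memcpy(copy, text, size);
    }
    return copy;
}

// Returns the index of the named slot, creating it (value 0) if missing.
// Returns -1 if memory runs out; the existing slots stay valid in that case.
int scope_index_of(GrowableScope* scope, const char* name) {
    assert(scope != NULL && name != NULL);

    for (int i = 0; i < scope->count; i++) {
        if (strcmp(scope->slots[i].name, name) == 0) {
            return i;
        }
    }

    if (scope->count == scope->capacity) {
        int new_capacity = scope->capacity ? scope->capacity * 2 : 4;
        Slot* grown = realloc(scope->slots, sizeof(Slot) * (size_t)new_capacity);
        if (grown == NULL) {
            return -1;
        }
        scope->slots = grown;
        scope->capacity = new_capacity;
    }

    char* copy = copy_string(name);
    if (copy == NULL) {
        return -1;
    }
    scope->slots[scope->count].name = copy;
    scope->slots[scope->count].value = 0;
    return scope->count++;
}

// Releases every name and the slot array itself.
void scope_free(GrowableScope* scope) {
    assert(scope != NULL);
    for (int i = 0; i < scope->count; i++) {
        free(scope->slots[i].name);
    }
    free(scope->slots);
    scope->slots = NULL;
    scope->count = 0;
    scope->capacity = 0;
}
```

This works for a per-thread scope because only one thread touches it. The shared heap would additionally need the growth step to happen under heap_mgmt_lock, and readers would have to stop holding raw Variable pointers across calls.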

Thread management

Now we extract the thread management into its own files. We will be updating these in future articles.

// src/thread/thread.h
#ifndef TINY_VM_THREAD_H
#define TINY_VM_THREAD_H

#include "../types.h"

ThreadContext* create_thread(VM* vm, const char** program, int start_line);

void* execute_thread_instructions(void* arg);

#endif

// src/thread/thread.c
#include "thread.h"
#include "../core/vm.h"
#include "../instruction/instruction.h"
#include "../utils/logger.h"
#include "../memory/memory.h"

ThreadContext* create_thread(VM* vm, const char** program, int start_line) {
    pthread_mutex_lock(&vm->thread_mgmt_lock);

    if (vm->thread_count >= vm->thread_capacity) {
        pthread_mutex_unlock(&vm->thread_mgmt_lock);
        return NULL;
    }

    ThreadContext* thread = &vm->threads[vm->thread_count++];
    thread->local_scope = create_local_scope();
    thread->program = program;
    thread->pc = start_line;
    thread->is_running = 1;
    thread->thread_id = vm->next_thread_id++;  // Assign and increment thread ID
    thread->vm = vm;

    pthread_create(&thread->thread, NULL, execute_thread_instructions, thread);

    pthread_mutex_unlock(&vm->thread_mgmt_lock);
    return thread;
}

void* execute_thread_instructions(void* arg) {
    ThreadContext* thread = (ThreadContext*) arg;
    print("[Thread %d] Thread instructions started", thread->thread_id);

    while (thread->is_running) {
        const char* line = thread->program[thread->pc];
        if (line == NULL) break;

        Instruction instr = parse_instruction(line);
        execute_instruction(thread, &instr);

        thread->pc++;
    }

    print("[Thread %d] Thread instructions finished", thread->thread_id);
    return NULL;
}
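
The loop in execute_thread_instructions is a small fetch-execute cycle: read the line at pc, stop on the NULL terminator, run the instruction, advance pc. To see the control flow without threads or the rest of the VM, here is a stripped-down sketch. The function count_executed_lines is hypothetical, and a plain string comparison with "exit" stands in for parse_instruction and execute_instruction.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

// Walks a NULL-terminated program starting at start_line and returns how many
// lines were executed. Like the article's loop, it trusts start_line to be
// within the program (at or before the NULL terminator).
int count_executed_lines(const char** program, int start_line) {
    assert(program != NULL && start_line >= 0);

    int pc = start_line;
    int is_running = 1;
    int executed = 0;

    while (is_running) {
        const char* line = program[pc];
        if (line == NULL) {
            break;                      // fell off the end of the program
        }
        if (strcmp(line, "exit") == 0) {
            is_running = 0;             // stand-in for the EXIT instruction
        }
        executed++;
        pc++;
    }
    return executed;
}
```

For the program {"set a 1", "exit", "print a", NULL}, starting at line 0 executes two lines, because exit clears is_running before the loop condition is checked again. Starting at line 2 executes only the print.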

Thread synchronization

The last logical segment is the synchronization. We will put it into its own folder as well.

// src/synchronization/synchronization.h
#ifndef TINY_VM_SYNCHRONIZATION_H
#define TINY_VM_SYNCHRONIZATION_H

#include "../types.h"

SynchronizationLock* get_sync_lock(VM* vm, const char* name);

#endif

// src/synchronization/synchronization.c
#include "synchronization.h"

#include <stdlib.h>
#include <string.h>

// Get or create mutex
SynchronizationLock* get_sync_lock(VM* vm, const char* name) {
    SynchronizationLock* mutex = NULL;

    // Hold the management lock for both lookup and creation so two
    // threads cannot register the same name twice
    pthread_mutex_lock(&vm->lock_mgmt_lock);

    // Look for existing mutex
    for (int i = 0; i < vm->lock_count; i++) {
        if (strcmp(vm->locks[i].name, name) == 0) {
            mutex = &vm->locks[i];
            break;
        }
    }

    // Create new mutex if not found and there is room
    if (mutex == NULL && vm->lock_count < vm->lock_capacity) {
        mutex = &vm->locks[vm->lock_count++];
        mutex->name = strdup(name);
        pthread_mutex_init(&mutex->mutex, NULL);
        mutex->locked = 0;
    }

    pthread_mutex_unlock(&vm->lock_mgmt_lock);
    return mutex;
}

The complete source code for this article is available in the tiny-vm_05_refactoring directory of the TinyVM repository.

The next steps

We are ready to add functions to our code, so we can reuse and execute code both synchronously (in the same thread where the function is called) and asynchronously (in a new thread).

Introduction
Part 1 — Foundations
Part 2 — Multithreading
Part 3 — Heap
Part 4 — Synchronized
Part 5 — Refactoring (you are here)
Part 6 — Functions
Part 7 — Compilation
Part 8 — Byte-code execution
Part 9 — Function call stack (not started)
Part 10 — Garbage collector (not started)