Building a Virtual Machine, JVM-inspired — Refactoring (Part 5)
Introduction
As our JVM-inspired virtual machine implementation has grown to include threads, memory management, and synchronization primitives, we’ve reached a point where refactoring has become essential. This article explains why modular design is crucial for virtual machines and how we’re restructuring our code.
Why refactor?
Two problems have emerged as the code base has grown:
- Our files have become too large and intertwined, making it difficult to understand how different components interact.
- Making changes has become increasingly risky as modifications in one area can have unexpected effects elsewhere.
Learning from the JVM
The JVM’s architecture is divided into several key subsystems:
- Class loader: Handles loading, linking, and initialization of classes
- Runtime data areas (memory areas): The method area, heap, stacks, PC registers, and native method stacks
- Execution engine: Executes the bytecode; it comprises the interpreter, the JIT compiler, and the garbage collector
While our implementation is simpler, we can learn from this separation of concerns.
Implementation goals
Module organization
We’re reorganizing the code into logical modules, each with a clear responsibility:
- core/: Core VM initialization and lifecycle management
- instruction/: Instruction parsing and execution
- memory/: Memory management for both heap and local variables
- synchronization/: Thread synchronization primitives
- thread/: Thread management and execution
- utils/: Shared utilities like logging
This separation of concerns makes the codebase more maintainable and easier to understand.
We will create the following structure:
tiny_vm_05_refactoring/
├── CMakeLists.txt
└── src/
    ├── main.c
    ├── types.h
    ├── core/
    │   ├── vm.c
    │   └── vm.h
    ├── instruction/
    │   ├── instruction.c
    │   └── instruction.h
    ├── memory/
    │   ├── memory.c
    │   └── memory.h
    ├── synchronization/
    │   ├── synchronization.c
    │   └── synchronization.h
    ├── thread/
    │   ├── thread.c
    │   └── thread.h
    └── utils/
        ├── logger.c
        └── logger.h
Build system
We are also moving from a hand-written Makefile to CMake (CMakeLists.txt), for several reasons:
- More declarative build configuration
- Better IDE integration (especially with CLion)
- Improved dependency management
- Cross-platform compatibility
- Easier configuration of build variants (debug/release)
The refactoring
We will create a new folder (project), tiny_vm_05_refactoring, refactor our code there, and reuse it as the starting point for the next iterations.
CLion (https://www.jetbrains.com/clion) is a convenient editor for this; it makes building and debugging easier.
CMakeLists
Let’s simplify the way we build our code first.
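# CMakeLists.txt
cmake_minimum_required(VERSION 3.30)
project(tiny_vm_05_refactoring C)
set(CMAKE_C_STANDARD 11)
# The VM uses pthreads, so locate the platform's thread library
find_package(Threads REQUIRED)
add_executable(
    tiny_vm_05_refactoring
    src/main.c
    src/utils/logger.c
    src/utils/logger.h
    src/core/vm.c
    src/core/vm.h
    src/thread/thread.c
    src/thread/thread.h
    src/synchronization/synchronization.c
    src/synchronization/synchronization.h
    src/memory/memory.c
    src/memory/memory.h
    src/instruction/instruction.c
    src/instruction/instruction.h
    src/types.h
)
# Link the thread library (needed on platforms where pthreads is not part of libc)
target_link_libraries(tiny_vm_05_refactoring PRIVATE Threads::Threads)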
We are going to create multiple executables in future articles.
Types
We will put the types that are shared by several modules into types.h to prevent circular header dependencies.
// src/types.h
#ifndef TINY_VM_TYPES_H
#define TINY_VM_TYPES_H
#include <pthread.h>
#include <stdint.h> // int32_t
typedef int32_t t_int;
// Variable storage
typedef struct Variable {
char* name;
t_int value;
} Variable;
// Execution frame (stack frame)
typedef struct LocalScope {
Variable* variables;
int var_count;
int var_capacity;
} LocalScope;
// Thread context
typedef struct ThreadContext {
LocalScope* local_scope;
const char** program;
int pc;
pthread_t thread;
int thread_id;
int is_running;
struct VM* vm;
char* function_name;
} ThreadContext;
// Synchronization
typedef struct SynchronizationLock {
char* name;
pthread_mutex_t mutex;
int locked; // For debugging
} SynchronizationLock;
// VM state
typedef struct VM {
// Thread management
ThreadContext* threads;
int thread_count;
int thread_capacity;
pthread_mutex_t thread_mgmt_lock;
int next_thread_id;
// Heap memory for shared variables
Variable* heap;
int heap_size;
int heap_capacity;
pthread_mutex_t heap_mgmt_lock;
// Mutex management
SynchronizationLock* locks;
int lock_count;
int lock_capacity;
pthread_mutex_t lock_mgmt_lock;
} VM;
#endif
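The circular reference that types.h resolves can be seen in miniature below. This is only a sketch with hypothetical names (Machine, Worker, machine_add_worker), not the TinyVM types: each struct points at the other, and a forward declaration lets both live in one header without either side including the other.

```c
#include <stddef.h>

// Forward declaration: tells the compiler that "struct Machine" exists,
// which is enough to declare pointers to it before its full definition.
struct Machine;

// A worker points back at the machine that owns it (hypothetical type).
typedef struct Worker {
    int id;
    struct Machine* owner;
} Worker;

// The machine owns a fixed array of workers (hypothetical type).
typedef struct Machine {
    Worker workers[4];
    int worker_count;
} Machine;

// Registers a new worker and wires up the back-pointer.
// Returns NULL when the machine has no free slot.
Worker* machine_add_worker(Machine* machine) {
    if (machine->worker_count >= 4) {
        return NULL;
    }
    Worker* worker = &machine->workers[machine->worker_count];
    worker->id = machine->worker_count++;
    worker->owner = machine;
    return worker;
}
```

ThreadContext and VM have the same shape: ThreadContext holds a struct VM pointer, while VM holds an array of ThreadContext, so both definitions have to be visible from a single place.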
The main
Here is a simplified main.c file that serves as an entry point.
// src/main.c
#include "utils/logger.h"
#include "core/vm.h"
#include <stdio.h>
int main(void) {
    const char* program[] = {
        "set a 10",
        "print a",
        NULL
    };
    print("Starting TinyVM...");
    VM* vm = create_vm();
    start_vm(vm, program);
    destroy_vm(vm);
    print("TinyVM finished.");
    return 0;
}
Logging (print with timestamp and flush)
We will define a reusable “logger” that provides a print function, which prefixes each message with a timestamp, appends a newline, and flushes stdout so we see the output immediately.
// src/utils/logger.h
#ifndef TINY_VM_LOGGER_H
#define TINY_VM_LOGGER_H
// Logging utilities
void print(const char *format, ...);
#endif
// src/utils/logger.c
#include "logger.h"
#include <stdio.h>
#include <stdlib.h>
#include <stdarg.h>
#include <unistd.h>
#include <time.h>
void print(const char *format, ...) {
    struct timespec ts;
    clock_gettime(CLOCK_REALTIME, &ts);
    time_t t = ts.tv_sec;
    struct tm tm;
    localtime_r(&t, &tm);
    // Hold the stdout lock so a line written by one thread is not
    // interleaved with output from another thread
    flockfile(stdout);
    printf("[%04d-%02d-%02d %02d:%02d:%02d.%06ld] ",
           tm.tm_year + 1900, tm.tm_mon + 1, tm.tm_mday,
           tm.tm_hour, tm.tm_min, tm.tm_sec, (long)(ts.tv_nsec / 1000));
    va_list args;
    va_start(args, format);
    vprintf(format, args);
    va_end(args);
    printf("\n");
    fflush(stdout);
    funlockfile(stdout);
}
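The timestamp prefix can be checked in isolation if it is built in a buffer first. The sketch below assumes a hypothetical helper, format_timestamp, that is not part of the article's logger; it uses UTC (gmtime_r) instead of local time so that the result does not depend on the machine's time zone.

```c
#define _POSIX_C_SOURCE 200809L // for gmtime_r
#include <stdio.h>
#include <time.h>

// Hypothetical helper: formats `ts` as "YYYY-MM-DD HH:MM:SS.uuuuuu" (UTC).
// Returns the number of characters written, or -1 if `buf` is too small
// or the time cannot be converted.
int format_timestamp(char* buf, size_t size, struct timespec ts) {
    struct tm tm;
    if (gmtime_r(&ts.tv_sec, &tm) == NULL) {
        return -1;
    }
    // tv_nsec holds nanoseconds; dividing by 1000 yields microseconds
    int written = snprintf(buf, size, "%04d-%02d-%02d %02d:%02d:%02d.%06ld",
                           tm.tm_year + 1900, tm.tm_mon + 1, tm.tm_mday,
                           tm.tm_hour, tm.tm_min, tm.tm_sec,
                           (long)(ts.tv_nsec / 1000));
    if (written < 0 || (size_t)written >= size) {
        return -1;
    }
    return written;
}
```

Separating formatting from printing is one possible design; the article's print function writes straight to stdout, which is simpler and sufficient for a logger of this size.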
Virtual machine
We will put the VM code into the core folder. We will be updating these files in future articles.
// src/core/vm.h
#ifndef TINY_VM_CORE_VM_H
#define TINY_VM_CORE_VM_H
#include "../types.h"
// Core VM functions
VM* create_vm(void);
void start_vm(VM* vm, const char** program);
void destroy_vm(VM* vm);
#endif
// src/core/vm.c
#include "vm.h"
#include "../thread/thread.h"
#include "../memory/memory.h"
#include <stdlib.h>
VM* create_vm() {
VM* vm = malloc(sizeof(VM));
// Thread management
vm->thread_capacity = 10;
vm->thread_count = 0;
vm->threads = malloc(sizeof(ThreadContext) * vm->thread_capacity);
vm->next_thread_id = 0;
pthread_mutex_init(&vm->thread_mgmt_lock, NULL);
// Initialize heap
vm->heap_capacity = 10;
vm->heap_size = 0;
vm->heap = malloc(sizeof(Variable) * vm->heap_capacity);
pthread_mutex_init(&vm->heap_mgmt_lock, NULL);
// Initialize synchronization/mutex management
vm->lock_capacity = 10;
vm->lock_count = 0;
vm->locks = malloc(sizeof(SynchronizationLock) * vm->lock_capacity);
pthread_mutex_init(&vm->lock_mgmt_lock, NULL);
return vm;
}
void start_vm(VM* vm, const char** program) {
// Create main thread starting at line 0
create_thread(vm, program, 0);
// Wait for all threads to finish
for (int i = 0; i < vm->thread_count; i++) {
pthread_join(vm->threads[i].thread, NULL);
}
}
void destroy_vm(VM* vm) {
// Cleanup the local scope of every thread that was created
for (int i = 0; i < vm->thread_count; i++) {
destroy_local_scope(vm->threads[i].local_scope);
}
// Cleanup heap
for (int i = 0; i < vm->heap_size; i++) {
free(vm->heap[i].name);
}
free(vm->heap);
pthread_mutex_destroy(&vm->heap_mgmt_lock);
// Cleanup mutexes
for (int i = 0; i < vm->lock_count; i++) {
pthread_mutex_destroy(&vm->locks[i].mutex);
free(vm->locks[i].name);
}
pthread_mutex_destroy(&vm->lock_mgmt_lock);
free(vm->locks);
free(vm->threads);
pthread_mutex_destroy(&vm->thread_mgmt_lock);
free(vm);
}
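create_vm above assumes that every malloc succeeds. A minimal sketch of the same create/destroy pairing with allocation checks might look like the following; MiniVM, mini_vm_create, and mini_vm_destroy are hypothetical stand-ins for illustration, not part of TinyVM.

```c
#include <stdlib.h>

// Hypothetical, stripped-down VM: one growable-looking table, fixed capacity.
typedef struct MiniVM {
    int* slots;
    int slot_count;
    int slot_capacity;
} MiniVM;

// Allocates the VM and its table. Returns NULL if the capacity is invalid
// or any allocation fails; nothing is leaked on the failure paths.
MiniVM* mini_vm_create(int capacity) {
    if (capacity <= 0) {
        return NULL;
    }
    MiniVM* vm = calloc(1, sizeof(MiniVM));
    if (vm == NULL) {
        return NULL;
    }
    vm->slots = calloc((size_t)capacity, sizeof(int));
    if (vm->slots == NULL) {
        free(vm); // undo the first allocation before reporting failure
        return NULL;
    }
    vm->slot_capacity = capacity;
    return vm;
}

// Releases everything mini_vm_create allocated. Accepts NULL, like free().
void mini_vm_destroy(MiniVM* vm) {
    if (vm == NULL) {
        return;
    }
    free(vm->slots);
    free(vm);
}
```

A caller would then test the result before use, for example by returning a non-zero exit code from main when mini_vm_create returns NULL.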
Instruction parsing and execution
The next logical segment of the VM is instruction parsing and execution. We will be updating these files in future articles.
// src/instruction/instruction.h
#ifndef TINY_VM_INSTRUCTION_H
#define TINY_VM_INSTRUCTION_H
#include "../types.h"
typedef enum {
PRINT, // print <variable>
SET, // set <variable> <value>
ADD, // add <target> <var1> <var2>
SLEEP, // sleep <milliseconds>
THREAD, // thread <start_line>
EXIT, // exit
SETSHARED, // setshared <variable> <value>
LOCK, // lock <mutex_name>
UNLOCK // unlock <mutex_name>
} InstructionType;
typedef struct {
InstructionType type;
char args[3][32];
} Instruction;
Instruction parse_instruction(const char* line);
void execute_instruction(ThreadContext* thread, Instruction* instr);
#endif
// src/instruction/instruction.c
#include "instruction.h"
#include "../thread/thread.h"
#include "../memory/memory.h"
#include "../synchronization/synchronization.h"
#include "../utils/logger.h"
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
Instruction parse_instruction(const char* line) {
    Instruction instr;
    memset(&instr, 0, sizeof(Instruction));
    // Every %31s leaves room for the terminating '\0' in a 32-byte buffer
    char cmd[32] = {0};
    sscanf(line, "%31s", cmd);
    // Note: an unrecognized command leaves instr.type at 0, which is PRINT
    if (strcmp(cmd, "print") == 0) {
        instr.type = PRINT;
        sscanf(line, "%31s %31s", cmd, instr.args[0]);
    }
    else if (strcmp(cmd, "set") == 0) {
        instr.type = SET;
        sscanf(line, "%31s %31s %31s", cmd, instr.args[0], instr.args[1]);
    }
    else if (strcmp(cmd, "add") == 0) {
        instr.type = ADD;
        sscanf(line, "%31s %31s %31s %31s", cmd, instr.args[0], instr.args[1], instr.args[2]);
    }
    else if (strcmp(cmd, "sleep") == 0) {
        instr.type = SLEEP;
        sscanf(line, "%31s %31s", cmd, instr.args[0]);
    }
    else if (strcmp(cmd, "thread") == 0) {
        instr.type = THREAD;
        sscanf(line, "%31s %31s", cmd, instr.args[0]);
    }
    else if (strcmp(cmd, "exit") == 0) {
        instr.type = EXIT;
    }
    else if (strcmp(cmd, "setshared") == 0) {
        instr.type = SETSHARED;
        sscanf(line, "%31s %31s %31s", cmd, instr.args[0], instr.args[1]);
    }
    else if (strcmp(cmd, "lock") == 0) {
        instr.type = LOCK;
        sscanf(line, "%31s %31s", cmd, instr.args[0]);
    }
    else if (strcmp(cmd, "unlock") == 0) {
        instr.type = UNLOCK;
        sscanf(line, "%31s %31s", cmd, instr.args[0]);
    }
    return instr;
}
void execute_instruction(ThreadContext* thread, Instruction* instr) {
switch (instr->type) {
case PRINT: {
const t_int value = get_value(thread, instr->args[0]);
print("[Thread %d] Variable %s = %d", thread->thread_id, instr->args[0], value);
break;
}
case SET: {
Variable* var = get_variable(thread, instr->args[0]);
if (var) {
var->value = atoi(instr->args[1]);
}
break;
}
case ADD: {
t_int val1 = get_value(thread, instr->args[1]);
t_int val2 = get_value(thread, instr->args[2]);
Variable* target = get_variable(thread, instr->args[0]);
if (target) {
target->value = val1 + val2;
}
break;
}
case SLEEP: {
usleep(atoi(instr->args[0]) * 1000);
break;
}
case THREAD: {
const int start_line = atoi(instr->args[0]);
create_thread(thread->vm, thread->program, start_line);
break;
}
case EXIT: {
thread->is_running = 0;
break;
}
case SETSHARED: {
Variable* var = get_shared_variable(thread, instr->args[0]);
if (var) {
var->value = atoi(instr->args[1]);
print("[Thread %d] Set-shared %s = %d", thread->thread_id, var->name, var->value);
}
break;
}
case LOCK: {
SynchronizationLock* mutex = get_sync_lock(thread->vm, instr->args[0]);
if (mutex) {
print("[Thread %d] Waiting for lock '%s' at address %p",
thread->thread_id,
mutex->name,
(void*)&mutex->mutex
);
pthread_mutex_lock(&mutex->mutex);
mutex->locked = 1;
print("[Thread %d] Acquired lock '%s'", thread->thread_id, mutex->name);
}
break;
}
case UNLOCK: {
SynchronizationLock* mutex = get_sync_lock(thread->vm, instr->args[0]);
if (mutex && mutex->locked) {
pthread_mutex_unlock(&mutex->mutex);
mutex->locked = 0;
print("[Thread %d] Released lock '%s' at address %p",
thread->thread_id,
mutex->name,
(void*)&mutex->mutex
);
}
break;
}
}
}
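The if/else chain in parse_instruction grows by one branch per instruction. One alternative, sketched here under assumptions rather than taken from the TinyVM source, is a lookup table that maps each command name to its opcode and expected argument count. The names Opcode, OP_UNKNOWN, OpcodeEntry, ParsedLine, and parse_line are all hypothetical; OP_UNKNOWN in particular does not exist in the article's InstructionType.

```c
#include <stdio.h>
#include <string.h>

// Opcodes mirror the article's InstructionType, plus a hypothetical
// OP_UNKNOWN so unrecognized lines are not mistaken for a real opcode.
typedef enum {
    OP_PRINT, OP_SET, OP_ADD, OP_SLEEP, OP_THREAD,
    OP_EXIT, OP_SETSHARED, OP_LOCK, OP_UNLOCK, OP_UNKNOWN
} Opcode;

typedef struct {
    const char* name; // command as written in the program
    Opcode opcode;
    int arg_count;    // number of arguments the command requires
} OpcodeEntry;

static const OpcodeEntry OPCODE_TABLE[] = {
    {"print", OP_PRINT, 1},   {"set", OP_SET, 2},
    {"add", OP_ADD, 3},       {"sleep", OP_SLEEP, 1},
    {"thread", OP_THREAD, 1}, {"exit", OP_EXIT, 0},
    {"setshared", OP_SETSHARED, 2},
    {"lock", OP_LOCK, 1},     {"unlock", OP_UNLOCK, 1},
};

typedef struct {
    Opcode opcode;
    char args[3][32];
} ParsedLine;

// Splits the line once, then looks the command up in the table.
// Yields OP_UNKNOWN for empty lines, unknown commands, or missing arguments.
ParsedLine parse_line(const char* line) {
    ParsedLine parsed;
    memset(&parsed, 0, sizeof parsed);
    parsed.opcode = OP_UNKNOWN;

    char cmd[32] = {0};
    int fields = sscanf(line, "%31s %31s %31s %31s",
                        cmd, parsed.args[0], parsed.args[1], parsed.args[2]);
    if (fields < 1) {
        return parsed; // nothing to parse (sscanf returned 0 or EOF)
    }

    size_t entries = sizeof OPCODE_TABLE / sizeof OPCODE_TABLE[0];
    for (size_t i = 0; i < entries; i++) {
        if (strcmp(cmd, OPCODE_TABLE[i].name) == 0 &&
            fields - 1 >= OPCODE_TABLE[i].arg_count) {
            parsed.opcode = OPCODE_TABLE[i].opcode;
            break;
        }
    }
    return parsed;
}
```

With this layout, adding an instruction means adding one table row, and the argument-count check catches lines such as "set a" before they reach execution. The trade-off is one more type to keep in sync with the executor's switch statement.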
Memory
The local scope and heap memory models go into the memory folder.
// src/memory/memory.h
#ifndef TINY_VM_MEMORY_H
#define TINY_VM_MEMORY_H
#include "../types.h"
LocalScope* create_local_scope(void);
void destroy_local_scope(LocalScope* local_scope);
t_int get_value(ThreadContext* thread, const char* name);
Variable* get_variable(ThreadContext* thread, const char* name);
Variable* get_shared_variable(ThreadContext* thread, const char* name);
#endif
// src/memory/memory.c
#include "memory.h"
#include "../thread/thread.h"
#include "../utils/logger.h"
#include <stdlib.h>
#include <string.h>
LocalScope* create_local_scope() {
LocalScope* local_scope = malloc(sizeof(LocalScope));
local_scope->var_capacity = 10;
local_scope->var_count = 0;
local_scope->variables = malloc(sizeof(Variable) * local_scope->var_capacity);
return local_scope;
}
void destroy_local_scope(LocalScope* local_scope) {
for (int i = 0; i < local_scope->var_count; i++) {
free(local_scope->variables[i].name);
}
free(local_scope->variables);
free(local_scope);
}
Variable* get_variable(ThreadContext* thread, const char* name) {
LocalScope* local_scope = thread->local_scope;
for (int i = 0; i < local_scope->var_count; i++) {
if (strcmp(local_scope->variables[i].name, name) == 0) {
return &local_scope->variables[i];
}
}
if (local_scope->var_count < local_scope->var_capacity) {
Variable* var = &local_scope->variables[local_scope->var_count++];
var->name = strdup(name);
var->value = 0;
return var;
}
return NULL;
}
// Get a shared variable from the heap, creating it if it does not exist
Variable* get_shared_variable(ThreadContext* thread, const char* name) {
    VM* vm = thread->vm;
    Variable* var = NULL;
    int created = 0;
    // Hold the heap lock for the whole find-or-create: two threads can no
    // longer create the same variable twice, and every path releases the lock
    // (the original returned with the lock still held when the heap was full)
    pthread_mutex_lock(&vm->heap_mgmt_lock);
    for (int i = 0; i < vm->heap_size; i++) {
        if (strcmp(vm->heap[i].name, name) == 0) {
            var = &vm->heap[i];
            break;
        }
    }
    if (var == NULL && vm->heap_size < vm->heap_capacity) {
        var = &vm->heap[vm->heap_size++];
        var->name = strdup(name);
        var->value = 0; // Initialize as int by default
        created = 1;
    }
    pthread_mutex_unlock(&vm->heap_mgmt_lock);
    if (var != NULL) {
        print("[Thread %d] %s shared variable %s", thread->thread_id,
              created ? "Created" : "Found", name);
    }
    return var; // NULL when the heap is full
}
t_int get_value(ThreadContext* thread, const char* name) {
for (int i = 0; i < thread->local_scope->var_count; i++) {
if (strcmp(thread->local_scope->variables[i].name, name) == 0) {
return thread->local_scope->variables[i].value;
}
}
Variable* shared = get_shared_variable(thread, name);
if (shared) {
return shared->value;
}
return 0;
}
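The lookup order used by get_value (local scope first, then the shared heap) and the find-or-create pattern used by get_variable can be modelled without threads. The sketch below uses hypothetical types and names (Slot, Table, table_find, table_find_or_create, lookup_value) and stores names inline rather than with strdup. Unlike get_value, its lookup_value never creates a variable as a side effect.

```c
#include <stdio.h>
#include <string.h>

#define TABLE_CAPACITY 10

// Hypothetical stand-ins for Variable and LocalScope/heap.
// Names longer than 31 characters are truncated when stored.
typedef struct {
    char name[32];
    int value;
} Slot;

typedef struct {
    Slot slots[TABLE_CAPACITY];
    int count;
} Table;

// Linear search by name; returns NULL when the name is not present.
Slot* table_find(Table* table, const char* name) {
    for (int i = 0; i < table->count; i++) {
        if (strcmp(table->slots[i].name, name) == 0) {
            return &table->slots[i];
        }
    }
    return NULL;
}

// Returns the existing slot, or appends a zero-initialized one.
// Returns NULL when the name is new and the table is already full.
Slot* table_find_or_create(Table* table, const char* name) {
    Slot* slot = table_find(table, name);
    if (slot != NULL) {
        return slot;
    }
    if (table->count >= TABLE_CAPACITY) {
        return NULL;
    }
    slot = &table->slots[table->count++];
    snprintf(slot->name, sizeof slot->name, "%s", name);
    slot->value = 0;
    return slot;
}

// Read-only lookup: a local variable shadows a shared one of the same name;
// an unknown name reads as 0.
int lookup_value(Table* local, Table* shared, const char* name) {
    Slot* slot = table_find(local, name);
    if (slot == NULL) {
        slot = table_find(shared, name);
    }
    return slot != NULL ? slot->value : 0;
}
```

In the real VM the shared table is touched by several threads, so the same find-or-create step has to run under heap_mgmt_lock; the single-threaded model only shows the lookup logic.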
Thread management
Now we extract the thread management into its own files. We will be updating these in future articles.
// src/thread/thread.h
#ifndef TINY_VM_THREAD_H
#define TINY_VM_THREAD_H
#include "../types.h"
ThreadContext* create_thread(VM* vm, const char** program, int start_line);
void* execute_thread_instructions(void* arg);
#endif
// src/thread/thread.c
#include "thread.h"
#include "../core/vm.h"
#include "../instruction/instruction.h"
#include "../utils/logger.h"
#include "../memory/memory.h"
ThreadContext* create_thread(VM* vm, const char** program, int start_line) {
    pthread_mutex_lock(&vm->thread_mgmt_lock);
    if (vm->thread_count >= vm->thread_capacity) {
        pthread_mutex_unlock(&vm->thread_mgmt_lock);
        return NULL;
    }
    ThreadContext* thread = &vm->threads[vm->thread_count];
    thread->local_scope = create_local_scope();
    thread->program = program;
    thread->pc = start_line;
    thread->is_running = 1;
    thread->thread_id = vm->next_thread_id++; // Assign and increment thread ID
    thread->vm = vm;
    thread->function_name = NULL; // Not used yet
    // Count the slot only once the OS thread exists, so that start_vm
    // never joins a handle that was not created
    if (pthread_create(&thread->thread, NULL, execute_thread_instructions, thread) != 0) {
        destroy_local_scope(thread->local_scope);
        pthread_mutex_unlock(&vm->thread_mgmt_lock);
        return NULL;
    }
    vm->thread_count++;
    pthread_mutex_unlock(&vm->thread_mgmt_lock);
    return thread;
}
void* execute_thread_instructions(void* arg) {
ThreadContext* thread = (ThreadContext*) arg;
print("[Thread %d] Thread instructions started", thread->thread_id);
while (thread->is_running) {
const char* line = thread->program[thread->pc];
if (line == NULL) break;
Instruction instr = parse_instruction(line);
execute_instruction(thread, &instr);
thread->pc++;
}
print("[Thread %d] Thread instructions finished", thread->thread_id);
return NULL;
}
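The loop in execute_thread_instructions is a fetch, parse, execute cycle driven by the program counter. The single-threaded sketch below isolates that control flow; run_program is a hypothetical function, and the only instruction it understands is "exit", which stands in for the full parse and execute steps.

```c
#include <stddef.h>
#include <string.h>

// Runs `program` from `start_line` until an "exit" line or the NULL
// terminator. Returns the number of lines executed; an "exit" line
// counts as executed, the NULL terminator does not.
int run_program(const char** program, int start_line) {
    int pc = start_line; // program counter: index of the next line
    int is_running = 1;
    int executed = 0;
    while (is_running) {
        const char* line = program[pc]; // fetch
        if (line == NULL) {
            break;                      // end of program
        }
        if (strcmp(line, "exit") == 0) { // stand-in for parse + execute
            is_running = 0;
        }
        executed++;
        pc++;
    }
    return executed;
}
```

Because each thread has its own pc but shares the same program array, the thread instruction only needs to pass a different start_line to begin executing somewhere else in the same program.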
Thread synchronization
The last logical segment is the synchronization. We will put it into its own folder as well.
// src/synchronization/synchronization.h
#ifndef TINY_VM_SYNCHRONIZATION_H
#define TINY_VM_SYNCHRONIZATION_H
#include "../types.h"
SynchronizationLock* get_sync_lock(VM* vm, const char* name);
#endif
// src/synchronization/synchronization.c
#include "synchronization.h"
#include <stdlib.h>
#include <string.h>
// Get or create mutex
SynchronizationLock* get_sync_lock(VM* vm, const char* name) {
    SynchronizationLock* lock = NULL;
    // Hold the management lock for the whole find-or-create, so that two
    // threads asking for the same new name cannot both create it
    pthread_mutex_lock(&vm->lock_mgmt_lock);
    // Look for existing mutex
    for (int i = 0; i < vm->lock_count; i++) {
        if (strcmp(vm->locks[i].name, name) == 0) {
            lock = &vm->locks[i];
            break;
        }
    }
    // Create new mutex if not found and there is room for it
    if (lock == NULL && vm->lock_count < vm->lock_capacity) {
        lock = &vm->locks[vm->lock_count++];
        lock->name = strdup(name);
        pthread_mutex_init(&lock->mutex, NULL);
        lock->locked = 0;
    }
    pthread_mutex_unlock(&vm->lock_mgmt_lock);
    return lock; // NULL when the lock table is full
}
The complete source code for this article is available in the tiny-vm_05_refactoring directory of the TinyVM repository.
The next steps
We are ready to add functions to our code, so we can reuse and execute code both synchronously (in the same thread where the function is called) and asynchronously (in a new thread).
- Introduction
- Part 1 — Foundations
- Part 2 — Multithreading
- Part 3 — Heap
- Part 4 — Synchronized
- Part 5 — Refactoring (you are here)
- Part 6 — Functions
- Part 7 — Compilation
- Part 8 — Byte-code execution
- Part 9 — Function call stack (not started)
- Part 10 — Garbage collector (not started)