Building a Virtual Machine, JVM-inspired — Foundations (Part 1)

7 min read · Jan 3, 2025

This article covers setting up the fundamentals of our VM and implementing basic code execution capabilities in C.

Why C?

We’ll implement our tiny virtual machine (let’s call it TinyVM) in C, following the same approach as the JVM (which is implemented in C and C++). This choice offers several key advantages:

Complete control over memory allocation and deallocation
Direct access to system calls and CPU features for optimal performance
Universal availability of C compilers with platform-specific optimizations
Direct access to operating system features like thread management

Implementation goals

Our first milestone is to implement basic instruction parsing and execution with the following capabilities:

Variable definition in local storage
Variable printing
Basic arithmetic operations (addition)
Thread sleep functionality

Rather than implementing full Java language parsing, which would be unnecessarily complex for our demonstration purposes, we’ll work with a simplified instruction set, defined directly in the C code as an array of string commands.

const char* program[] = {
    "set x 5",
    "set y 3",
    "add z x y",
    "print z",
    "sleep 1000",  // sleep 1 second
    "set z 42",
    "print z",
    NULL
};

We will evolve this and move the code into its own text files once we get to compilation and byte-code in later articles.

Understanding JVM concepts through TinyVM

While our implementation is simplified, each component we build mirrors fundamental JVM concepts.

Variable Resolution

The get_variable function demonstrates a key concept in virtual machines — variable resolution and storage management.

typedef struct {
    Variable* variables;
    int var_count;
    int var_capacity;
} LocalScope;

// Find or create variable in local_scope
Variable* get_variable(LocalScope* local_scope, const char* name) {
    // First search for existing variable
    for (int i = 0; i < local_scope->var_count; i++) {
        if (strcmp(local_scope->variables[i].name, name) == 0) {
            return &local_scope->variables[i];
        }
    }
    
    // Create new variable if not found, and if space available
    if (local_scope->var_count < local_scope->var_capacity) {
        Variable* var = &local_scope->variables[local_scope->var_count++];
        var->name = strdup(name);
        var->value = 0;
        return var;
    }
    return NULL;
}

While our implementation uses a simple array with fixed capacity (10 variables), it introduces important concepts that the JVM handles in its own way:

Variable lookup: We do a linear search through our variables array. The JVM instead uses frame-based lookup, where each method call creates a new frame with its own local variable array. This design choice affects performance: our linear search is O(n) in the number of variables, while the JVM’s frame-based lookup is O(1) because it accesses locals by index.

Storage management: We use a fixed capacity of 10 variables for simplicity. The JVM dynamically manages method frames and their local variables. When a method is called, the JVM knows exactly how many local variable slots it needs (from the byte-code) and allocates them accordingly.

Memory organization: Our structure keeps all variables in one array. The JVM separates concerns: local variables live in their method’s frame, while objects live on the heap. We will implement the heap later on.

This foundation will become important when we add threads later — each thread will need its own LocalScope, just like the JVM gives each thread its own stack for method frames and local variables.

The implementation

Type Definitions

First, let’s define our core types in tiny_vm.h. It contains our data types and the structs that manage local scope and represent code instructions.

// tiny_vm.h
#ifndef TINY_VM_H
#define TINY_VM_H

#include <stdint.h>

// Our VM's data types
typedef int32_t t_int;

typedef struct {
    char* name; // e.g. index
    t_int value; // e.g. 1
} Variable;

typedef struct {
    Variable* variables;
    int var_count;
    int var_capacity;
} LocalScope;

typedef enum {
    PRINT,    // print <variable>
    SET,      // set <variable> <value>
    ADD,      // add <target> <var1> <var2>
    SLEEP,    // sleep <milliseconds>
} InstructionType;

typedef struct {
    InstructionType type;
    char args[3][32];  // Max 3 arguments per instruction, each max 31 chars
} Instruction;

// Local scope functions
LocalScope* create_local_scope(void);
void destroy_local_scope(LocalScope* local_scope);

// Instruction functions
Instruction parse_instruction(const char* line);
void execute_instruction(LocalScope* local_scope, Instruction* instr);

#endif

Type system

We prefix the actual type name with t_ to make it clear that the type is TinyVM-specific. We use t_int instead of the raw int32_t for several reasons:

It guarantees 32-bit size across all platforms
It clearly indicates VM semantics rather than native C integers
It centralizes integer handling for future modifications (e.g., overflow checks)

// Bad: size could vary by platform
int some_number;      

// Good: guaranteed 32-bit on all platforms
typedef int32_t t_int;  
t_int some_number;

// we could define other types like this 
// (but we won't as having one type is enough for now)
typedef int64_t t_long;
typedef int16_t t_short;
typedef int8_t t_byte;

OpenJDK HotSpot’s globalDefinitions.hpp reuses the types defined in jni.h; feel free to explore the details yourself.

The implementation

This (tiny_vm.c) is the heart of our tiny virtual machine that manages local variables and executes basic instructions.

Local scope management:

create_local_scope(): Creates storage for up to 10 local variables
destroy_local_scope(): Cleans up all allocated memory, preventing leaks

Variable handling:

get_variable(): Finds or creates named variables in local scope
get_value(): Retrieves a variable's value, returns 0 if not found

Instruction processing:

parse_instruction(): Converts text commands like "set x 5" into structured instructions
execute_instruction(): Executes four basic operations. SET assigns a value to a variable. ADD sums two variables and stores the result. PRINT displays a variable’s value. SLEEP pauses execution for the specified number of milliseconds.

Reminder: in C, the dot operator (.) is used when you have an actual structure variable, and the arrow operator (->) is used when you have a pointer to a structure.

// tiny_vm.c
#include <stdio.h> // for console output, printf() etc.
#include <stdlib.h> // memory management, malloc(), free(), atoi()
#include <string.h> // string manipulations, strcmp(), strdup(), memset()
#include <unistd.h> // POSIX (Portable Operating System Interface), usleep()
#include "tiny_vm.h"

LocalScope* create_local_scope() {
    LocalScope* local_scope = malloc(sizeof(LocalScope));
    local_scope->var_capacity = 10;
    local_scope->var_count = 0;
    local_scope->variables = malloc(sizeof(Variable) * local_scope->var_capacity);
    return local_scope;
}

void destroy_local_scope(LocalScope* local_scope) {
    for (int i = 0; i < local_scope->var_count; i++) {
        free(local_scope->variables[i].name);
    }
    free(local_scope->variables);
    free(local_scope);
}

// Find or create variable on local_scope
Variable* get_variable(LocalScope* local_scope, const char* name) {
    for (int i = 0; i < local_scope->var_count; i++) {
        if (strcmp(local_scope->variables[i].name, name) == 0) {
            return &local_scope->variables[i];
        }
    }

    // Create new variable
    if (local_scope->var_count < local_scope->var_capacity) {
        Variable* var = &local_scope->variables[local_scope->var_count++];
        var->name = strdup(name);
        var->value = 0;
        return var;
    }
    return NULL;
}

// Get variable value, return 0 if not found
t_int get_value(LocalScope* local_scope, const char* name) {
    for (int i = 0; i < local_scope->var_count; i++) {
        if (strcmp(local_scope->variables[i].name, name) == 0) {
            return local_scope->variables[i].value;
        }
    }
    return 0;
}

Instruction parse_instruction(const char* line) {
    Instruction instr;
    memset(&instr, 0, sizeof(Instruction));

    // The %31s width keeps long tokens from overflowing the 32-byte buffers
    char cmd[32] = "";
    sscanf(line, "%31s", cmd); // Read first word from line (command)

    if (strcmp(cmd, "print") == 0) {
        instr.type = PRINT;
        sscanf(line, "%31s %31s", cmd, instr.args[0]);
    }
    else if (strcmp(cmd, "set") == 0) {
        instr.type = SET;
        sscanf(line, "%31s %31s %31s", cmd, instr.args[0], instr.args[1]);
    }
    else if (strcmp(cmd, "add") == 0) {
        instr.type = ADD;
        sscanf(line, "%31s %31s %31s %31s", cmd, instr.args[0], instr.args[1], instr.args[2]);
    }
    else if (strcmp(cmd, "sleep") == 0) {
        instr.type = SLEEP;
        sscanf(line, "%31s %31s", cmd, instr.args[0]);
    }

    return instr;
}

void execute_instruction(LocalScope* local_scope, Instruction* instr) {
    switch (instr->type) {
        case PRINT: {
            t_int value = get_value(local_scope, instr->args[0]);
            printf("Printing: %s %d\n", instr->args[0], value);
            break;
        }
        case SET: {
            Variable* var = get_variable(local_scope, instr->args[0]);
            if (var) {
                var->value = atoi(instr->args[1]);
            }
            break;
        }
        case ADD: {
            t_int val1 = get_value(local_scope, instr->args[1]);
            t_int val2 = get_value(local_scope, instr->args[2]);
            Variable* target = get_variable(local_scope, instr->args[0]);
            if (target) {
                target->value = val1 + val2;
            }
            break;
        }
        case SLEEP: {
            usleep(atoi(instr->args[0]) * 1000);
            break;
        }
    }
}

Finally, let’s implement the main function that ties our VM components together.

// main.c
#include <stdio.h>
#include "tiny_vm.h"

int main() {
    // Initialize local scope to store the variables in our program
    LocalScope* local_scope = create_local_scope();

    // Our "Java" program (each line will be executed in sequence)
    const char* program[] = {
        "set x 5",
        "set y 3",
        "add z x y",
        "print z",
        "sleep 1000",  // sleep 1 second
        "set z 42",
        "print z",
        NULL
    };

    // Execute the program
    printf("Starting TinyVM...\n-----------------\n");

    for (int i = 0; program[i] != NULL; i++) {
        printf("Executing: %s\n", program[i]);
        Instruction instr = parse_instruction(program[i]);
        execute_instruction(local_scope, &instr);
    }

    // Cleanup
    destroy_local_scope(local_scope);
    printf("_________________\nTinyVM finished.\n");

    return 0;
}

Building and running

We will use this Makefile for the next couple of articles in this series without any changes (just place it in the root folder together with the other .c and .h files).

CC = gcc
CFLAGS = -Wall -pthread
LDFLAGS = -pthread

# Directory structure
BUILD_DIR = build
SRC_DIR = .

# Source files and their corresponding object files in build directory
SRCS = tiny_vm.c main.c
OBJS = $(SRCS:%.c=$(BUILD_DIR)/%.o)
TARGET = $(BUILD_DIR)/tiny_vm

.PHONY: all clean mkdir

all: mkdir $(TARGET)

# Create build directory if it doesn't exist
mkdir:
	@mkdir -p $(BUILD_DIR)

$(TARGET): $(OBJS)
	$(CC) $(OBJS) -o $(TARGET) $(LDFLAGS)

# Pattern rule for object files
$(BUILD_DIR)/%.o: $(SRC_DIR)/%.c tiny_vm.h
	$(CC) $(CFLAGS) -c $< -o $@

clean:
	rm -rf $(BUILD_DIR)

Now we can call make and run our VM. The output shows both debugging/tracing information and actual program output, which is common during development of VMs and interpreters.

➜  tiny-vm_01 git:(main) ✗ make
gcc -Wall -pthread -c main.c -o build/main.o
gcc build/tiny_vm.o build/main.o -o build/tiny_vm -pthread

➜  tiny-vm_01 git:(main) ✗ ./build/tiny_vm
Starting TinyVM...
-----------------
Executing: set x 5
Executing: set y 3
Executing: add z x y
Executing: print z
Printing: z 8
Executing: sleep 1000
Executing: set z 42
Executing: print z
Printing: z 42
_________________
TinyVM finished.

We can clean up the output and show only program results once we’ve implemented the core features. Later, we’ll add proper debugging flags similar to the JVM’s -verbose options for when we need to trace execution flow.

The complete source code for this article is available in the tiny-vm/tiny-vm_01_foundations directory of the TinyVM repository.

Next steps

We can now execute code instructions, and we are ready to add the ability to create and start multiple threads and execute code concurrently.

Introduction
Part 1 — Foundations (you are here)
Part 2 — Multithreading
Part 3 — Heap
Part 4 — Synchronized
Part 5 — Refactoring
Part 6 — Functions
Part 7 — Compilation
Part 8 — Byte-code execution
Part 9 — Function call stack (not started)
Part 10 — Garbage collector (not started)

Written by Ondrej Kvasnovsky
