Building a Virtual Machine, JVM-inspired — Functions (Part 6)
Introduction
In this part, we’ll implement two essential function execution modes in our VM: synchronous and asynchronous execution.
Synchronous execution runs a function in the same thread as its caller, waiting for completion before continuing — similar to how standard function calls work in most programming languages.
Asynchronous execution creates a new thread for the function, allowing it to run independently alongside other code — enabling concurrent operations like background tasks or parallel processing. This dual-mode approach gives developers flexibility in controlling program flow and managing concurrent operations.
Design decisions
Function management
Our VM will implement a simplified version of the JVM’s method area.
Here are the key components we’ll build:
- Each function will contain a name and code block.
- The VM will maintain a list of functions, guarded by a thread-safety lock during function creation (so we could load functions concurrently, or at runtime, in the future).
- Functions will be loaded during VM initialization rather than runtime.
Our VM vs JVM approach
This simplified design provides several benefits for our learning implementation, though it’s worth noting how it differs from the JVM.
Our VM’s approach:
- Loads all functions at startup
- Simpler to implement and understand
- Limited to predefined functions
- No dynamic loading overhead
JVM’s actual approach:
- Loads classes dynamically at runtime
- Uses ClassLoader hierarchy
- Supports dynamic class loading and unloading
- Performs verification during loading
Execution modes
The VM will support two execution modes:
Synchronous Execution (sync):
- Will run in the caller’s thread
- Will maintain program counter and context
- Useful for sequential operations and maintaining execution order
Asynchronous Execution (async):
- Will create a new thread for execution
- Will enable concurrent operations
- Suitable for parallel tasks and non-blocking operations
Implementation goals
We are going to implement the function support so we can:
- Define a function with a name and its own separate block of commands.
- Call the sync command, which executes the function in the current thread.
- Call the async command, which executes a function in a new thread.
Here is how it is going to look from the user’s point of view.
const char *function1[] = {
"function function1",
"set a 2",
"print a",
"exit",
NULL
};
const char *function2[] = {
"function function2",
"set b 10",
"print b",
// "exit", -- we don't want to exit here,
// as it would kill the thread that runs
// the main function
NULL
};
const char *main_code[] = {
"function main",
// Runs in new thread, asynchronously
"async function1",
// Executes synchronously in current thread
"sync function2",
"exit",
NULL
};
The implementation
The main
Let’s first update the src/main.c
file. We will simplify the API of the VM so the create_vm
takes an array of functions. That way, all the functions will be loaded before the VM starts.
Then we create a run_vm
function that will look up the main
function and execute it in a new thread.
// src/main.c
#include "utils/logger.h"
#include "core/vm.h"
#include <stdio.h>
int main(void) {
const char *function1[] = {
"function function1",
"set a 2",
"print a",
"exit",
NULL
};
const char *function2[] = {
"function function2",
"set b 10",
"print b",
NULL
};
const char *main_code[] = {
"function main",
// Runs in new thread, asynchronously
"async function1",
// Executes synchronously in current thread
"sync function2",
"exit",
NULL
};
print("[VM] Starting TinyVM...");
const char** functions[] = {
function1,
function2,
main_code,
NULL
};
VM *vm = create_vm(functions);
run_vm(vm);
destroy_vm(vm);
print("[VM] TinyVM finished.");
return 0;
}
Minor note: There is no indentation nor curly brackets after the function definition in the “tiny” language, which makes the code a bit difficult to read, but that is by design, as we don’t have a proper language grammar defined (at least not yet).
Function parsing
We will do a little refactoring that moves all the parsing logic into instruction.c
, which will separate the code parsing from execution.
// src/instruction/instruction.h
#ifndef TINY_VM_INSTRUCTION_H
#define TINY_VM_INSTRUCTION_H
typedef enum {
// ...
SYNC, // sync <function_name> - Execute function in current thread
ASYNC, // async <function_name> - Execute function in new thread
FUNCTION, // function <name> - Defines start of a function
} InstructionType;
typedef struct {
InstructionType type;
char args[3][32];
} Instruction;
Instruction parse_instruction(const char* line);
char* get_function_name(const char** code);
#endif
Here is the new way to handle instruction parsing.
// src/instruction/instruction.c
#include "instruction.h"
#include "../thread/thread.h"
#include "../memory/memory.h"
#include "../synchronization/synchronization.h"
#include "../utils/logger.h"
#include "../function/function.h"
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
Instruction parse_instruction(const char* line) {
Instruction instr;
memset(&instr, 0, sizeof(Instruction));
char cmd[32];
sscanf(line, "%s", cmd);
if (strcmp(cmd, "print") == 0) {
instr.type = PRINT;
sscanf(line, "%s %s", cmd, instr.args[0]);
}
else if (strcmp(cmd, "set") == 0) {
instr.type = SET;
sscanf(line, "%s %s %s", cmd, instr.args[0], instr.args[1]);
}
else if (strcmp(cmd, "add") == 0) {
instr.type = ADD;
sscanf(line, "%s %s %s %s", cmd, instr.args[0], instr.args[1], instr.args[2]);
}
else if (strcmp(cmd, "sleep") == 0) {
instr.type = SLEEP;
sscanf(line, "%s %s", cmd, instr.args[0]);
}
else if (strcmp(cmd, "thread") == 0) {
instr.type = THREAD;
sscanf(line, "%s %s", cmd, instr.args[0]);
}
else if (strcmp(cmd, "exit") == 0) {
instr.type = EXIT;
}
else if (strcmp(cmd, "setshared") == 0) {
instr.type = SETSHARED;
sscanf(line, "%s %s %s", cmd, instr.args[0], instr.args[1]);
}
else if (strcmp(cmd, "lock") == 0) {
instr.type = LOCK;
sscanf(line, "%s %s", cmd, instr.args[0]);
}
else if (strcmp(cmd, "unlock") == 0) {
instr.type = UNLOCK;
sscanf(line, "%s %s", cmd, instr.args[0]);
}
else if (strcmp(cmd, "sync") == 0) {
instr.type = SYNC;
sscanf(line, "%s %s", cmd, instr.args[0]);
}
else if (strcmp(cmd, "async") == 0) {
instr.type = ASYNC;
sscanf(line, "%s %s", cmd, instr.args[0]);
}
else if (strcmp(cmd, "function") == 0) {
instr.type = FUNCTION;
sscanf(line, "%s %s", cmd, instr.args[0]); // args[0] will contain function name
}
return instr;
}
char* get_function_name(const char** code) {
if (!code || !code[0]) return NULL;
char cmd[32], name[32];
if (sscanf(code[0], "%s %s", cmd, name) == 2 && strcmp(cmd, "function") == 0) {
return strdup(name);
}
print("[VM] Error: Invalid function declaration: %s", code[0]);
return NULL;
}
We don’t need some instructions right now, like THREAD, but let’s keep them there, as we might need them in the future.
Function execution
Now we will add support for synchronous and asynchronous function execution.
We are going to create a few new files to store all the execution logic.
// src/execution/execution.h
#ifndef TINY_VM_EXECUTION_H
#define TINY_VM_EXECUTION_H
#include "../types.h"
#include "../instruction/instruction.h"
void execute_instruction(ThreadContext* thread, Instruction* instr);
// synchronous and asynchronous function execution
void sync_function(ThreadContext* caller, const Function* function);
ThreadContext* async_function(VM* vm, const Function* function);
#endif
Here is the updated execution.c
file that will execute the functions in both sync and async modes.
// src/execution/execution.c
#include "execution.h"
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <pthread.h>
#include "../types.h"
#include "../instruction/instruction.h"
#include "../thread/thread.h"
#include "../function/function.h"
#include "../utils/logger.h"
#include "../memory/memory.h"
#include "../synchronization/synchronization.h"
void execute_instruction(ThreadContext* thread, Instruction* instr) {
switch (instr->type) {
// ... the other cases...
case ADD: {
const jint val1 = get_value(thread, instr->args[1]);
const jint val2 = get_value(thread, instr->args[2]);
Variable* target = get_variable(thread, instr->args[0]);
if (target) {
target->value = val1 + val2;
// we will use the ADD operation later on; let's add a print here
// (kept inside the if, so we never dereference a NULL target)
print("[Thread %d] ADD %d + %d = %d", thread->thread_id, val1, val2, target->value);
}
break;
}
// ...
case SYNC: {
Function* function = find_function(thread->vm, instr->args[0]);
if (function) {
sync_function(thread, function);
}
break;
}
case ASYNC: {
const Function* function = find_function(thread->vm, instr->args[0]);
if (function) {
async_function(thread->vm, function);
}
break;
}
case FUNCTION: {
// Function declaration is handled during loading
// No runtime execution needed
break;
}
default: ;
}
}
void sync_function(ThreadContext* caller, const Function* function) {
// Save caller's context
const int original_pc = caller->pc;
const char** original_program = caller->program;
char* original_function_name = caller->function_name;
// Set up function context
caller->program = function->code;
caller->pc = 0;
caller->function_name = strdup(function->name);
print("[Thread %d] Executing function '%s' synchronously", caller->thread_id, function->name);
// Execute function instructions
while (caller->is_running && caller->program[caller->pc] != NULL) {
Instruction instr = parse_instruction(caller->program[caller->pc]);
execute_instruction(caller, &instr);
caller->pc++;
}
// Restore caller's context
caller->program = original_program;
caller->pc = original_pc;
free(caller->function_name);
caller->function_name = original_function_name;
}
ThreadContext* async_function(VM* vm, const Function* function) {
pthread_mutex_lock(&vm->thread_mgmt_lock);
if (vm->thread_count >= vm->thread_capacity) {
pthread_mutex_unlock(&vm->thread_mgmt_lock);
return NULL;
}
ThreadContext* thread = &vm->threads[vm->thread_count++];
thread->local_scope = create_local_scope();
thread->program = function->code;
thread->pc = 0;
thread->is_running = 1;
thread->thread_id = vm->next_thread_id++;
thread->vm = vm;
thread->function_name = strdup(function->name);
print("[Thread %d] Starting function '%s' asynchronously", thread->thread_id, function->name);
pthread_create(&thread->thread, NULL, execute_thread_instructions, thread);
pthread_mutex_unlock(&vm->thread_mgmt_lock);
return thread;
}
We still have some parsing happening during execution, which is by design, because we are not compiling the code (yet).
This shows us a few reasons why the JVM compiles source code into bytecode:
- Parsing the source code at runtime slows down execution
- Verifying the correctness of the code before execution improves the stability of code execution (and lowers the complexity of the VM code, compared to mixing execution with parsing)
- And many others, but the points above are the obvious weak points in our VM implementation.
VM updates — function management
In this section, we are going to initialize the function management. The function management in our VM is a bit similar to the “method area” from the JVM, but incredibly simplified.
// src/types.h
// ...
typedef struct Function {
char* name;
const char** code;
} Function;
// Thread context
typedef struct ThreadContext {
LocalScope* local_scope;
const char** program;
int pc;
pthread_t thread;
int thread_id;
int is_running;
struct VM* vm;
char* function_name;
} ThreadContext;
typedef struct VM {
// ...
// Function management
Function* functions;
int function_count;
int function_capacity;
pthread_mutex_t function_mgmt_lock;
} VM;
We also create a new load_function
function to store a function into the VM:
// src/function/function.c
void load_function(VM* vm, const char** code) {
char* name = get_function_name(code);
if (!name) {
return;
}
pthread_mutex_lock(&vm->function_mgmt_lock);
if (vm->function_count < vm->function_capacity) {
Function* function = &vm->functions[vm->function_count++];
function->name = name;
function->code = code;
print("[VM] Defined function: %s", name);
} else {
free(name);
print("[VM] Error: Function capacity exceeded");
}
pthread_mutex_unlock(&vm->function_mgmt_lock);
}
The next step is to update the VM functions, as discussed previously.
// src/core/vm.h
#ifndef TINY_VM_CORE_VM_H
#define TINY_VM_CORE_VM_H
#include "../types.h"
// Core VM functions
VM *create_vm(const char** functions[]);
void run_vm(VM* vm);
void destroy_vm(VM *vm);
#endif
We are going to do a lot of updates in the src/core/vm.c
, so here is the whole file for easier understanding.
// src/core/vm.c
#include "vm.h"
#include "../execution/execution.h"
#include "../memory/memory.h"
#include "../function/function.h"
#include "../utils/logger.h"
#include <stdlib.h>
VM *create_vm(const char** functions[]) {
VM *vm = malloc(sizeof(VM));
// Thread management
vm->thread_capacity = 10;
vm->thread_count = 0;
vm->threads = malloc(sizeof(ThreadContext) * vm->thread_capacity);
vm->next_thread_id = 0;
pthread_mutex_init(&vm->thread_mgmt_lock, NULL);
// Initialize heap
vm->heap_capacity = 10;
vm->heap_size = 0;
vm->heap = malloc(sizeof(Variable) * vm->heap_capacity);
pthread_mutex_init(&vm->heap_mgmt_lock, NULL);
// Initialize synchronization/mutex management
vm->lock_capacity = 10;
vm->lock_count = 0;
vm->locks = malloc(sizeof(SynchronizationLock) * vm->lock_capacity);
pthread_mutex_init(&vm->lock_mgmt_lock, NULL);
// Initialize function management
vm->function_capacity = 10;
vm->function_count = 0;
vm->functions = malloc(sizeof(Function) * vm->function_capacity);
pthread_mutex_init(&vm->function_mgmt_lock, NULL);
// Load functions
for (int i = 0; functions[i] != NULL; i++) {
load_function(vm, functions[i]);
}
return vm;
}
void run_vm(VM* vm) {
// Find main function
const Function* main_function = find_function(vm, "main");
if (!main_function) {
print("[VM] Error: No main function defined");
return;
}
// Start main in a new thread
async_function(vm, main_function);
// Wait for all threads to complete
for (int i = 0; i < vm->thread_count; i++) {
pthread_join(vm->threads[i].thread, NULL);
}
}
void destroy_vm(VM *vm) {
for (int i = 0; i < vm->thread_count; i++) {
destroy_local_scope(vm->threads[i].local_scope);
}
// Cleanup heap
for (int i = 0; i < vm->heap_size; i++) {
free(vm->heap[i].name);
}
free(vm->heap);
pthread_mutex_destroy(&vm->heap_mgmt_lock);
// Cleanup mutexes
for (int i = 0; i < vm->lock_count; i++) {
pthread_mutex_destroy(&vm->locks[i].mutex);
free(vm->locks[i].name);
}
pthread_mutex_destroy(&vm->lock_mgmt_lock);
free(vm->locks);
// Cleanup functions
for (int i = 0; i < vm->function_count; i++) {
free(vm->functions[i].name);
// Note that we don't free function->code because it points to
// string literals in our case - they are stored in the program's
// static memory. If we later change to dynamically allocate the
// code arrays, we would need to free those as well.
// free(vm->functions[i].code);
}
free(vm->functions);
pthread_mutex_destroy(&vm->function_mgmt_lock);
// Cleanup threads
free(vm->threads);
pthread_mutex_destroy(&vm->thread_mgmt_lock);
free(vm);
}
The destroy_vm function is getting big, but we need to do it. At least we can see everything the JVM’s garbage collector does for us when we run code in Java.
Testing the functions
Let’s write a more complex example that creates a global counter variable, and then two async functions trying to modify it. Because the functions use a common lock, the race condition is prevented and counter is reliably modified.
// src/main.c
#include "utils/logger.h"
#include "core/vm.h"
#include <stdio.h>
int main(void) {
const char *createCounter[] = {
"function createCounter",
"setshared counter 1000",
"print counter",
NULL
};
const char *incrementCounter[] = {
"function incrementCounter",
"lock counter_lock",
"set increment 10",
"add counter increment counter",
"print counter",
"unlock counter_lock",
"exit",
NULL
};
const char *decrementCounter[] = {
"function decrementCounter",
"lock counter_lock",
"set decrement -10",
"add counter decrement counter",
"print counter",
"unlock counter_lock",
"exit",
NULL
};
const char *main_code[] = {
"function main",
"sync createCounter",
"async incrementCounter",
"async decrementCounter",
"exit",
NULL
};
const char** functions[] = {
createCounter,
incrementCounter,
decrementCounter,
main_code,
NULL
};
print("[VM] Starting TinyVM...");
VM *vm = create_vm(functions);
run_vm(vm);
destroy_vm(vm);
print("[VM] TinyVM finished.");
return 0;
}
Here is the output, showing how counter was successfully modified by our concurrently running functions.
[2025-01-01 11:01:24.818849] [VM] Starting TinyVM...
[2025-01-01 11:01:24.819476] [VM] Defined function: createCounter
[2025-01-01 11:01:24.819480] [VM] Defined function: incrementCounter
[2025-01-01 11:01:24.819481] [VM] Defined function: decrementCounter
[2025-01-01 11:01:24.819483] [VM] Defined function: main
[2025-01-01 11:01:24.819484] [Thread 0] Async function 'main' started
[2025-01-01 11:01:24.819508] [Thread 0] Thread instructions started
[2025-01-01 11:01:24.819511] [Thread 0] Sync function 'createCounter' started
[2025-01-01 11:01:24.819513] [Thread 0] Created shared variable counter
[2025-01-01 11:01:24.819515] [Thread 0] Set-shared counter = 1000
[2025-01-01 11:01:24.819516] [Thread 0] Found shared variable counter
[2025-01-01 11:01:24.819517] [Thread 0] PRINT Variable counter = 1000
[2025-01-01 11:01:24.819519] [Thread 1] Async function 'incrementCounter' started
[2025-01-01 11:01:24.819531] [Thread 2] Async function 'decrementCounter' started
[2025-01-01 11:01:24.819541] [Thread 0] Thread instructions finished
[2025-01-01 11:01:24.819542] [Thread 1] Thread instructions started
[2025-01-01 11:01:24.819548] [Thread 2] Thread instructions started
[2025-01-01 11:01:24.848462] [Thread 2] Waiting for lock 'counter_lock' at address 0x134f04408
[2025-01-01 11:01:24.848452] [Thread 1] Waiting for lock 'counter_lock' at address 0x134f04408
[2025-01-01 11:01:24.848466] [Thread 2] Acquired lock 'counter_lock'
[2025-01-01 11:01:24.848486] [Thread 2] Found shared variable counter
[2025-01-01 11:01:24.848490] [Thread 2] ADD -10 + 1000 = 990
[2025-01-01 11:01:24.848492] [Thread 2] PRINT Variable counter = 990
[2025-01-01 11:01:24.848496] [Thread 2] Released lock 'counter_lock' at address 0x134f04408
[2025-01-01 11:01:24.848499] [Thread 2] Thread instructions finished
[2025-01-01 11:01:24.848503] [Thread 1] Acquired lock 'counter_lock'
[2025-01-01 11:01:24.848519] [Thread 1] Found shared variable counter
[2025-01-01 11:01:24.848521] [Thread 1] ADD 10 + 1000 = 1010
[2025-01-01 11:01:24.848523] [Thread 1] PRINT Variable counter = 1010
[2025-01-01 11:01:24.848525] [Thread 1] Released lock 'counter_lock' at address 0x134f04408
[2025-01-01 11:01:24.848527] [Thread 1] Thread instructions finished
[2025-01-01 11:01:24.848549] [VM] TinyVM finished.
The complete source code for this article is available in the tiny-vm_06_functions directory of the TinyVM repository.
Next steps
Our current implementation has laid the groundwork for function execution, but as we’ve seen through the development process, there are several areas where we can improve performance, reliability, and functionality. Let’s explore what’s ahead.
Moving to Bytecode Compilation
While separating instruction parsing from execution was a good first step, parsing instructions at runtime still presents several challenges:
- Performance overhead from repeated parsing
- No pre-execution validation of code correctness
- Complex error handling during execution
- Difficulty in optimizing code
In the next part, we’ll address these issues by:
- Designing a simple bytecode format for our VM
- Creating a compiler to transform our text instructions into bytecode
- Implementing bytecode verification to catch errors early
- Building a bytecode interpreter for efficient execution
- Introduction
- Part 1 — Foundations
- Part 2 — Multithreading
- Part 3 — Heap
- Part 4 — Synchronized
- Part 5 — Refactoring
- Part 6 — Functions (you are here)
- Part 7 — Compilation
- Part 8 — Byte-code execution
- Part 9 — Function call stack (not started)
- Part 10 — Garbage collector (not started)