lux/docs/C_BACKEND.md
Brandon Lucas 909dbf7a97 feat: add list support to C backend and improve compile workflow
C Backend Lists:
- Add LuxList type (dynamic array with void* boxing)
- Implement all 16 list operations: length, isEmpty, concat, reverse,
  range, take, drop, head, tail, get, map, filter, fold, find, any, all
- Higher-order operations generate inline loops with closure calls
- Fix unique variable names to prevent redefinition errors

Compile Command:
- `lux compile file.lux` now produces a binary (like rustc, go build)
- Add `--emit-c` flag to output C code instead
- Binary name derived from source filename (foo.lux -> ./foo)
- Clean up temp files after compilation

Documentation:
- Create docs/C_BACKEND.md with full strategy documentation
- Document compilation pipeline, runtime types, limitations
- Compare with Koka, Rust, Zig, Go, Nim, OCaml approaches
- Outline future roadmap (evidence passing, Perceus RC)
- Fix misleading doc comment (remove false Perceus claim)
- Update OVERVIEW.md and ROADMAP.md to reflect list completion

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-14 11:02:26 -05:00

Lux C Backend

Overview

Lux compiles to C code, then invokes a system C compiler (gcc/clang) to produce native binaries. This approach is used by several production languages:

| Language | Target | Memory Management |
|----------|--------|-------------------|
| Koka | C | Perceus reference counting |
| Nim | C | ORC (configurable) |
| Chicken Scheme | C | Generational GC |
| Lux (current) | C | None (leaks) |

Compilation Pipeline

┌─────────────┐     ┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│ Lux Source  │ ──► │   Parser    │ ──► │ Type Check  │ ──► │  C Codegen  │
└─────────────┘     └─────────────┘     └─────────────┘     └─────────────┘
                                                                   │
                                                                   ▼
┌─────────────┐     ┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│   Binary    │ ◄── │  cc/gcc/    │ ◄── │  Temp .c    │ ◄───│  C Code     │
│             │     │  clang      │     │  File       │     │  (string)   │
└─────────────┘     └─────────────┘     └─────────────┘     └─────────────┘

Usage:

lux compile foo.lux           # Produces ./foo binary
lux compile foo.lux -o app    # Produces ./app binary
lux compile foo.lux --run     # Compile and execute
lux compile foo.lux --emit-c  # Output C code (for debugging)

Runtime Type Representations

Primitive Types

typedef int64_t LuxInt;
typedef double LuxFloat;
typedef bool LuxBool;
typedef char* LuxString;
typedef void* LuxUnit;

Closures

Closures are represented as a pair: a pointer to the environment of captured variables and a pointer to the function itself.

typedef struct {
    void* env;      // Pointer to captured variables
    void* fn_ptr;   // Pointer to the function
} LuxClosure;

Example - capturing a variable:

let multiplier = 3
let triple = fn(x: Int): Int => x * multiplier

Generates:

// Environment struct for captured variables
typedef struct {
    LuxInt multiplier;
} Env_triple;

// The lambda function
LuxInt lambda_triple(void* _env, LuxInt x) {
    Env_triple* env = (Env_triple*)_env;
    return x * env->multiplier;
}

// Creating the closure
Env_triple* env = malloc(sizeof(Env_triple));
env->multiplier = multiplier;
LuxClosure* triple = malloc(sizeof(LuxClosure));
triple->env = env;
triple->fn_ptr = (void*)lambda_triple;
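
Calling such a closure means casting fn_ptr back to the lambda's signature and passing env as the first argument. The following is a minimal sketch under the layout shown above; make_triple and call_int_closure are hypothetical helper names used only for illustration, not functions of the Lux runtime:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef int64_t LuxInt;

typedef struct {
    void* env;      // Pointer to captured variables
    void* fn_ptr;   // Pointer to the function, stored as void*
} LuxClosure;

typedef struct {
    LuxInt multiplier;
} Env_triple;

LuxInt lambda_triple(void* _env, LuxInt x) {
    Env_triple* env = (Env_triple*)_env;
    return x * env->multiplier;
}

// Hypothetical helper: build the closure for a given captured value.
LuxClosure* make_triple(LuxInt multiplier) {
    Env_triple* env = malloc(sizeof(Env_triple));
    LuxClosure* c = malloc(sizeof(LuxClosure));
    if (!env || !c) abort();
    env->multiplier = multiplier;
    c->env = env;
    c->fn_ptr = (void*)lambda_triple;
    return c;
}

// Hypothetical helper: call a closure of Lux type fn(Int): Int.
LuxInt call_int_closure(LuxClosure* c, LuxInt arg) {
    assert(c != NULL && c->fn_ptr != NULL);
    // Cast back to the real signature; the environment is always argument 0.
    LuxInt (*fn)(void*, LuxInt) = (LuxInt (*)(void*, LuxInt))c->fn_ptr;
    return fn(c->env, arg);
}
```

With these definitions, call_int_closure(make_triple(3), 7) returns 21. Storing a function pointer in a void* works on POSIX platforms but is not guaranteed by ISO C, so a stricter design would keep a typed function pointer in the struct.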

Algebraic Data Types (ADTs)

ADTs compile to tagged unions:

type Option =
    | Some(Int)
    | None

Generates:

typedef enum { Option_TAG_SOME, Option_TAG_NONE } Option_Tag;

typedef struct {
    Option_Tag tag;
    union {
        struct { LuxInt field0; } some;
        // None has no fields
    } data;
} Option;

Pattern matching compiles to if/else chains:

match opt {
    Some(x) => x,
    None => 0
}

Generates:

if (opt.tag == Option_TAG_SOME) {
    LuxInt x = opt.data.some.field0;
    result = x;
} else if (opt.tag == Option_TAG_NONE) {
    result = 0;
}
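
Put together, the type definition and the match above can be exercised as a small self-contained unit. This is a sketch only: the constructor names Option_Some and Option_None and the wrapper unwrap_or_zero are assumptions made for the example, and the names the code generator really emits may differ.

```c
#include <assert.h>
#include <stdint.h>

typedef int64_t LuxInt;

typedef enum { Option_TAG_SOME, Option_TAG_NONE } Option_Tag;

typedef struct {
    Option_Tag tag;
    union {
        struct { LuxInt field0; } some;
        // None has no fields
    } data;
} Option;

// Hypothetical constructor for Some(value).
Option Option_Some(LuxInt value) {
    Option o;
    o.tag = Option_TAG_SOME;
    o.data.some.field0 = value;
    return o;
}

// Hypothetical constructor for None.
Option Option_None(void) {
    Option o;
    o.tag = Option_TAG_NONE;
    o.data.some.field0 = 0;  // keep the unused payload initialized
    return o;
}

// match opt { Some(x) => x, None => 0 }
LuxInt unwrap_or_zero(Option opt) {
    assert(opt.tag == Option_TAG_SOME || opt.tag == Option_TAG_NONE);
    LuxInt result = 0;
    if (opt.tag == Option_TAG_SOME) {
        LuxInt x = opt.data.some.field0;
        result = x;
    } else if (opt.tag == Option_TAG_NONE) {
        result = 0;
    }
    return result;
}
```

Here unwrap_or_zero(Option_Some(5)) yields 5 and unwrap_or_zero(Option_None()) yields 0. The tag is checked before the payload is read, so the union is never read through a variant that was not written.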

Lists

Lists are dynamic arrays with boxed elements:

typedef struct {
    void** elements;   // Array of boxed elements
    int64_t length;
    int64_t capacity;
} LuxList;

Elements are boxed when stored and unboxed when read:

void* lux_box_int(LuxInt n) {
    LuxInt* p = malloc(sizeof(LuxInt));
    *p = n;
    return p;
}

LuxInt lux_unbox_int(void* p) {
    return *(LuxInt*)p;
}

List operations (map, filter, fold, etc.) generate inline loops:

// List.map(nums, fn(x) => x * 2)
LuxList* result = lux_list_new(nums->length);
for (int64_t i = 0; i < nums->length; i++) {
    void* elem = nums->elements[i];
    LuxInt mapped = ((LuxInt(*)(void*, LuxInt))fn->fn_ptr)(fn->env, lux_unbox_int(elem));
    result->elements[i] = lux_box_int(mapped);
}
result->length = nums->length;
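
The same loop shape extends to the other higher-order operations. The sketch below shows filter and fold over a list of boxed ints. It rests on several assumptions: the body of lux_list_new is a guess at what the runtime helper does, demo_list_from is a hypothetical builder, and the closure call is replaced by a plain expression to keep the example short.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef int64_t LuxInt;

typedef struct {
    void** elements;   // Array of boxed elements
    int64_t length;
    int64_t capacity;
} LuxList;

void* lux_box_int(LuxInt n) {
    LuxInt* p = malloc(sizeof(LuxInt));
    if (!p) abort();
    *p = n;
    return p;
}

LuxInt lux_unbox_int(void* p) {
    return *(LuxInt*)p;
}

// Assumed body: an empty list with room for `capacity` elements.
LuxList* lux_list_new(int64_t capacity) {
    assert(capacity >= 0);
    int64_t cap = capacity > 0 ? capacity : 1;  // never rely on malloc(0)
    LuxList* list = malloc(sizeof(LuxList));
    if (!list) abort();
    list->elements = malloc(sizeof(void*) * (size_t)cap);
    if (!list->elements) abort();
    list->length = 0;
    list->capacity = cap;
    return list;
}

// Hypothetical builder: box each value of a C array into a new list.
LuxList* demo_list_from(const LuxInt* xs, int64_t n) {
    LuxList* out = lux_list_new(n);
    for (int64_t i = 0; i < n; i++) out->elements[i] = lux_box_int(xs[i]);
    out->length = n;
    return out;
}

// List.filter(nums, fn(x) => x % 2 == 0), predicate written inline.
LuxList* demo_filter_even(LuxList* nums) {
    LuxList* out = lux_list_new(nums->length);  // worst case: keep everything
    for (int64_t i = 0; i < nums->length; i++) {
        LuxInt x = lux_unbox_int(nums->elements[i]);
        if (x % 2 == 0) out->elements[out->length++] = lux_box_int(x);
    }
    return out;
}

// List.fold(nums, 0, fn(acc, x) => acc + x), combiner written inline.
LuxInt demo_sum(LuxList* nums) {
    LuxInt acc = 0;
    for (int64_t i = 0; i < nums->length; i++) {
        acc += lux_unbox_int(nums->elements[i]);
    }
    return acc;
}
```

For the input 1 through 6, demo_filter_even keeps three elements and demo_sum over them gives 12. Filter sizes its result for the worst case up front, which avoids reallocation inside the loop at the cost of some unused capacity.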

Current Limitations

1. Memory Leaks

Nothing that is allocated is ever freed. This includes:

  • Closure environments
  • ADT values
  • List elements and arrays
  • Strings from concatenation

This is acceptable for short-lived programs but not for long-running services.
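
To make the cost concrete: a list of n boxed ints takes n + 2 separate heap allocations (one per box, one for the element array, one for the list header). The sketch below counts them by writing the cleanup that the backend currently never emits. lux_list_free_boxed and demo_make_list are hypothetical names, not existing runtime helpers.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef int64_t LuxInt;

typedef struct {
    void** elements;
    int64_t length;
    int64_t capacity;
} LuxList;

// Hypothetical builder: a list holding the boxed ints 0..n-1.
LuxList* demo_make_list(int64_t n) {
    assert(n >= 0);
    int64_t cap = n > 0 ? n : 1;
    LuxList* list = malloc(sizeof(LuxList));
    if (!list) abort();
    list->elements = malloc(sizeof(void*) * (size_t)cap);
    if (!list->elements) abort();
    for (int64_t i = 0; i < n; i++) {
        LuxInt* box = malloc(sizeof(LuxInt));
        if (!box) abort();
        *box = i;
        list->elements[i] = box;
    }
    list->length = n;
    list->capacity = cap;
    return list;
}

// Hypothetical cleanup: frees every box, the element array, and the header.
// Returns the number of allocations released.
int64_t lux_list_free_boxed(LuxList* list) {
    assert(list != NULL);
    int64_t freed = 0;
    for (int64_t i = 0; i < list->length; i++) {
        free(list->elements[i]);
        freed++;
    }
    free(list->elements);
    freed++;
    free(list);
    freed++;
    return freed;
}
```

Freeing a 10-element list releases 12 allocations. A helper like this is only safe when no other list shares the boxes, which is why a real fix needs ownership tracking such as reference counting rather than ad hoc free calls.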

2. Limited Effects

Only Console.print is supported, hardcoded to printf:

static void lux_console_print(LuxString msg) {
    printf("%s\n", msg);
}

Other effects (File, Http, Random, etc.) are not yet implemented in the C backend.

3. If/Else Side Effects

The C backend uses ternary operators for if/else:

(condition ? then_value : else_value)

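Problem: If the branches contain side effects (like Console.print), both effects run. A C ternary evaluates only the selected arm, so the operator itself is not the cause: both branches are evaluated during code generation, and their side-effecting calls end up executing regardless of the condition.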

Workaround: Use pure expressions in if/else branches, then print the result:

// Bad - both prints execute
if x > 0 then Console.print("positive") else Console.print("negative")

// Good - only one print
let msg = if x > 0 then "positive" else "negative"
Console.print(msg)
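
The difference can be reproduced in plain C. The sketch below contrasts three shapes that generated code could take. Shape A is an assumption about the failure mode (branch values computed before the ternary) and is not taken from the actual generator output. Shapes B and C run exactly one branch. All demo_ names and count_prints are hypothetical, and lux_console_print is changed to return LuxUnit so it can appear inside an expression.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

typedef int64_t LuxInt;
typedef char* LuxString;
typedef void* LuxUnit;

int print_calls = 0;  // number of Console.print calls actually executed

LuxUnit lux_console_print(LuxString msg) {
    assert(msg != NULL);
    printf("%s\n", msg);
    print_calls++;
    return NULL;
}

// Shape A (assumed failure mode): both branch values are computed up front.
LuxUnit demo_hoisted(LuxInt x) {
    LuxUnit then_value = lux_console_print("positive");
    LuxUnit else_value = lux_console_print("negative");
    return (x > 0 ? then_value : else_value);
}

// Shape B: the calls stay inside the ternary arms, so only one runs.
LuxUnit demo_in_arms(LuxInt x) {
    return (x > 0 ? lux_console_print("positive")
                  : lux_console_print("negative"));
}

// Shape C: a plain if statement, which also runs exactly one branch.
LuxUnit demo_if_statement(LuxInt x) {
    LuxUnit result;
    if (x > 0) {
        result = lux_console_print("positive");
    } else {
        result = lux_console_print("negative");
    }
    return result;
}

// Runs one shape and reports how many prints executed.
int count_prints(LuxUnit (*shape)(LuxInt), LuxInt x) {
    print_calls = 0;
    shape(x);
    return print_calls;
}
```

count_prints(demo_hoisted, 5) reports 2, while the other two shapes report 1. Shape C also handles branches that need several statements, which a ternary cannot express.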

Comparison with Other Languages

Koka (Our Inspiration)

Koka also compiles to C with algebraic effects. Key differences:

| Aspect | Koka | Lux (current) |
|--------|------|---------------|
| Memory | Perceus RC | Leaks |
| Effects | Evidence passing (zero-cost) | Runtime lookup |
| Closures | Environment vectors | Heap-allocated structs |
| Maturity | Production-ready | Experimental |

Rust

| Aspect | Rust | Lux |
|--------|------|-----|
| Target | LLVM | C |
| Memory | Ownership/borrowing | Leaks |
| Safety | Compile-time guaranteed | Runtime (interpreter) |
| Learning curve | Steep | Medium |

Zig

| Aspect | Zig | Lux |
|--------|-----|-----|
| Target | LLVM | C |
| Memory | Manual with allocators | Leaks |
| Philosophy | Explicit control | High-level abstraction |

Go

| Aspect | Go | Lux |
|--------|----|-----|
| Target | Native | C |
| Memory | Concurrent GC | Leaks |
| Effects | None | Algebraic effects |
| Latency | Unpredictable (GC pauses) | Predictable (no GC) |

Future Roadmap

Phase 1: Evidence Passing (Zero-Cost Effects)

Goal: Eliminate runtime effect handler lookup.

Current approach (slow):

// O(n) search through handler stack
for handler in self.handler_stack.iter().rev() {
    if handler.effect == request.effect {
        return handler.invoke(request);
    }
}

Evidence passing (fast):

typedef struct {
    Console* console;
    FileIO* fileio;
} Evidence;

void greet(Evidence* ev, const char* name) {
    ev->console->print(ev, name);  // Direct call, no search
}

Expected speedup: 10-20x for effect-heavy code.
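
A fuller version of the evidence-passing fragment, filled in so that it compiles, might look like the sketch below. The handler functions, the contents of FileIO, and demo_greet_captured are assumptions added for illustration. The point is that swapping a handler is just a matter of passing a different Evidence struct.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

typedef struct Evidence Evidence;

// One struct of function pointers per effect.
typedef struct {
    void (*print)(Evidence* ev, const char* msg);
} Console;

typedef struct {
    int unused;  // placeholder: File operations are omitted in this sketch
} FileIO;

struct Evidence {
    Console* console;
    FileIO* fileio;
};

void greet(Evidence* ev, const char* name) {
    assert(ev != NULL && ev->console != NULL);
    ev->console->print(ev, name);  // Direct call, no search
}

// Handler 1 (hypothetical): print to stdout.
void console_print_stdout(Evidence* ev, const char* msg) {
    (void)ev;
    printf("%s\n", msg);
}

// Handler 2 (hypothetical): capture the message, e.g. for tests.
char captured[64];
void console_print_capture(Evidence* ev, const char* msg) {
    (void)ev;
    snprintf(captured, sizeof captured, "%s", msg);
}

// Runs greet under the capturing handler and returns what was "printed".
const char* demo_greet_captured(const char* name) {
    Console console = { console_print_capture };
    Evidence ev = { &console, NULL };
    greet(&ev, name);
    return captured;
}
```

Calling demo_greet_captured("Ada") leaves "Ada" in the capture buffer without touching stdout. The cost of an effect call is one pointer load plus an indirect call, independent of how many handlers are installed.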

Phase 2: Perceus Reference Counting

Goal: Deterministic memory management without GC pauses.

Perceus is a precise reference counting scheme in which the compiler:

  1. Inserts increment/decrement operations at precise points
  2. Detects when values can be reused in-place (FBIP, "functional but in-place")
  3. Frees cycle-free data as soon as it becomes unreachable, without a runtime GC

Example - reuse analysis:

fn increment(xs: List<Int>): List<Int> =
    List.map(xs, fn(x) => x + 1)

If xs has refcount=1, the list can be mutated in-place instead of copied.
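
The uniqueness check behind that reuse can be sketched in C. Everything here is hypothetical: RcIntList is a simplified refcounted array of unboxed ints, not the current LuxList, and the rc_list_ functions only illustrate the idea rather than any planned runtime API.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef int64_t LuxInt;

// Hypothetical refcounted list of unboxed ints.
typedef struct {
    int64_t refcount;
    int64_t length;
    LuxInt* items;
} RcIntList;

RcIntList* rc_list_new(const LuxInt* src, int64_t n) {
    assert(n >= 0);
    RcIntList* xs = malloc(sizeof(RcIntList));
    if (!xs) abort();
    xs->items = malloc(sizeof(LuxInt) * (size_t)(n > 0 ? n : 1));
    if (!xs->items) abort();
    if (n > 0) memcpy(xs->items, src, sizeof(LuxInt) * (size_t)n);
    xs->refcount = 1;
    xs->length = n;
    return xs;
}

// Release one reference; free the storage when it was the last one.
void rc_list_drop(RcIntList* xs) {
    if (--xs->refcount == 0) {
        free(xs->items);
        free(xs);
    }
}

// increment(xs): consumes the caller's reference to xs.
RcIntList* rc_list_increment(RcIntList* xs) {
    assert(xs != NULL && xs->refcount > 0);
    if (xs->refcount == 1) {
        // Unique owner: nobody else can observe the mutation, so reuse xs.
        for (int64_t i = 0; i < xs->length; i++) xs->items[i] += 1;
        return xs;
    }
    // Shared: build a fresh list, then give up our reference to the old one.
    RcIntList* out = rc_list_new(xs->items, xs->length);
    for (int64_t i = 0; i < out->length; i++) out->items[i] += 1;
    rc_list_drop(xs);
    return out;
}
```

When the list is uniquely owned, rc_list_increment returns the same pointer it was given. When another owner exists, it returns a copy and leaves the original contents untouched.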

Phase 3: More Effects

Implement C versions of:

  • File (read, write, exists)
  • Http (get, post)
  • Random (int, bool)
  • Time (now, sleep)
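
As a rough idea of the size of this work, the Random and Time.now operations could map onto the C standard library as sketched below. All function names are hypothetical, and the semantics are assumptions: Random.int is taken to be inclusive on both ends, and Time.now is taken to return whole seconds since the Unix epoch. The Lux definitions may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

typedef int64_t LuxInt;
typedef bool LuxBool;

// Hypothetical: seed the generator once at program start.
void lux_random_seed(unsigned int seed) {
    srand(seed);
}

// Hypothetical Random.int(min, max), assumed inclusive on both ends.
// rand() has limited range and the modulo adds slight bias, which is
// acceptable for a sketch but not for serious use.
LuxInt lux_random_int(LuxInt min, LuxInt max) {
    assert(min <= max);
    uint64_t span = (uint64_t)max - (uint64_t)min + 1;
    assert(span != 0);  // the full 64-bit range is not supported here
    return (LuxInt)((uint64_t)min + (uint64_t)rand() % span);
}

// Hypothetical Random.bool().
LuxBool lux_random_bool(void) {
    return rand() > RAND_MAX / 2;
}

// Hypothetical Time.now(), assumed to be seconds since the Unix epoch.
LuxInt lux_time_now(void) {
    return (LuxInt)time(NULL);
}
```

A call such as lux_random_int(1, 6) always lands in the range 1 to 6. File and Http need more design work than this, since they involve error handling and, for Http, an external library.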

Phase 4: JavaScript Backend

Compile Lux to JavaScript for browser/Node.js:

  • Effects → Direct DOM/API calls
  • No runtime needed
  • Enables full-stack Lux development

Implementation Details

Name Mangling

Lux identifiers are mangled for C compatibility:

| Lux | C |
|-----|---|
| foo | foo_lux |
| myFunction | myFunction_lux |
| List.map | Inline code (not a function call) |
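
The suffix rule in the table is simple enough to state as code. The helper below is a hypothetical illustration written in C, not a claim about how the compiler itself implements mangling. It appends _lux and reports when the output buffer is too small.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

// Hypothetical helper: write "<name>_lux" into out.
// Returns 0 on success, -1 if out_size is too small for the result.
int lux_mangle(const char* name, char* out, size_t out_size) {
    assert(name != NULL && out != NULL);
    int n = snprintf(out, out_size, "%s_lux", name);
    return (n >= 0 && (size_t)n < out_size) ? 0 : -1;
}
```

With a large enough buffer, lux_mangle("myFunction", ...) produces myFunction_lux. The suffix keeps user-defined names from colliding with C's own, which is why the Lux main function becomes main_lux and leaves main free for the entry point.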

Generated C Structure

// 1. Includes and type definitions
#include <stdint.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef int64_t LuxInt;
// ... more types ...

// 2. Runtime helpers (string concat, list operations, etc.)
static LuxString lux_string_concat(LuxString a, LuxString b) { ... }
static LuxList* lux_list_new(int64_t capacity) { ... }
// ... more helpers ...

// 3. Forward declarations
void main_lux(void);

// 4. Closure/lambda definitions
static LuxInt lambda_1(void* _env, LuxInt x) { ... }

// 5. User-defined functions
void greet_lux(LuxString name) { ... }

// 6. Main function
void main_lux(void) { ... }

// 7. Entry point
int main(int argc, char** argv) {
    main_lux();
    return 0;
}
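
The helper bodies are elided above. As one example, lux_string_concat could plausibly be implemented as in the sketch below. The body is an assumption, not the generated prelude. It also shows where the "strings from concatenation" leak comes from: every call returns a fresh heap buffer that the generated code never frees.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef char* LuxString;

// Assumed body: allocate a new buffer holding a followed by b.
LuxString lux_string_concat(LuxString a, LuxString b) {
    assert(a != NULL && b != NULL);
    size_t len_a = strlen(a);
    size_t len_b = strlen(b);
    LuxString out = malloc(len_a + len_b + 1);
    if (!out) abort();
    memcpy(out, a, len_a);
    memcpy(out + len_a, b, len_b + 1);  // also copies b's terminating NUL
    return out;
}
```

Concatenating "Hello, " and "Lux" yields a new string "Hello, Lux". The inputs are left untouched, so the caller owns the result and would be responsible for freeing it.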

Prelude Size

The generated C prelude is approximately 150 lines, including:

  • Type definitions (~20 lines)
  • String operations (~30 lines)
  • List types and operations (~80 lines)
  • Boxing/unboxing helpers (~20 lines)

Testing the C Backend

# Compile and run
lux compile examples/hello.lux --run

# Compile to binary
lux compile examples/hello.lux -o hello
./hello

# View generated C (for debugging)
lux compile examples/hello.lux --emit-c

# Save C to file
lux compile examples/hello.lux --emit-c -o hello.c

References