feat: add list support to C backend and improve compile workflow

C Backend Lists:
- Add LuxList type (dynamic array with void* boxing)
- Implement all 16 list operations: length, isEmpty, concat, reverse,
  range, take, drop, head, tail, get, map, filter, fold, find, any, all
- Higher-order operations generate inline loops with closure calls
- Fix unique variable names to prevent redefinition errors

Compile Command:
- `lux compile file.lux` now produces a binary (like rustc, go build)
- Add `--emit-c` flag to output C code instead
- Binary name derived from source filename (foo.lux -> ./foo)
- Clean up temp files after compilation

Documentation:
- Create docs/C_BACKEND.md with full strategy documentation
- Document compilation pipeline, runtime types, limitations
- Compare with Koka, Rust, Zig, Go, Nim, OCaml approaches
- Outline future roadmap (evidence passing, Perceus RC)
- Fix misleading doc comment (remove false Perceus claim)
- Update OVERVIEW.md and ROADMAP.md to reflect list completion

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-14 11:02:26 -05:00
parent d284ee58a8
commit 909dbf7a97
5 changed files with 954 additions and 87 deletions

docs/C_BACKEND.md Normal file

@@ -0,0 +1,399 @@
# Lux C Backend
## Overview
The Lux compiler translates Lux source to C, then invokes a system C compiler (cc, gcc, or clang) to produce a native binary. Several production languages take the same approach:
| Language | Target | Memory Management |
|----------|--------|-------------------|
| **Koka** | C | Perceus reference counting |
| **Nim** | C | ORC (configurable) |
| **Chicken Scheme** | C | Generational GC |
| **Lux (current)** | C | None (leaks) |
## Compilation Pipeline
```
┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐
│ Lux Source │ ──► │ Parser │ ──► │ Type Check │ ──► │ C Codegen │
└─────────────┘ └─────────────┘ └─────────────┘ └─────────────┘
┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐
│ Binary │ ◄── │ cc/gcc/ │ ◄── │ Temp .c │ ◄───│ C Code │
│ │ │ clang │ │ File │ │ (string) │
└─────────────┘ └─────────────┘ └─────────────┘ └─────────────┘
```
**Usage:**
```bash
lux compile foo.lux # Produces ./foo binary
lux compile foo.lux -o app # Produces ./app binary
lux compile foo.lux --run # Compile and execute
lux compile foo.lux --emit-c # Output C code (for debugging)
```
## Runtime Type Representations
### Primitive Types
```c
typedef int64_t LuxInt;
typedef double LuxFloat;
typedef bool LuxBool;
typedef char* LuxString;
typedef void* LuxUnit;
```
### Closures
A closure is represented as a pair: an environment pointer and a function pointer.
```c
typedef struct {
    void* env;     // Pointer to captured variables
    void* fn_ptr;  // Pointer to the function
} LuxClosure;
```
**Example - capturing a variable:**
```lux
let multiplier = 3
let triple = fn(x: Int): Int => x * multiplier
```
Generates:
```c
// Environment struct for captured variables
typedef struct {
    LuxInt multiplier;
} Env_triple;

// The lambda function
LuxInt lambda_triple(void* _env, LuxInt x) {
    Env_triple* env = (Env_triple*)_env;
    return x * env->multiplier;
}

// Creating the closure
Env_triple* env = malloc(sizeof(Env_triple));
env->multiplier = multiplier;
LuxClosure* triple = malloc(sizeof(LuxClosure));
triple->env = env;
triple->fn_ptr = (void*)lambda_triple;
```
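The fragment above only shows how a closure is created. As a minimal sketch of the calling side, assuming the same `LuxClosure` layout, the call site casts `fn_ptr` back to the lambda's signature and passes `env` as the first argument (the same pattern the `List.map` loop uses). The helpers `make_triple` and `call_int_closure` are hypothetical names introduced for illustration; they are not part of the generated code.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef int64_t LuxInt;

typedef struct {
    void* env;     // Pointer to captured variables
    void* fn_ptr;  // Pointer to the function
} LuxClosure;

typedef struct {
    LuxInt multiplier;
} Env_triple;

static LuxInt lambda_triple(void* _env, LuxInt x) {
    Env_triple* env = (Env_triple*)_env;
    return x * env->multiplier;
}

// Hypothetical helper: builds the closure the same way the fragment above does.
static LuxClosure* make_triple(LuxInt multiplier) {
    Env_triple* env = malloc(sizeof(Env_triple));
    LuxClosure* closure = malloc(sizeof(LuxClosure));
    assert(env != NULL && closure != NULL);
    env->multiplier = multiplier;
    closure->env = env;
    closure->fn_ptr = (void*)lambda_triple;
    return closure;
}

// Hypothetical helper: calls a closure whose Lux type is fn(Int): Int.
static LuxInt call_int_closure(const LuxClosure* c, LuxInt arg) {
    // Recover the typed function pointer, then pass the environment first.
    LuxInt (*fn)(void*, LuxInt) = (LuxInt (*)(void*, LuxInt))c->fn_ptr;
    return fn(c->env, arg);
}
```

Calling `call_int_closure(make_triple(3), 5)` multiplies 5 by the captured `multiplier`. Storing a function pointer in a `void*` is not guaranteed by ISO C, although POSIX platforms support it; a stricter design could store a generic function pointer type instead.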
### Algebraic Data Types (ADTs)
ADTs compile to tagged unions:
```lux
type Option =
| Some(Int)
| None
```
Generates:
```c
typedef enum { Option_TAG_SOME, Option_TAG_NONE } Option_Tag;
typedef struct {
    Option_Tag tag;
    union {
        struct { LuxInt field0; } some;
        // None has no fields
    } data;
} Option;
```
**Pattern matching** compiles to if/else chains:
```lux
match opt {
    Some(x) => x,
    None => 0
}
```
Generates:
```c
if (opt.tag == Option_TAG_SOME) {
    LuxInt x = opt.data.some.field0;
    result = x;
} else if (opt.tag == Option_TAG_NONE) {
    result = 0;
}
```
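To see the tagged union and the if/else chain working together, here is a small self-contained sketch. The constructor helpers `Option_some` and `Option_none` and the wrapper `unwrap_or_zero` are hypothetical names chosen for this example; the source does not show how the backend names or emits constructors.

```c
#include <assert.h>
#include <stdint.h>

typedef int64_t LuxInt;

typedef enum { Option_TAG_SOME, Option_TAG_NONE } Option_Tag;

typedef struct {
    Option_Tag tag;
    union {
        struct { LuxInt field0; } some;
        // None has no fields
    } data;
} Option;

// Hypothetical constructor for Some(value).
static Option Option_some(LuxInt value) {
    Option opt;
    opt.tag = Option_TAG_SOME;
    opt.data.some.field0 = value;
    return opt;
}

// Hypothetical constructor for None.
static Option Option_none(void) {
    Option opt;
    opt.tag = Option_TAG_NONE;
    opt.data.some.field0 = 0;  // keep the payload initialized even though it is unused
    return opt;
}

// match opt { Some(x) => x, None => 0 }
static LuxInt unwrap_or_zero(Option opt) {
    LuxInt result = 0;
    if (opt.tag == Option_TAG_SOME) {
        LuxInt x = opt.data.some.field0;
        result = x;
    } else if (opt.tag == Option_TAG_NONE) {
        result = 0;
    } else {
        assert(0 && "unknown Option tag");
    }
    return result;
}
```

The final `else` branch is an addition for the sketch: it makes an unexpected tag fail loudly instead of silently returning the default.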
### Lists
Lists are dynamic arrays with boxed elements:
```c
typedef struct {
    void** elements;  // Array of boxed elements
    int64_t length;
    int64_t capacity;
} LuxList;
```
Elements are boxed when stored and unboxed when read:
```c
void* lux_box_int(LuxInt n) {
    LuxInt* p = malloc(sizeof(LuxInt));
    *p = n;
    return p;
}

LuxInt lux_unbox_int(void* p) {
    return *(LuxInt*)p;
}
```
**List operations** (map, filter, fold, etc.) generate inline loops:
```c
// List.map(nums, fn(x) => x * 2)
LuxList* result = lux_list_new(nums->length);
for (int64_t i = 0; i < nums->length; i++) {
    void* elem = nums->elements[i];
    LuxInt mapped = ((LuxInt(*)(void*, LuxInt))fn->fn_ptr)(fn->env, lux_unbox_int(elem));
    result->elements[i] = lux_box_int(mapped);
}
result->length = nums->length;
```
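The same loop shape carries over to the other higher-order operations. The sketch below shows what `filter` and `fold` loops could look like for `Int` lists. It is an illustration under assumptions, not the backend's actual output: the body of `lux_list_new`, the helper `list_from_ints`, the function names `filter_ints` and `fold_ints`, and the `(acc, x)` argument order of the fold closure are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

typedef int64_t LuxInt;
typedef struct { void** elements; int64_t length; int64_t capacity; } LuxList;
typedef struct { void* env; void* fn_ptr; } LuxClosure;

static void* lux_box_int(LuxInt n) {
    LuxInt* p = malloc(sizeof(LuxInt));
    assert(p != NULL);
    *p = n;
    return p;
}

static LuxInt lux_unbox_int(void* p) { return *(LuxInt*)p; }

// Assumed behavior: an empty list with room for `capacity` elements.
static LuxList* lux_list_new(int64_t capacity) {
    int64_t slots = capacity > 0 ? capacity : 1;  // never request malloc(0)
    LuxList* list = malloc(sizeof(LuxList));
    assert(list != NULL);
    list->elements = malloc(sizeof(void*) * (size_t)slots);
    assert(list->elements != NULL);
    list->length = 0;
    list->capacity = slots;
    return list;
}

// Hypothetical helper: boxes `n` ints into a new list.
static LuxList* list_from_ints(const LuxInt* values, int64_t n) {
    LuxList* list = lux_list_new(n);
    for (int64_t i = 0; i < n; i++) {
        list->elements[i] = lux_box_int(values[i]);
    }
    list->length = n;
    return list;
}

// Example lambdas that capture nothing, so _env is unused.
static bool lambda_is_even(void* _env, LuxInt x) { (void)_env; return x % 2 == 0; }
static LuxInt lambda_add(void* _env, LuxInt acc, LuxInt x) { (void)_env; return acc + x; }

// List.filter(xs, pred): keeps the elements for which pred returns true.
static LuxList* filter_ints(const LuxList* xs, const LuxClosure* pred) {
    bool (*fn)(void*, LuxInt) = (bool (*)(void*, LuxInt))pred->fn_ptr;
    LuxList* result = lux_list_new(xs->length);
    for (int64_t i = 0; i < xs->length; i++) {
        void* elem = xs->elements[i];
        if (fn(pred->env, lux_unbox_int(elem))) {
            // Shares the existing box; safe here only because nothing is freed.
            result->elements[result->length++] = elem;
        }
    }
    return result;
}

// List.fold(xs, init, f): threads an accumulator through the list, left to right.
static LuxInt fold_ints(const LuxList* xs, LuxInt init, const LuxClosure* f) {
    LuxInt (*fn)(void*, LuxInt, LuxInt) = (LuxInt (*)(void*, LuxInt, LuxInt))f->fn_ptr;
    LuxInt acc = init;
    for (int64_t i = 0; i < xs->length; i++) {
        acc = fn(f->env, acc, lux_unbox_int(xs->elements[i]));
    }
    return acc;
}
```

Unlike `map`, `filter` cannot know the result length up front, so the sketch allocates the input's length as an upper bound and advances `length` only for kept elements.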
## Current Limitations
### 1. Memory Leaks
**Nothing that is allocated is ever freed.** This includes:
- Closure environments
- ADT values
- List elements and arrays
- Strings from concatenation
This is acceptable for short-lived programs but not for long-running services.
### 2. Limited Effects
Only `Console.print` is supported, hardcoded to `printf`:
```c
static void lux_console_print(LuxString msg) {
    printf("%s\n", msg);
}
```
Other effects (File, Http, Random, etc.) are not yet implemented in the C backend.
### 3. If/Else Side Effects
The C backend uses ternary operators for if/else:
```c
(condition ? then_value : else_value)
```
**Problem:** If the branches contain side effects (like `Console.print`), the code generator evaluates both branches while building the ternary expression, so the generated program executes both side effects.
**Workaround:** Use pure expressions in if/else branches, then print the result:
```lux
// Bad - both prints execute
if x > 0 then Console.print("positive") else Console.print("negative")
// Good - only one print
let msg = if x > 0 then "positive" else "negative"
Console.print(msg)
```
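In C terms, the workaround keeps both ternary operands pure, so only the selected message is printed. The sketch below shows that shape next to a statement-form `if`/`else`, which is one possible way a code generator could handle effectful branches. This is an assumption about a potential fix, not a description of the current backend. The function names, the `print_count` and `last_printed` variables, and the `const char*` parameter are all hypothetical and exist only to make the effect observable.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

typedef int64_t LuxInt;

// Hypothetical instrumentation so callers can observe how many prints ran.
static int print_count = 0;
static const char* last_printed = NULL;

static void lux_console_print(const char* msg) {
    printf("%s\n", msg);
    print_count++;
    last_printed = msg;
}

// Lowering of the workaround: both ternary operands are pure values,
// and the single print happens afterwards.
static void describe_sign_workaround(LuxInt x) {
    const char* msg = (x > 0 ? "positive" : "negative");
    lux_console_print(msg);
}

// Possible statement-form lowering of:
//   if x > 0 then Console.print("positive") else Console.print("negative")
// Each effect lives inside its own branch, so exactly one of them runs.
static void describe_sign_statement(LuxInt x) {
    if (x > 0) {
        lux_console_print("positive");
    } else {
        lux_console_print("negative");
    }
}
```

Both functions perform exactly one print per call; the difference is only whether the effect sits after the conditional or inside its branches.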
---
## Comparison with Other Languages
### Koka (Our Inspiration)
Koka also has algebraic effects and compiles to C. Key differences:
| Aspect | Koka | Lux (current) |
|--------|------|---------------|
| Memory | Perceus RC | Leaks |
| Effects | Evidence passing (zero-cost) | Runtime lookup |
| Closures | Environment vectors | Heap-allocated structs |
| Maturity | Production-ready | Experimental |
### Rust
| Aspect | Rust | Lux |
|--------|------|-----|
| Target | LLVM | C |
| Memory | Ownership/borrowing | Leaks |
| Safety | Compile-time guaranteed | Runtime (interpreter) |
| Learning curve | Steep | Medium |
### Zig
| Aspect | Zig | Lux |
|--------|-----|-----|
| Target | LLVM | C |
| Memory | Manual with allocators | Leaks |
| Philosophy | Explicit control | High-level abstraction |
### Go
| Aspect | Go | Lux |
|--------|-----|-----|
| Target | Native | C |
| Memory | Concurrent GC | Leaks |
| Effects | None | Algebraic effects |
| Latency | Subject to GC pauses | No GC pauses (memory is never freed) |
---
## Future Roadmap
### Phase 1: Evidence Passing (Zero-Cost Effects)
**Goal:** Eliminate runtime effect handler lookup.
**Current approach (slow):**
```rust
// O(n) search through handler stack
for handler in self.handler_stack.iter().rev() {
    if handler.effect == request.effect {
        return handler.invoke(request);
    }
}
```
**Evidence passing (fast):**
```c
typedef struct {
    Console* console;
    FileIO* fileio;
} Evidence;

void greet(Evidence* ev, const char* name) {
    ev->console->print(ev, name); // Direct call, no search
}
```
**Expected speedup:** 10-20x for effect-heavy code.
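The evidence-passing fragment leaves `Console` and `FileIO` undefined. A minimal compilable sketch, assuming each effect is a record of function pointers, could look like the following. The `Console` layout, the two handler functions, and the `print_calls` field are hypothetical; `FileIO` is omitted to keep the example short.

```c
#include <assert.h>
#include <stdio.h>

typedef struct Evidence Evidence;

// Assumed shape: one record per effect, one function pointer per operation.
typedef struct {
    void (*print)(Evidence* ev, const char* msg);
} Console;

struct Evidence {
    Console* console;
    int print_calls;  // hypothetical handler state, used to observe calls
};

// Handler that writes to stdout.
static void stdout_print(Evidence* ev, const char* msg) {
    ev->print_calls++;
    printf("%s\n", msg);
}

// Handler that only counts calls, e.g. for tests.
static void counting_print(Evidence* ev, const char* msg) {
    (void)msg;
    ev->print_calls++;
}

void greet(Evidence* ev, const char* name) {
    assert(ev != NULL && ev->console != NULL);
    ev->console->print(ev, name);  // Direct call, no search
}
```

Swapping the handler means pointing `ev->console` at a different `Console` record. `greet` itself does not change, and no handler stack is searched at the call site.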
### Phase 2: Perceus Reference Counting
**Goal:** Deterministic memory management without GC pauses.
Perceus is a compile-time reference counting system that:
1. Inserts increment/decrement at precise points
2. Detects when values can be reused in-place (FBIP)
3. Reclaims memory without a runtime GC (garbage-free for cycle-free data)
**Example - reuse analysis:**
```lux
fn increment(xs: List<Int>): List<Int> =
List.map(xs, fn(x) => x + 1)
```
If `xs` has refcount=1, the list can be mutated in-place instead of copied.
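The reuse idea can be illustrated with a hand-written runtime check. This is a simplified sketch and not Perceus itself, which decides where to place these operations at compile time. The `RcIntList` type (unboxed ints with a refcount header) and the `rc_*` helpers are hypothetical and differ from the current `LuxList` layout.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef int64_t LuxInt;

// Hypothetical reference-counted list of unboxed ints.
typedef struct {
    int64_t refcount;
    int64_t length;
    LuxInt* items;
} RcIntList;

static RcIntList* rc_list_new(int64_t length) {
    RcIntList* list = malloc(sizeof(RcIntList));
    assert(list != NULL);
    list->items = calloc((size_t)(length > 0 ? length : 1), sizeof(LuxInt));
    assert(list->items != NULL);
    list->refcount = 1;
    list->length = length;
    return list;
}

// Take an additional reference.
static RcIntList* rc_dup(RcIntList* list) {
    list->refcount++;
    return list;
}

// Give up one reference; free the list when the last one goes away.
static void rc_drop(RcIntList* list) {
    if (--list->refcount == 0) {
        free(list->items);
        free(list);
    }
}

// increment consumes the caller's reference to xs and returns a list
// in which every element is one larger.
static RcIntList* increment(RcIntList* xs) {
    RcIntList* out;
    if (xs->refcount == 1) {
        out = xs;  // unique owner: update in place, no allocation
    } else {
        out = rc_list_new(xs->length);  // shared: copy, then release our reference
        for (int64_t i = 0; i < xs->length; i++) {
            out->items[i] = xs->items[i];
        }
        rc_drop(xs);
    }
    for (int64_t i = 0; i < out->length; i++) {
        out->items[i] += 1;
    }
    return out;
}
```

When the argument is uniquely owned, `increment` returns the same pointer it was given. When another reference exists, the other holder still sees the original values.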
### Phase 3: More Effects
Implement C versions of:
- `File` (read, write, exists)
- `Http` (get, post)
- `Random` (int, bool)
- `Time` (now, sleep)
### Phase 4: JavaScript Backend
Compile Lux to JavaScript for browser/Node.js:
- Effects → Direct DOM/API calls
- No runtime needed
- Enables full-stack Lux development
---
## Implementation Details
### Name Mangling
Lux identifiers are mangled for C compatibility:
| Lux | C |
|-----|---|
| `foo` | `foo_lux` |
| `myFunction` | `myFunction_lux` |
| `List.map` | Inline code (not a function call) |
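Going by the table, mangling a plain identifier amounts to appending a `_lux` suffix. The helper below is a hypothetical C illustration of that rule only. The name `lux_mangle` is invented for this example, and the sketch does not model how the compiler treats `List.*` operations or any characters that are not valid in C identifiers.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

// Hypothetical helper: writes "<name>_lux" into `out`.
// Returns 0 on success, or -1 if `out_size` is too small for the result.
static int lux_mangle(const char* name, char* out, size_t out_size) {
    assert(name != NULL && out != NULL);
    int written = snprintf(out, out_size, "%s_lux", name);
    return (written >= 0 && (size_t)written < out_size) ? 0 : -1;
}
```

One visible effect of the suffix is that a Lux function named `main` becomes `main_lux`, which keeps it distinct from the C entry point `main`.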
### Generated C Structure
```c
// 1. Includes and type definitions
#include <stdint.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
typedef int64_t LuxInt;
// ... more types ...
// 2. Runtime helpers (string concat, list operations, etc.)
static LuxString lux_string_concat(LuxString a, LuxString b) { ... }
static LuxList* lux_list_new(int64_t capacity) { ... }
// ... more helpers ...
// 3. Forward declarations
void main_lux(void);
// 4. Closure/lambda definitions
static LuxInt lambda_1(void* _env, LuxInt x) { ... }
// 5. User-defined functions
void greet_lux(LuxString name) { ... }
// 6. Main function
void main_lux(void) { ... }
// 7. Entry point
int main(int argc, char** argv) {
    main_lux();
    return 0;
}
```
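The helper bodies are elided in the listing above. As one plausible sketch of `lux_string_concat`, assuming strings are NUL-terminated `char*` buffers allocated with `malloc`, the result is a fresh buffer that the current backend never frees (the "strings from concatenation" leak). The body below is an assumption, not the actual prelude code.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef char* LuxString;

// Assumed implementation: returns a newly allocated string holding a followed by b.
static LuxString lux_string_concat(LuxString a, LuxString b) {
    assert(a != NULL && b != NULL);
    size_t len_a = strlen(a);
    size_t len_b = strlen(b);
    LuxString result = malloc(len_a + len_b + 1);
    assert(result != NULL);
    memcpy(result, a, len_a);
    memcpy(result + len_a, b, len_b + 1);  // also copies b's terminating NUL
    return result;
}
```

Neither input is modified, so every concatenation allocates a new buffer, and a chain of concatenations leaves each intermediate string unreachable but still allocated.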
### Prelude Size
The generated C prelude is approximately 150 lines, including:
- Type definitions (~20 lines)
- String operations (~30 lines)
- List types and operations (~80 lines)
- Boxing/unboxing helpers (~20 lines)
---
## Testing the C Backend
```bash
# Compile and run
lux compile examples/hello.lux --run
# Compile to binary
lux compile examples/hello.lux -o hello
./hello
# View generated C (for debugging)
lux compile examples/hello.lux --emit-c
# Save C to file
lux compile examples/hello.lux --emit-c -o hello.c
```
---
## References
- [Perceus: Garbage Free Reference Counting](https://www.microsoft.com/en-us/research/publication/perceus-garbage-free-reference-counting-with-reuse/) - Microsoft Research
- [Generalized Evidence Passing for Effect Handlers](https://www.microsoft.com/en-us/research/publication/generalized-evidence-passing-for-effect-handlers/) - Koka's effect compilation
- [Koka Language](https://koka-lang.github.io/koka/doc/book.html) - Effect system language that compiles to C
- [Nim Backend Integration](https://nim-lang.org/docs/backends.html) - Another compile-to-C language