Brandon Lucas f6569f1821 feat: implement early return handling for RC values

- Add pop_rc_scope_except() to skip decref'ing returned variables
- Block expressions now properly preserve returned RC variables
- Function returns skip cleanup for variables being returned
- Track function return types for call expression type inference
- Function calls returning RC types now register for cleanup
- Fix main() entry point to call main_lux() when present

Test result: [RC] No leaks: 17 allocs, 17 frees

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

2026-02-14 13:53:28 -05:00


Reference Counting in Lux C Backend

Overview

This document describes the reference counting (RC) system for automatic memory management in the Lux C backend. The approach is inspired by Perceus (used in Koka) but starts with a simpler implementation.

Current Status: WORKING

The RC system is now functional for lists, boxed values, strings, closures, and ADTs with pointer fields.

What's Implemented

RC header structure (LuxRcHeader with refcount + type tag)
Allocation function (lux_rc_alloc)
Reference operations (lux_incref, lux_decref)
Polymorphic drop function (lux_drop)
Lists, boxed values, strings use RC allocation
List operations incref shared elements
Closures and environments - RC-managed with automatic cleanup
Inline lambda cleanup - temporary closures freed after use
ADT pointer fields - RC-allocated and cleaned up at scope exit
Scope tracking - compiler tracks RC variable lifetimes
Automatic decref at scope exit - variables are freed when out of scope
Memory tracking - debug mode reports allocs/frees at program exit
Early return handling - variables being returned from blocks/functions are not decref'd
Function call RC tracking - values from RC-returning functions are tracked for cleanup
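
To make the header, incref/decref, and polymorphic drop items above concrete, here is a small simulation in Rust. It is only a sketch of the semantics, not the C runtime: `Heap`, `Cell`, `Tag`, and `list_copy` are hypothetical names invented for this example, and handles are vector indices rather than pointers.

```rust
/// Type tags, standing in for the tag stored in the RC header.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Tag {
    Boxed,
    List,
}

/// One heap cell: header fields (refcount, tag) plus child handles.
struct Cell {
    refcount: u32,
    tag: Tag,
    children: Vec<usize>, // elements a list points to
}

/// Toy heap that counts allocations and frees, like the debug tracker.
#[derive(Default)]
pub struct Heap {
    cells: Vec<Option<Cell>>,
    pub allocs: u64,
    pub frees: u64,
}

impl Heap {
    /// Allocate a cell with refcount 1 and return its handle.
    pub fn alloc(&mut self, tag: Tag, children: Vec<usize>) -> usize {
        self.allocs += 1;
        self.cells.push(Some(Cell { refcount: 1, tag, children }));
        self.cells.len() - 1
    }

    /// Add one reference to a live cell.
    pub fn incref(&mut self, h: usize) {
        if let Some(cell) = self.cells[h].as_mut() {
            cell.refcount += 1;
        }
    }

    /// Drop one reference; at zero, free the cell and release its children.
    pub fn decref(&mut self, h: usize) {
        let Some(cell) = self.cells[h].as_mut() else { return };
        cell.refcount -= 1;
        if cell.refcount > 0 {
            return;
        }
        let dead = self.cells[h].take().expect("cell was live");
        self.frees += 1;
        // Polymorphic drop: only lists own child references.
        if dead.tag == Tag::List {
            for child in dead.children {
                self.decref(child);
            }
        }
    }

    /// Build a new list sharing the elements of `src` (increfs each one).
    pub fn list_copy(&mut self, src: usize) -> usize {
        let elems = self.cells[src]
            .as_ref()
            .map(|c| c.children.clone())
            .unwrap_or_default();
        for &e in &elems {
            self.incref(e);
        }
        self.alloc(Tag::List, elems)
    }
}
```

Copying a list and then releasing both copies should leave the alloc and free counters equal, which is the same invariant the debug report checks.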

Verified Working

[RC] No leaks: 14 allocs, 14 frees

What's NOT Yet Implemented

Conditional branch handling (complex if/else patterns)

The Problem

Before scope tracking was implemented, generated code looked like this:

void example(LuxEvidence* ev) {
    LuxList* nums = lux_list_new(5);  // rc=1, allocated
    // ... use nums ...
    // MISSING: lux_decref(nums);  <- MEMORY LEAK!
}

It should look like this:

void example(LuxEvidence* ev) {
    LuxList* nums = lux_list_new(5);  // rc=1
    // ... use nums ...
    lux_decref(nums);  // rc=0, freed
}

Implementation Plan

Phase 1: Scope Tracking

Goal: Track which RC-managed variables are live at each point.

Data structures needed in CBackend:

struct CBackend {
    // ... existing fields ...

    /// Stack of scopes, each containing RC-managed variables
    /// Each scope is a Vec of RcVariable (name, c_type, is_rc)
    rc_scopes: Vec<Vec<RcVariable>>,
}

struct RcVariable {
    name: String,      // Variable name
    c_type: String,    // C type (for casting in decref)
    is_rc: bool,       // Whether this needs RC management
}

Operations:

push_scope() - Enter a new scope (function, block, etc.)
pop_scope() - Exit scope, emit decrefs for all live variables
register_rc_var(name, type) - Register a variable that needs RC management
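
The three operations can be sketched as a standalone type. This is a minimal illustration, assuming `pop_scope()` both emits the decrefs and leaves the scope as the list above describes. `ScopeTracker` and its `output` field are hypothetical stand-ins for the relevant parts of `CBackend`.

```rust
/// One RC-managed variable recorded by the compiler.
#[derive(Debug, Clone)]
pub struct RcVariable {
    pub name: String,
    pub c_type: String,
}

/// Minimal stand-in for the scope-tracking part of the backend.
#[derive(Default)]
pub struct ScopeTracker {
    rc_scopes: Vec<Vec<RcVariable>>,
    pub output: Vec<String>, // emitted C lines
}

impl ScopeTracker {
    /// Enter a new scope (function, block, etc.).
    pub fn push_scope(&mut self) {
        self.rc_scopes.push(Vec::new());
    }

    /// Record a variable in the innermost scope.
    pub fn register_rc_var(&mut self, name: &str, c_type: &str) {
        if let Some(scope) = self.rc_scopes.last_mut() {
            scope.push(RcVariable {
                name: name.to_string(),
                c_type: c_type.to_string(),
            });
        }
    }

    /// Leave the innermost scope, emitting decrefs in reverse (LIFO) order.
    pub fn pop_scope(&mut self) {
        if let Some(scope) = self.rc_scopes.pop() {
            for var in scope.iter().rev() {
                self.output.push(format!("lux_decref({});", var.name));
            }
        }
    }
}
```

With an outer and an inner scope, popping the inner scope emits only the inner variable's decref; the outer variable is released when its own scope is popped.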

Phase 2: Identify RC-Managed Types

Goal: Determine which types need RC management.

RC-managed types:

LuxList* - Lists
LuxString (when dynamically allocated) - Strings from concat/conversion
LuxClosure* - Closures
Boxed values (void* from lux_box_*)
ADT variants with pointer fields

NOT RC-managed:

LuxInt, LuxFloat, LuxBool - Stack-allocated primitives
String literals ("hello") - Static, not heap-allocated
LuxUnit - No data

Implementation:

fn is_rc_managed_type(&self, c_type: &str) -> bool {
    matches!(c_type,
        "LuxList*" | "LuxClosure*" | "LuxString" | "void*"
    ) || c_type.ends_with("*")  // Most pointer types are RC
}

fn needs_rc_for_expr(&self, expr: &Expr) -> bool {
    match expr {
        Expr::List { .. } => true,
        Expr::Lambda { .. } => true,
        Expr::StringConcat { .. } => true,
        Expr::Call { func, .. } => {
            // Check if function returns RC type
            self.returns_rc_type(func)
        }
        Expr::Literal(Literal::String(_)) => false,  // Static string
        Expr::Literal(_) => false,  // Primitives
        Expr::Var(_) => false,  // Using existing var, don't double-free
        _ => false,
    }
}

Phase 3: Emit Decrefs at Scope Exit

Goal: Insert lux_decref() calls when variables go out of scope.

For function bodies:

fn emit_function(&mut self, func: &Function) -> Result<(), CGenError> {
    self.push_scope();

    // ... emit function body ...

    // Before the closing brace, emit decrefs
    self.emit_scope_cleanup();
    self.pop_scope();
    Ok(())
}

The cleanup function:

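fn emit_scope_cleanup(&mut self) {
    // Collect the names first so the borrow of rc_scopes ends before
    // writeln is called (assuming writeln takes &mut self).
    let names: Vec<String> = self
        .rc_scopes
        .last()
        .map(|scope| {
            // Decref in reverse order (LIFO)
            scope.iter().rev().filter(|v| v.is_rc).map(|v| v.name.clone()).collect()
        })
        .unwrap_or_default();
    for name in names {
        self.writeln(&format!("lux_decref({});", name));
    }
}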

Phase 4: Handle Let Bindings

Goal: Register variables when they're bound.

fn emit_let(&mut self, name: &str, value: &Expr) -> Result<String, CGenError> {
    let c_type = self.infer_c_type(value)?;
    let value_code = self.emit_expr(value)?;

    self.writeln(&format!("{} {} = {};", c_type, name, value_code));

    // Register for cleanup if RC-managed
    if self.is_rc_managed_type(&c_type) && self.needs_rc_for_expr(value) {
        self.register_rc_var(name, &c_type);
    }

    Ok(name.to_string())
}

Phase 5: Handle Early Returns

Goal: Decref all live variables before returning.

fn emit_return(&mut self, value: &Expr) -> Result<String, CGenError> {
    let return_val = self.emit_expr(value)?;

    // Store return value in temp if it's an RC variable we're about to decref
    let temp_needed = self.is_rc_managed_type(&self.infer_c_type(value)?);

    if temp_needed {
        self.writeln(&format!("void* _ret_tmp = {};", return_val));
        self.writeln("lux_incref(_ret_tmp);");  // Keep it alive
    }

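    // Decref all scopes from innermost to outermost.
    // Names are collected first so rc_scopes is not borrowed during writeln.
    let names: Vec<String> = self.rc_scopes.iter().rev()
        .flat_map(|scope| scope.iter().rev())
        .filter(|var| var.is_rc)
        .map(|var| var.name.clone())
        .collect();
    for name in names {
        self.writeln(&format!("lux_decref({});", name));
    }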

    if temp_needed {
        self.writeln("return _ret_tmp;");
    } else {
        self.writeln(&format!("return {};", return_val));
    }

    Ok(String::new())
}
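
The status list describes a different strategy from the incref-then-decref plan above: the variable being returned is simply skipped during cleanup. A rough sketch of that idea follows. `cleanup_lines_except` is a hypothetical helper written for this example, not the backend's actual `pop_rc_scope_except()`, and it matches variables by name only, so it does not account for two names aliasing the same object.

```rust
/// Build the C lines that clean up every scope on return, skipping the
/// variable whose value is handed to the caller.
pub fn cleanup_lines_except(scopes: &[Vec<String>], returned: Option<&str>) -> Vec<String> {
    scopes
        .iter()
        .rev() // innermost scope first
        .flat_map(|scope| scope.iter().rev()) // LIFO within a scope
        .filter(|name| Some(name.as_str()) != returned)
        .map(|name| format!("lux_decref({});", name))
        .collect()
}
```

Skipping the returned variable transfers its single reference to the caller, so no temporary and no extra incref are needed.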

Phase 6: Handle Conditionals

Goal: Properly handle if/else where both branches may define variables.

For if/else expressions that create RC values:

// Before (leaks):
LuxList* result = (condition ? create_list_a() : create_list_b());

// After (no leak):
LuxList* result;
if (condition) {
    result = create_list_a();
} else {
    result = create_list_b();
}
// Only one path executed, only one allocation

This requires changing if/else from ternary expressions to proper if statements.
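
On the compiler side, the change could look roughly like the helper below. It is a sketch only: `emit_if_statement` is a hypothetical function that takes already-rendered C fragments as strings and returns the lines of the statement form, whereas the real backend would work from `Expr` nodes.

```rust
/// Render an if/else expression that produces an RC value as a C statement
/// that assigns to a pre-declared result variable.
pub fn emit_if_statement(
    c_type: &str,
    result: &str,
    cond: &str,
    then_expr: &str,
    else_expr: &str,
) -> Vec<String> {
    vec![
        format!("{} {};", c_type, result),
        format!("if ({}) {{", cond),
        format!("    {} = {};", result, then_expr),
        "} else {".to_string(),
        format!("    {} = {};", result, else_expr),
        "}".to_string(),
    ]
}
```

Because each branch becomes its own C block, variables created inside a branch can be registered and cleaned up in that branch's scope.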

Phase 7: Handle Blocks

Goal: Each block { ... } creates a new scope.

fn emit_block(&mut self, statements: &[Statement]) -> Result<String, CGenError> {
    self.push_scope();
    self.writeln("{");
    self.indent += 1;

    let mut last_value = String::from("NULL");
    for stmt in statements {
        last_value = self.emit_statement(stmt)?;
    }

    // Cleanup before leaving block
    self.emit_scope_cleanup();

    self.indent -= 1;
    self.writeln("}");
    self.pop_scope();

    Ok(last_value)
}

Testing Strategy

Unit Tests

Simple allocation and free:

fn test(): Unit = {
    let x = [1, 2, 3]  // Should be freed at end
}

Nested scopes:

fn test(): Unit = {
    let outer = [1]
    {
        let inner = [2]  // Freed here
    }
    // outer still live
}  // outer freed here

Early return:

fn test(b: Bool): List<Int> = {
    let x = [1, 2, 3]
    if b then return []  // x must be freed before return
    x
}

Conditionals:

fn test(b: Bool): List<Int> = {
    let x = if b then [1] else [2]  // Only one allocated
    x
}

Memory Leak Detection

Use valgrind (if available) or add debug tracking:

static int64_t lux_alloc_count = 0;
static int64_t lux_free_count = 0;

static void* lux_rc_alloc(size_t size, int32_t tag) {
    lux_alloc_count++;
    // ... existing code ...
}

static void lux_drop(void* ptr, int32_t tag) {
    lux_free_count++;
    // ... existing code ...
}

// At program exit:
void lux_check_leaks(void) {
    if (lux_alloc_count != lux_free_count) {
        // Cast so the arguments match %lld on every platform
        fprintf(stderr, "LEAK: %lld allocations, %lld frees\n",
                (long long)lux_alloc_count, (long long)lux_free_count);
    }
}
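
For automated tests on the compiler side, the success line can be parsed and checked. The sketch below assumes the report has exactly the shape shown under "Verified Working" (`[RC] No leaks: N allocs, N frees`). `RcReport` and `parse_rc_report` are hypothetical names for this example.

```rust
/// Alloc/free counts read from a tracker success line.
#[derive(Debug, PartialEq)]
pub struct RcReport {
    pub allocs: u64,
    pub frees: u64,
}

/// Parse a line such as "[RC] No leaks: 14 allocs, 14 frees".
/// Returns None for any other line, including leak reports.
pub fn parse_rc_report(line: &str) -> Option<RcReport> {
    let rest = line.trim().strip_prefix("[RC] No leaks: ")?;
    let (allocs, frees) = rest.split_once(", ")?;
    let allocs = allocs.strip_suffix(" allocs")?.parse().ok()?;
    let frees = frees.strip_suffix(" frees")?.parse().ok()?;
    Some(RcReport { allocs, frees })
}
```

A test harness could run a compiled program, feed each line of its diagnostic output to this function, and fail if no line parses or if the two counts differ.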

Comparison with Perceus

| Feature | Perceus (Koka) | Lux RC (Current) |
|---|---|---|
| RC header | Yes | Yes ✅ |
| Scope tracking | Yes | Yes ✅ |
| Auto decref | Yes | Yes ✅ |
| Memory tracking | No | Yes ✅ (debug) |
| Early return | Yes | Yes ✅ |
| Last-use opt | Yes | No |
| Reuse (FBIP) | Yes | No |
| Drop fusion | Yes | No |

Files to Modify

| File | Changes |
|---|---|
| `src/codegen/c_backend.rs` | Add scope tracking, emit decrefs |

Estimated Complexity

Scope tracking data structures: ~30 lines
Type classification: ~40 lines
Scope cleanup emission: ~30 lines
Let binding registration: ~20 lines
Early return handling: ~40 lines
Block scope handling: ~30 lines
Testing: ~100 lines

Total: ~300 lines of careful implementation

Path to Koka/Rust Parity

What We Have Now (Basic RC)

Our current implementation provides:

Deterministic cleanup - Memory freed at predictable points (scope exit)
No GC pauses - Unlike Go/Java, latency is predictable
Leak detection - Debug mode catches memory leaks during development
No manual management - Unlike C/Zig, programmer doesn't call free()

What Koka Has (Perceus RC)

Koka's Perceus system adds several optimizations we don't have:

| Feature | Description | Benefit | Complexity |
|---|---|---|---|
| Last-use analysis | Detect when a variable's final use allows ownership transfer | Avoid unnecessary copies | Medium |
| Reuse (FBIP) | When rc=1, mutate in-place instead of copy | Major performance boost | High |
| Drop specialization | Generate type-specific drop instead of polymorphic | Fewer branches, faster | Low |
| Drop fusion | Combine multiple consecutive drops | Fewer function calls | Medium |
| Borrow inference | Avoid incref when borrowing temporaries | Reduce RC overhead | High |
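
The reuse idea (mutate in place when rc=1, copy otherwise) can be seen in miniature with Rust's standard `Rc::make_mut`, which clones the inner value only when the reference is shared. This is an analogy using Rust's library, not how Koka or Lux implement reuse. `push_reusing` is a hypothetical helper for this example.

```rust
use std::rc::Rc;

/// Append to a reference-counted vector, copying only if another
/// reference exists. Returns true when the update happened in place
/// (the rc=1 case).
pub fn push_reusing(list: &mut Rc<Vec<i64>>, value: i64) -> bool {
    let unique = Rc::strong_count(list) == 1;
    // make_mut clones the Vec first if it is shared, then hands out &mut.
    Rc::make_mut(list).push(value);
    unique
}
```

When the list is uniquely owned the push touches the existing allocation. Once a second reference exists, the push works on a fresh copy and the other reference keeps seeing the old contents.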

What Rust Has (Ownership)

Rust's ownership system is fundamentally different:

| Aspect | Rust | Lux RC | Tradeoff |
|---|---|---|---|
| When checked | Compile-time | Runtime | Rust catches bugs earlier |
| Runtime cost | Zero | RC operations | Rust is faster |
| Learning curve | Steep (borrow checker) | Gentle | Lux is easier to learn |
| Expressiveness | Limited by lifetimes | Unrestricted | Lux is more flexible |
| Cycles | Hard to create under plain ownership (Rc cycles can still leak) | Would leak | Rust handles more patterns |

Key insight: Rust checks ownership at compile time, so it pays no runtime cost for memory management. RC always has some runtime cost, so Lux cannot match that guarantee, but it can aim to be as good as Koka.

Remaining Work for Full Memory Safety

Phase A: Complete Coverage (Prevent All Leaks)

~~Closure RC~~ ✅ DONE - Environments are now RC-managed
- Closures allocated with lux_rc_alloc(sizeof(LuxClosure), LUX_TAG_CLOSURE)
- Environments allocated with lux_rc_alloc(sizeof(LuxEnv_N), LUX_TAG_ENV)
- Inline lambdas freed after use in List operations
~~ADT RC~~ ✅ DONE - Algebraic data types with heap fields
- Track which variants contain RC fields
- Generate drop functions for each ADT
- ~100 lines
~~Early return handling~~ ✅ DONE - Cleanup all scopes on return
- Variables being returned are skipped during scope cleanup
- Function calls returning RC types are tracked for cleanup
- Blocks properly handle returning RC variables
Complex conditionals - If/else creating RC values
- Switch from ternary to if-statements
- Track RC creation in branches
- ~50 lines

Phase B: Performance Optimizations (Match Koka)

Last-use optimization
- Track variable liveness
- Skip incref on last use (transfer ownership)
- Requires dataflow analysis
- ~200 lines
Reuse analysis (FBIP)
- Detect rc=1 at update sites
- Mutate in-place instead of copy
- Major change to list operations
- ~300 lines
Drop specialization
- Generate per-type drop functions
- Eliminate polymorphic dispatch
- ~100 lines
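
As a starting point for the last-use item, the straight-line case needs only a backwards walk over the uses. The sketch below assumes no branches or loops, and a real implementation would need the dataflow analysis mentioned above. `mark_last_uses` is a hypothetical helper for this example.

```rust
use std::collections::HashSet;

/// For a straight-line sequence of variable uses, mark which uses are the
/// last use of their variable. A last use can transfer ownership instead
/// of performing an incref.
pub fn mark_last_uses(uses: &[&str]) -> Vec<bool> {
    let mut seen_later = HashSet::new();
    let mut flags = vec![false; uses.len()];
    for (i, name) in uses.iter().enumerate().rev() {
        // Walking backwards, the first time a name appears is its last use.
        flags[i] = seen_later.insert(*name);
    }
    flags
}
```

Each use flagged `true` is a candidate for skipping the incref, and the variable then no longer needs a decref at scope exit.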

Estimated Effort

| Phase | Description | Lines | Priority | Status |
|---|---|---|---|---|
| A1 | Closure RC | ~50 | P0 | ✅ Done |
| A2 | ADT RC | ~150 | P1 | ✅ Done |
| A3 | Early returns | ~30 | P1 | ✅ Done |
| A4 | Conditionals | ~50 | P2 - Uncommon | Pending |
| B1 | Last-use opt | ~200 | P3 - Performance | Pending |
| B2 | Reuse (FBIP) | ~300 | P3 - Performance | Pending |
| B3 | Drop special | ~100 | P3 - Performance | Pending |

Phase A remaining: ~50 lines - Gets us to "no leaks"

Phase B total: ~600 lines - Gets us to Koka-level performance

Cycle Detection

RC cannot handle cycles (A → B → A). Options:

Ignore - Cycles are rare in functional code (our current approach)
Weak references - Programmer marks back-edges
Cycle collector - Periodic scan for cycles (adds GC-like pauses)

Koka also ignores cycles, relying on functional programming's natural acyclicity.
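
For comparison, the weak-reference option looks like this in Rust's standard library, where the back-edge is a `Weak` pointer that does not keep its target alive. This only illustrates the concept; Lux has no weak references today, and `Node`, `new_node`, and `adopt` are hypothetical names for this example.

```rust
use std::cell::RefCell;
use std::rc::{Rc, Weak};

/// Tree node with owning forward edges and a non-owning back-edge.
pub struct Node {
    pub parent: RefCell<Weak<Node>>,      // back-edge: does not keep parent alive
    pub children: RefCell<Vec<Rc<Node>>>, // forward edges: owning
}

/// Create a node with no parent and no children.
pub fn new_node() -> Rc<Node> {
    Rc::new(Node {
        parent: RefCell::new(Weak::new()),
        children: RefCell::new(Vec::new()),
    })
}

/// Link child under parent: strong forward edge, weak back-edge.
pub fn adopt(parent: &Rc<Node>, child: &Rc<Node>) {
    parent.children.borrow_mut().push(Rc::clone(child));
    *child.parent.borrow_mut() = Rc::downgrade(parent);
}
```

Because the child holds only a weak pointer to its parent, dropping the last strong reference to the parent frees it even though the two nodes point at each other.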
