lux/docs/REFERENCE_COUNTING.md
Brandon Lucas f6569f1821 feat: implement early return handling for RC values
- Add pop_rc_scope_except() to skip decref'ing returned variables
- Block expressions now properly preserve returned RC variables
- Function returns skip cleanup for variables being returned
- Track function return types for call expression type inference
- Function calls returning RC types now register for cleanup
- Fix main() entry point to call main_lux() when present

Test result: [RC] No leaks: 17 allocs, 17 frees

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-14 13:53:28 -05:00

Reference Counting in Lux C Backend

Overview

This document describes the reference counting (RC) system for automatic memory management in the Lux C backend. The approach is inspired by Perceus (used in Koka) but starts with a simpler implementation.

Current Status: WORKING

The RC system is now functional for lists and boxed values.

What's Implemented

  • RC header structure (LuxRcHeader with refcount + type tag)
  • Allocation function (lux_rc_alloc)
  • Reference operations (lux_incref, lux_decref)
  • Polymorphic drop function (lux_drop)
  • Lists, boxed values, strings use RC allocation
  • List operations incref shared elements
  • Closures and environments - RC-managed with automatic cleanup
  • Inline lambda cleanup - temporary closures freed after use
  • ADT pointer fields - RC-allocated and cleaned up at scope exit
  • Scope tracking - compiler tracks RC variable lifetimes
  • Automatic decref at scope exit - variables are freed when out of scope
  • Memory tracking - debug mode reports allocs/frees at program exit
  • Early return handling - variables being returned from blocks/functions are not decref'd
  • Function call RC tracking - values from RC-returning functions are tracked for cleanup
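
The header and the reference operations listed above can be pictured with a small, self-contained C sketch. It is not the Lux runtime: the `demo_` names, the header-before-payload layout, and the live-object counter are assumptions made for illustration, and a real drop would also release children according to the type tag.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical header layout: refcount plus type tag, stored just before the payload. */
typedef struct {
    int32_t refcount;
    int32_t tag;
} DemoRcHeader;

static int64_t demo_live = 0;  /* objects currently allocated (demo only) */

/* Allocate header + payload in one block and return a pointer to the payload. */
static void *demo_rc_alloc(size_t size, int32_t tag) {
    DemoRcHeader *h = malloc(sizeof(DemoRcHeader) + size);
    if (h == NULL) abort();
    h->refcount = 1;  /* the creator holds the first reference */
    h->tag = tag;
    demo_live++;
    return h + 1;
}

/* Step back from the payload pointer to its header. */
static DemoRcHeader *demo_header(void *ptr) {
    return (DemoRcHeader *)ptr - 1;
}

static void demo_incref(void *ptr) {
    if (ptr != NULL) demo_header(ptr)->refcount++;
}

static void demo_decref(void *ptr) {
    if (ptr == NULL) return;
    DemoRcHeader *h = demo_header(ptr);
    assert(h->refcount > 0);
    if (--h->refcount == 0) {
        /* A real drop would first decref children, dispatching on h->tag. */
        free(h);
        demo_live--;
    }
}
```

Returning the payload pointer rather than the header pointer lets generated code use the value directly while the runtime finds the count at a fixed negative offset.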

Verified Working

[RC] No leaks: 14 allocs, 14 frees

What's NOT Yet Implemented

  • Conditional branch handling (complex if/else patterns)

The Problem

Without scope-exit cleanup, generated code looks like this:

void example(LuxEvidence* ev) {
    LuxList* nums = lux_list_new(5);  // rc=1, allocated
    // ... use nums ...
    // MISSING: lux_decref(nums);  <- MEMORY LEAK!
}

It should look like this:

void example(LuxEvidence* ev) {
    LuxList* nums = lux_list_new(5);  // rc=1
    // ... use nums ...
    lux_decref(nums);  // rc=0, freed
}

Implementation Plan

Phase 1: Scope Tracking

Goal: Track which RC-managed variables are live at each point.

Data structures needed in CBackend:

struct CBackend {
    // ... existing fields ...

    /// Stack of scopes, innermost last. Each scope lists the
    /// RC-managed variables (RcVariable) declared in it, in declaration order.
    rc_scopes: Vec<Vec<RcVariable>>,
}

struct RcVariable {
    name: String,      // Variable name
    c_type: String,    // C type (for casting in decref)
    is_rc: bool,       // Whether this needs RC management
}

Operations:

  • push_scope() - Enter a new scope (function, block, etc.)
  • pop_scope() - Exit scope, emit decrefs for all live variables
  • register_rc_var(name, type) - Register a variable that needs RC management

Phase 2: Identify RC-Managed Types

Goal: Determine which types need RC management.

RC-managed types:

  • LuxList* - Lists
  • LuxString (when dynamically allocated) - Strings from concat/conversion
  • LuxClosure* - Closures
  • Boxed values (void* from lux_box_*)
  • ADT variants with pointer fields

NOT RC-managed:

  • LuxInt, LuxFloat, LuxBool - Stack-allocated primitives
  • String literals ("hello") - Static, not heap-allocated
  • LuxUnit - No data

Implementation:

fn is_rc_managed_type(&self, c_type: &str) -> bool {
    matches!(c_type,
        "LuxList*" | "LuxClosure*" | "LuxString" | "void*"
    ) || c_type.ends_with("*")  // Most pointer types are RC
}

fn needs_rc_for_expr(&self, expr: &Expr) -> bool {
    match expr {
        Expr::List { .. } => true,
        Expr::Lambda { .. } => true,
        Expr::StringConcat { .. } => true,
        // Bind the callee so it can be inspected (field name `func` assumed)
        Expr::Call { func, .. } => {
            // Check if function returns RC type
            self.returns_rc_type(func)
        }
        Expr::Literal(Literal::String(_)) => false,  // Static string
        Expr::Literal(_) => false,  // Primitives
        Expr::Var(_) => false,  // Using existing var, don't double-free
        _ => false,
    }
}

Phase 3: Emit Decrefs at Scope Exit

Goal: Insert lux_decref() calls when variables go out of scope.

For function bodies:

fn emit_function(&mut self, func: &Function) -> Result<(), CGenError> {
    self.push_scope();

    // ... emit function body ...

    // Before the closing brace, emit decrefs
    self.emit_scope_cleanup();
    self.pop_scope();
    Ok(())
}

The cleanup function:

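fn emit_scope_cleanup(&mut self) {
    // Collect the names first so that rc_scopes is no longer borrowed
    // when writeln runs (assuming writeln takes &mut self).
    let names: Vec<String> = match self.rc_scopes.last() {
        // Decref in reverse order (LIFO)
        Some(scope) => scope.iter().rev()
            .filter(|var| var.is_rc)
            .map(|var| var.name.clone())
            .collect(),
        None => Vec::new(),
    };
    for name in names {
        self.writeln(&format!("lux_decref({});", name));
    }
}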

Phase 4: Handle Let Bindings

Goal: Register variables when they're bound.

fn emit_let(&mut self, name: &str, value: &Expr) -> Result<String, CGenError> {
    let c_type = self.infer_c_type(value)?;
    let value_code = self.emit_expr(value)?;

    self.writeln(&format!("{} {} = {};", c_type, name, value_code));

    // Register for cleanup if RC-managed
    if self.is_rc_managed_type(&c_type) && self.needs_rc_for_expr(value) {
        self.register_rc_var(name, &c_type);
    }

    Ok(name.to_string())
}

Phase 5: Handle Early Returns

Goal: Decref all live variables before returning.

fn emit_return(&mut self, value: &Expr) -> Result<String, CGenError> {
    let return_val = self.emit_expr(value)?;

    // Store return value in temp if it's an RC variable we're about to decref
    let temp_needed = self.is_rc_managed_type(&self.infer_c_type(value)?);

    if temp_needed {
        self.writeln(&format!("void* _ret_tmp = {};", return_val));
        self.writeln("lux_incref(_ret_tmp);");  // Keep it alive
    }

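    // Decref all scopes from innermost to outermost.
    // Names are collected first so rc_scopes is not borrowed during writeln.
    let names: Vec<String> = self.rc_scopes.iter().rev()
        .flat_map(|scope| scope.iter().rev())
        .filter(|var| var.is_rc)
        .map(|var| var.name.clone())
        .collect();
    for name in names {
        self.writeln(&format!("lux_decref({});", name));
    }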

    if temp_needed {
        self.writeln("return _ret_tmp;");
    } else {
        self.writeln(&format!("return {};", return_val));
    }

    Ok(String::new())
}
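
The feature list under "What's Implemented" describes a variant of this plan: instead of incref'ing a temporary and then decref'ing everything, the variable being returned is skipped during cleanup. The sketch below shows what generated C for the "Early return" unit test might look like under that scheme. The `demo_` runtime is a stand-in written for this example, not the code the backend actually emits.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Stand-in runtime: a counted object with the refcount stored inline. */
typedef struct {
    int32_t refcount;
    int64_t len;
} DemoList;

static int64_t demo_allocs = 0;
static int64_t demo_frees = 0;

static DemoList *demo_list_new(int64_t len) {
    DemoList *l = malloc(sizeof *l);
    if (l == NULL) abort();
    l->refcount = 1;
    l->len = len;
    demo_allocs++;
    return l;
}

static void demo_decref(DemoList *l) {
    assert(l != NULL && l->refcount > 0);
    if (--l->refcount == 0) {
        free(l);
        demo_frees++;
    }
}

/* Models: fn test(b) = { let x = [1, 2, 3]; if b then return []; x } */
static DemoList *demo_test(int b) {
    DemoList *x = demo_list_new(3);
    if (b) {
        DemoList *ret = demo_list_new(0);
        demo_decref(x);  /* x is live but not returned: cleaned up before return */
        return ret;      /* ret is skipped by cleanup: ownership moves to the caller */
    }
    return x;            /* x is the result, so cleanup skips it */
}
```

Skipping the returned variable saves one incref/decref pair per return compared with the temporary-based plan, at the cost of the compiler having to know which variable is being returned.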

Phase 6: Handle Conditionals

Goal: Properly handle if/else where both branches may define variables.

For if/else expressions that create RC values:

// Before (leaks):
LuxList* result = (condition ? create_list_a() : create_list_b());

// After (no leak):
LuxList* result;
if (condition) {
    result = create_list_a();
} else {
    result = create_list_b();
}
// Only one path executed, only one allocation

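This requires changing if/else from ternary expressions to proper if statements. A C ternary also evaluates only one arm, but it admits only expressions, so per-branch cleanup statements (such as decrefs for temporaries created inside a branch) have nowhere to go.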

Phase 7: Handle Blocks

Goal: Each block { ... } creates a new scope.

fn emit_block(&mut self, statements: &[Statement]) -> Result<String, CGenError> {
    self.push_scope();
    self.writeln("{");
    self.indent += 1;

    let mut last_value = String::from("NULL");
    for stmt in statements {
        last_value = self.emit_statement(stmt)?;
    }

    // Cleanup before leaving block
    self.emit_scope_cleanup();

    self.indent -= 1;
    self.writeln("}");
    self.pop_scope();

    Ok(last_value)
}

Testing Strategy

Unit Tests

  1. Simple allocation and free:
fn test(): Unit = {
    let x = [1, 2, 3]  // Should be freed at end
}
  2. Nested scopes:
fn test(): Unit = {
    let outer = [1]
    {
        let inner = [2]  // Freed here
    }
    // outer still live
}  // outer freed here
  3. Early return:
fn test(b: Bool): List<Int> = {
    let x = [1, 2, 3]
    if b then return []  // x must be freed before return
    x
}
  4. Conditionals:
fn test(b: Bool): List<Int> = {
    let x = if b then [1] else [2]  // Only one allocated
    x
}

Memory Leak Detection

Use valgrind (if available) or add debug tracking:

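#include <stddef.h>  // size_t
#include <stdint.h>  // int32_t, int64_t
#include <stdio.h>   // fprintf

static int64_t lux_alloc_count = 0;
static int64_t lux_free_count = 0;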

static void* lux_rc_alloc(size_t size, int32_t tag) {
    lux_alloc_count++;
    // ... existing code ...
}

static void lux_drop(void* ptr, int32_t tag) {
    lux_free_count++;
    // ... existing code ...
}

// At program exit:
void lux_check_leaks(void) {
    if (lux_alloc_count != lux_free_count) {
        // int64_t is not guaranteed to be long long, so cast to match %lld
        fprintf(stderr, "LEAK: %lld allocations, %lld frees\n",
                (long long)lux_alloc_count, (long long)lux_free_count);
    }
}
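
One way to make the check run without an explicit call is to register it with the standard `atexit` function. The sketch below is an assumption about how that could be wired up, not a description of the Lux runtime: the `demo_` names are placeholders, and the message format is borrowed from the "[RC] No leaks" line shown earlier.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

static int64_t demo_alloc_count = 0;
static int64_t demo_free_count = 0;

/* Number of allocations not yet freed; 0 means no leak. */
static int64_t demo_outstanding(void) {
    return demo_alloc_count - demo_free_count;
}

/* Report to stderr; PRId64 is the portable format macro for int64_t. */
static void demo_report(void) {
    const char *verdict = (demo_outstanding() == 0) ? "No leaks" : "LEAK";
    fprintf(stderr, "[RC] %s: %" PRId64 " allocs, %" PRId64 " frees\n",
            verdict, demo_alloc_count, demo_free_count);
}

static void *demo_alloc(size_t size) {
    void *p = malloc(size);
    if (p == NULL) abort();
    demo_alloc_count++;
    return p;
}

static void demo_free(void *p) {
    if (p == NULL) return;
    free(p);
    demo_free_count++;
    assert(demo_free_count <= demo_alloc_count);  /* more frees than allocs means a double free */
}

/* Call once at startup; atexit returns 0 when registration succeeds. */
static int demo_install_leak_check(void) {
    return atexit(demo_report);
}
```

Handlers registered with `atexit` run on normal termination (returning from `main` or calling `exit`), so a crash or `abort` would skip the report.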

Comparison with Perceus

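| Feature         | Perceus (Koka) | Lux RC (Current) |
|-----------------|----------------|------------------|
| RC header       | Yes            | Yes              |
| Scope tracking  | Yes            | Yes              |
| Auto decref     | Yes            | Yes              |
| Memory tracking | No             | Yes (debug)      |
| Early return    | Yes            | Partial          |
| Last-use opt    | Yes            | No               |
| Reuse (FBIP)    | Yes            | No               |
| Drop fusion     | Yes            | No               |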

Files to Modify

| File                     | Changes                          |
|--------------------------|----------------------------------|
| src/codegen/c_backend.rs | Add scope tracking, emit decrefs |

Estimated Complexity

  • Scope tracking data structures: ~30 lines
  • Type classification: ~40 lines
  • Scope cleanup emission: ~30 lines
  • Let binding registration: ~20 lines
  • Early return handling: ~40 lines
  • Block scope handling: ~30 lines
  • Testing: ~100 lines

Total: ~300 lines of careful implementation


Path to Koka/Rust Parity

What We Have Now (Basic RC)

Our current implementation provides:

  • Deterministic cleanup - Memory freed at predictable points (scope exit)
  • No GC pauses - Unlike Go/Java, latency is predictable
  • Leak detection - Debug mode catches memory leaks during development
  • No manual management - Unlike C/Zig, programmer doesn't call free()

What Koka Has (Perceus RC)

Koka's Perceus system adds several optimizations we don't have:

| Feature             | Description                                                  | Benefit                  | Complexity |
|---------------------|--------------------------------------------------------------|--------------------------|------------|
| Last-use analysis   | Detect when a variable's final use allows ownership transfer | Avoid unnecessary copies | Medium     |
| Reuse (FBIP)        | When rc=1, mutate in-place instead of copy                   | Major performance boost  | High       |
| Drop specialization | Generate type-specific drop instead of polymorphic           | Fewer branches, faster   | Low        |
| Drop fusion         | Combine multiple consecutive drops                           | Fewer function calls     | Medium     |
| Borrow inference    | Avoid incref when borrowing temporaries                      | Reduce RC overhead       | High       |
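
The Reuse (FBIP) row is the least obvious of these, so here is a minimal sketch of the idea: a functional update that mutates in place when the array has a single owner and copies otherwise. The `DemoArr` type and `demo_` functions are hypothetical and are not taken from Koka or from the Lux runtime.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    int32_t refcount;
    int32_t len;
    int64_t items[];  /* flexible array member (C99) */
} DemoArr;

static int demo_copies = 0;  /* how many updates had to copy */

static DemoArr *demo_arr_new(int32_t len) {
    DemoArr *a = calloc(1, sizeof(DemoArr) + (size_t)len * sizeof(int64_t));
    if (a == NULL) abort();
    a->refcount = 1;
    a->len = len;
    return a;
}

static void demo_arr_decref(DemoArr *a) {
    assert(a->refcount > 0);
    if (--a->refcount == 0) free(a);
}

/* Functional update: consumes one reference to `a` and returns an
   array equal to `a` except that items[i] == v. */
static DemoArr *demo_arr_set(DemoArr *a, int32_t i, int64_t v) {
    assert(i >= 0 && i < a->len);
    if (a->refcount == 1) {
        a->items[i] = v;  /* unique owner: nobody can observe the mutation */
        return a;
    }
    /* Shared: copy, update the copy, then drop the reference we consumed. */
    DemoArr *copy = demo_arr_new(a->len);
    memcpy(copy->items, a->items, (size_t)a->len * sizeof(int64_t));
    copy->items[i] = v;
    demo_copies++;
    demo_arr_decref(a);
    return copy;
}
```

The in-place path is only sound because a count of one proves no other reference can see the old contents, which is why reuse depends on accurate counts rather than on scope-exit decrefs alone.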

What Rust Has (Ownership)

Rust's ownership system is fundamentally different:

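| Aspect         | Rust                                                       | Lux RC        | Tradeoff                    |
|----------------|------------------------------------------------------------|---------------|-----------------------------|
| When checked   | Compile-time                                               | Runtime       | Rust catches bugs earlier   |
| Runtime cost   | Zero                                                       | RC operations | Rust is faster              |
| Learning curve | Steep (borrow checker)                                     | Gentle        | Lux is easier to learn      |
| Expressiveness | Limited by lifetimes                                       | Unrestricted  | Lux is more flexible        |
| Cycles         | Mostly prevented by ownership (Rc cycles can still leak)   | Would leak    | Rust handles more patterns  |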

Key insight: We can never match Rust's zero-overhead guarantees because ownership is checked at compile time. RC always has runtime cost. But we can be as good as Koka.

Remaining Work for Full Memory Safety

Phase A: Complete Coverage (Prevent All Leaks)

  1. Closure RC DONE - Environments are now RC-managed

    • Closures allocated with lux_rc_alloc(sizeof(LuxClosure), LUX_TAG_CLOSURE)
    • Environments allocated with lux_rc_alloc(sizeof(LuxEnv_N), LUX_TAG_ENV)
    • Inline lambdas freed after use in List operations
  2. ADT RC DONE - Algebraic data types with heap fields

    • Track which variants contain RC fields
    • Generate drop functions for each ADT
    • ~100 lines
  3. Early return handling DONE - Cleanup all scopes on return

    • Variables being returned are skipped during scope cleanup
    • Function calls returning RC types are tracked for cleanup
    • Blocks properly handle returning RC variables
  4. Complex conditionals - If/else creating RC values

    • Switch from ternary to if-statements
    • Track RC creation in branches
    • ~50 lines

Phase B: Performance Optimizations (Match Koka)

  1. Last-use optimization

    • Track variable liveness
    • Skip incref on last use (transfer ownership)
    • Requires dataflow analysis
    • ~200 lines
  2. Reuse analysis (FBIP)

    • Detect rc=1 at update sites
    • Mutate in-place instead of copy
    • Major change to list operations
    • ~300 lines
  3. Drop specialization

    • Generate per-type drop functions
    • Eliminate polymorphic dispatch
    • ~100 lines
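
To make the first item above (last-use optimization) concrete, the sketch below compares the RC traffic of a call with and without ownership transfer. Everything here is hypothetical: `demo_consume` stands for any callee that takes ownership of its argument, and the operation counter exists only to make the saving visible.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct {
    int32_t refcount;
} DemoObj;

static int demo_rc_ops = 0;  /* incref/decref calls executed so far */
static int demo_live = 0;    /* objects currently allocated */

static DemoObj *demo_new(void) {
    DemoObj *o = malloc(sizeof *o);
    if (o == NULL) abort();
    o->refcount = 1;
    demo_live++;
    return o;
}

static void demo_incref(DemoObj *o) {
    o->refcount++;
    demo_rc_ops++;
}

static void demo_decref(DemoObj *o) {
    assert(o->refcount > 0);
    demo_rc_ops++;
    if (--o->refcount == 0) {
        free(o);
        demo_live--;
    }
}

/* Callee that owns its argument and releases it when done. */
static void demo_consume(DemoObj *o) {
    demo_decref(o);
}

/* Without liveness information: keep x alive across the call, clean up at scope exit. */
static int demo_call_naive(void) {
    int before = demo_rc_ops;
    DemoObj *x = demo_new();
    demo_incref(x);   /* extra reference handed to the callee */
    demo_consume(x);
    demo_decref(x);   /* scope-exit cleanup */
    return demo_rc_ops - before;
}

/* With last-use analysis: the call is x's final use, so ownership is transferred. */
static int demo_call_last_use(void) {
    int before = demo_rc_ops;
    DemoObj *x = demo_new();
    demo_consume(x);  /* no incref here and no decref at scope exit */
    return demo_rc_ops - before;
}
```

Both variants free the object exactly once; the difference is two fewer RC operations per call, which is the saving the dataflow analysis would buy.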

Estimated Effort

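| Phase | Description   | Lines | Priority         | Status  |
|-------|---------------|-------|------------------|---------|
| A1    | Closure RC    | ~50   | P0               | Done    |
| A2    | ADT RC        | ~150  | P1               | Done    |
| A3    | Early returns | ~30   | P1               | Done    |
| A4    | Conditionals  | ~50   | P2 - Uncommon    | Pending |
| B1    | Last-use opt  | ~200  | P3 - Performance | Pending |
| B2    | Reuse (FBIP)  | ~300  | P3 - Performance | Pending |
| B3    | Drop special  | ~100  | P3 - Performance | Pending |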

Phase A remaining: ~50 lines - Gets us to "no leaks"

Phase B total: ~600 lines - Gets us to Koka-level performance

Cycle Detection

RC cannot handle cycles (A → B → A). Options:

  1. Ignore - Cycles are rare in functional code (our current approach)
  2. Weak references - Programmer marks back-edges
  3. Cycle collector - Periodic scan for cycles (adds GC-like pauses)

Koka also ignores cycles, relying on functional programming's natural acyclicity.
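
The leak, and option 2 from the list above, can be reproduced in a few lines of C. This is an illustrative sketch with made-up `demo_` names: the "weak" edge here is simply an uncounted pointer, which is the cheapest form of a programmer-marked back-edge and offers no protection against dangling.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct DemoNode {
    int32_t refcount;
    struct DemoNode *strong;  /* owning edge: counted, released on drop */
    struct DemoNode *weak;    /* back-edge: not counted, never released */
} DemoNode;

static int demo_live = 0;  /* nodes currently allocated */

static DemoNode *demo_node_new(void) {
    DemoNode *n = calloc(1, sizeof *n);
    if (n == NULL) abort();
    n->refcount = 1;
    demo_live++;
    return n;
}

static void demo_node_decref(DemoNode *n) {
    if (n == NULL) return;
    assert(n->refcount > 0);
    if (--n->refcount == 0) {
        demo_node_decref(n->strong);  /* release what this node owns */
        free(n);
        demo_live--;
    }
}

/* Build A -> B -> A, release both local references, and return how many
   nodes are still allocated even though nothing outside can reach them. */
static int demo_cycle(int weak_back_edge) {
    int before = demo_live;
    DemoNode *a = demo_node_new();
    DemoNode *b = demo_node_new();
    a->strong = b; b->refcount++;      /* A owns B */
    if (weak_back_edge) {
        b->weak = a;                   /* B only observes A */
    } else {
        b->strong = a; a->refcount++;  /* B owns A: a strong cycle */
    }
    demo_node_decref(b);
    demo_node_decref(a);
    int leaked = demo_live - before;
    if (leaked > 0) {
        /* Break the cycle by hand so the demo itself does not leak. */
        b->strong = NULL;
        demo_node_decref(a);  /* releases the reference B held on A */
    }
    return leaked;
}
```

With the strong back-edge both counts settle at one and never reach zero; with the uncounted back-edge, dropping A cascades through B and everything is freed.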


References