
Lux Performance Characteristics and Language Tradeoffs

Executive Summary

Lux is a tree-walking interpreted language with algebraic effects. This document analyzes its performance characteristics, compares it to other languages, and explains the design tradeoffs made.

Key Performance Characteristics:

  • Interpretation overhead: ~100-1000x slower than native compiled languages
  • Tail call optimization: Effective, prevents stack overflow
  • Effect handling: ~10-20% overhead per effect operation
  • Memory: Reference counting for closures, aggressive cloning for collections

Benchmark Results

Test System

Benchmarks run via tree-walking interpreter in release mode.

Results Summary

| Benchmark | Time | Operations | Ops/sec | Notes |
|---|---|---|---|---|
| Fibonacci (naive, n=30) | 34,980 ms | ~1.3M calls | 37K | Exponential recursion |
| Fibonacci (TCO, n=100K) | 498 ms | 100K iterations | 200K | Tail-call optimized |
| List operations (10K) | 461 ms | 30K ops | 65K | map+filter+fold |
| Pattern matching (32K nodes) | 964 ms | 65K matches | 67K | Tree traversal |
| Closures (100K calls) | 538 ms | 100K closures | 186K | Closure creation + calls |
| String ops (1K concat) | 457 ms | 1K concats | 2.2K | String building |

Analysis

Naive Recursion is Expensive:

  • fib(30) takes 35 seconds due to exponential call overhead
  • Each function call involves: environment extension, parameter binding, AST traversal
  • Compare: Python ~2s, JavaScript ~0.05s, Rust ~0.001s

TCO is Effective:

  • fib(100,000) completes in 500ms without stack overflow
  • Linear time, constant stack space
  • The trampoline approach works well

Collection Operations Have Cloning Overhead:

  • List.map/filter/fold clone the entire list to extract from Value enum
  • Pre-allocation in List.map helps but cloning dominates
  • Larger lists will show worse performance

Implementation Details

Evaluation Strategy: Tree-Walking Interpreter

```
Source Code → Lexer → Tokens → Parser → AST → Interpreter → Value
```

Pros:

  • Simple to implement and debug
  • Direct correspondence between AST and execution
  • Easy to add new features

Cons:

  • No optimization passes
  • Repeated AST traversal
  • No instruction caching
  • ~100-1000x slower than bytecode/native

Comparison:

| Language | Strategy | Relative Speed |
|---|---|---|
| Lux | Tree-walking | 1x (baseline) |
| Python | Bytecode VM | 10-50x faster |
| JavaScript (V8) | JIT compiled | 100-500x faster |
| Haskell (GHC) | Native compiled | 500-2000x faster |
| Rust | Native compiled | 1000-5000x faster |
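
To make the tree-walking overhead concrete, here is a minimal standalone evaluator (an illustrative AST, unrelated to Lux's actual node types): every operation is a `match` over heap-allocated nodes plus recursive calls, with no bytecode or optimization pass in between.

```rust
#[derive(Debug)]
enum Expr {
    Lit(i64),
    Add(Box<Expr>, Box<Expr>),
    Mul(Box<Expr>, Box<Expr>),
}

// Walk the AST directly: each node costs a discriminant match and two
// recursive calls; nothing is cached or compiled away.
fn eval(e: &Expr) -> i64 {
    match e {
        Expr::Lit(n) => *n,
        Expr::Add(a, b) => eval(a) + eval(b),
        Expr::Mul(a, b) => eval(a) * eval(b),
    }
}

fn main() {
    // (2 + 3) * 4
    let ast = Expr::Mul(
        Box::new(Expr::Add(Box::new(Expr::Lit(2)), Box::new(Expr::Lit(3)))),
        Box::new(Expr::Lit(4)),
    );
    println!("{}", eval(&ast)); // 20
}
```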

Value Representation

```rust
pub enum Value {
    Int(i64),                    // Unboxed, 8 bytes
    Float(f64),                  // Unboxed, 8 bytes
    Bool(bool),                  // Unboxed, 1 byte
    String(String),              // Heap-allocated, ~24 bytes + data
    List(Vec<Value>),            // Heap-allocated, ~24 bytes + n*size(Value)
    Function(Rc<Closure>),       // Reference-counted, 8 bytes pointer
    Constructor { ... },         // Tagged union
    ...
}
```

Memory Overhead:

  • Each Value is ~40-80 bytes due to enum discriminant + largest variant
  • Lists are Vec<Value>, so each element is a full Value enum
  • No small-value optimization
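
The enum-size claim can be checked directly with `std::mem::size_of` on a simplified stand-in for `Value` (the field layout here is assumed, not copied from the interpreter):

```rust
use std::mem::size_of;
use std::rc::Rc;

#[allow(dead_code)]
enum Value {
    Int(i64),
    Float(f64),
    Bool(bool),
    String(String),
    List(Vec<Value>),
    Function(Rc<()>), // closure payload elided for the sketch
}

fn main() {
    // The enum is as large as its biggest variant (String/Vec: 24 bytes
    // on 64-bit) plus the discriminant, rounded up to alignment.
    println!("size_of::<Value>() = {} bytes", size_of::<Value>());
    // Every list element pays this full cost, even a single Bool.
    println!(
        "a 10,000-element list carries at least {} bytes of Value headers",
        10_000 * size_of::<Value>()
    );
}
```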

Tradeoffs:

| Aspect | Lux Approach | Alternative | Tradeoff |
|---|---|---|---|
| Primitives | Unboxed in enum | NaN-boxing | Simpler code, more memory |
| Strings | Owned `String` | Interned/`Rc` | Simpler, more copying |
| Lists | `Vec` | `Rc<Vec<Rc>>` | Simpler, expensive clone |
| Closures | `Rc` | Owned | Cheap sharing, GC needed |

Closure Capture

```rust
pub struct Closure {
    params: Vec<String>,
    body: Expr,
    env: Env,  // Entire lexical environment
}

pub struct Env {
    bindings: Rc<RefCell<HashMap<String, Value>>>,
    parent: Option<Box<Env>>,
}
```

Characteristics:

  • Closures capture the entire environment chain (lexical scoping)
  • Environment lookup is O(depth) - traverses parent chain
  • Variable access clones the value (expensive for large values)
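
The lookup cost can be sketched as a runnable simplification of the structs above (values reduced to `i64` and error handling omitted): each miss walks one link up the parent chain, and a hit clones the stored value out of the frame.

```rust
use std::cell::RefCell;
use std::collections::HashMap;
use std::rc::Rc;

#[derive(Clone)]
struct Env {
    bindings: Rc<RefCell<HashMap<String, i64>>>,
    parent: Option<Box<Env>>,
}

impl Env {
    fn new(parent: Option<Env>) -> Env {
        Env {
            bindings: Rc::new(RefCell::new(HashMap::new())),
            parent: parent.map(Box::new),
        }
    }

    // O(depth): each frame is checked in turn; the value is cloned out.
    fn lookup(&self, name: &str) -> Option<i64> {
        if let Some(v) = self.bindings.borrow().get(name) {
            return Some(v.clone());
        }
        self.parent.as_ref().and_then(|p| p.lookup(name))
    }
}

fn main() {
    let global = Env::new(None);
    global.bindings.borrow_mut().insert("x".to_string(), 1);
    let inner = Env::new(Some(global));
    inner.bindings.borrow_mut().insert("y".to_string(), 2);
    println!("{:?} {:?}", inner.lookup("x"), inner.lookup("y"));
}
```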

Comparison:

| Language | Capture Strategy | Lookup Cost |
|---|---|---|
| Lux | Scope chain | O(depth) |
| JavaScript | Scope chain | O(depth), optimized |
| Python | Cell references | O(1) after first access |
| Rust | Move/borrow | O(1), compile-time resolved |

Effect Handling

```rust
fn handle_effect(&mut self, request: EffectRequest) -> Result<Value, RuntimeError> {
    // Linear search through handler stack (LIFO)
    for handler in self.handler_stack.iter().rev() {
        if handler.effect == request.effect {
            // Clone handler environment and execute
            ...
        }
    }
    // No matching handler: surface an "unhandled effect" runtime error
    ...
}
```

Overhead per Effect Operation:

  1. Create EffectRequest struct
  2. Linear search through handler stack (typically O(1-5))
  3. Clone handler environment
  4. Execute handler body
  5. Return value
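
The dispatch steps above can be sketched as a standalone example (the names and reduction of handler bodies to plain functions are illustrative, not the interpreter's API): handlers are pushed LIFO and the first match walking backward wins.

```rust
struct Handler {
    effect: &'static str,
    // Step 4's handler body, reduced to a plain function for the sketch.
    body: fn(i64) -> i64,
}

// Step 1: the effect operation is reified as a request value.
struct EffectRequest {
    effect: &'static str,
    payload: i64,
}

// Step 2: linear search from the innermost (most recently pushed) handler.
fn dispatch(stack: &[Handler], req: &EffectRequest) -> Result<i64, String> {
    for handler in stack.iter().rev() {
        if handler.effect == req.effect {
            // Steps 4-5: run the handler body and return its value.
            return Ok((handler.body)(req.payload));
        }
    }
    Err(format!("unhandled effect: {}", req.effect))
}

fn main() {
    let stack = vec![
        Handler { effect: "State", body: |n| n + 1 },
        Handler { effect: "Console", body: |n| { println!("log: {}", n); n } },
        // Inner State handler shadows the outer one (LIFO).
        Handler { effect: "State", body: |n| n * 10 },
    ];
    let req = EffectRequest { effect: "State", payload: 4 };
    println!("{:?}", dispatch(&stack, &req)); // innermost State handler: Ok(40)
}
```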

Comparison with Other Approaches:

| Approach | Overhead | Flexibility |
|---|---|---|
| Lux (runtime handlers) | ~10-20% | High - dynamic dispatch |
| Koka (evidence passing) | ~1-5% | High - optimized |
| Haskell mtl (transformers) | ~5-10% | Medium - static |
| Rust (traits) | 0% | Low - compile-time only |

Tail Call Optimization

```rust
pub enum EvalResult {
    Value(Value),
    Effect(EffectRequest),
    TailCall { func, args, span },  // Trampoline marker
}

// Trampoline loop
loop {
    match result {
        EvalResult::Value(v) => return Ok(v),
        EvalResult::TailCall { func, args, span } => {
            result = self.eval_call(func, args, span)?;
        }
        EvalResult::Effect(request) => {
            // Effect requests are resolved by the handler stack first
            result = EvalResult::Value(self.handle_effect(request)?);
        }
    }
}
```

Characteristics:

  • Explicit tail position tracking via tail: bool parameter
  • TailCall variant prevents stack growth
  • Only function calls in tail position are optimized
  • Arguments are always evaluated eagerly before tail call
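
The trampoline can be demonstrated end to end with a tiny standalone version (the step function is hardcoded to an accumulating fibonacci, unlike the interpreter's general `eval_call`): a tail call is returned as data, and the loop runs in constant stack space.

```rust
// A tail call is reified as a value instead of growing the Rust call stack.
enum Step {
    Done(u64),
    Call { n: u64, a: u64, b: u64 },
}

// One step of tail-recursive fib:
//   fib_iter(n, a, b) = if n == 0 then a else fib_iter(n - 1, b, a + b)
fn fib_step(n: u64, a: u64, b: u64) -> Step {
    if n == 0 {
        Step::Done(a)
    } else {
        Step::Call { n: n - 1, a: b, b: a.wrapping_add(b) }
    }
}

// The trampoline loop: no recursion, so no stack overflow for any n.
fn trampoline(mut n: u64) -> u64 {
    let (mut a, mut b) = (0u64, 1u64);
    loop {
        match fib_step(n, a, b) {
            Step::Done(v) => return v,
            Step::Call { n: n2, a: a2, b: b2 } => {
                n = n2;
                a = a2;
                b = b2;
            }
        }
    }
}

fn main() {
    println!("fib(10) = {}", trampoline(10)); // 55
    // Deep "recursion" is fine: n = 100_000 uses constant stack space.
    let _big = trampoline(100_000);
}
```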

Comparison:

| Language | TCO Support | Mechanism |
|---|---|---|
| Lux | Full | Trampoline |
| Scheme | Full | Required by spec |
| Haskell | Full | Lazy evaluation + STG |
| JavaScript | Safari only | Implementation-dependent |
| Python | None | Explicit recursion limit |
| Rust | Limited | LLVM optimization |

Language Tradeoffs

1. Safety vs Performance

Choice: Safety First

| Decision | Safety Benefit | Performance Cost |
|---|---|---|
| Immutable values | No data races | Clone on every modification |
| Explicit effects | No hidden side effects | Handler lookup overhead |
| Type checking | Catch errors early | Compile-time overhead |
| Exhaustive matching | No missed cases | Runtime pattern matching |

2. Simplicity vs Optimization

Choice: Simplicity First

| Decision | Simplicity Benefit | Lost Optimization |
|---|---|---|
| Tree-walking | Easy to implement | No bytecode caching |
| Value enum | Uniform handling | No NaN-boxing |
| Clone semantics | Predictable memory | No move optimization |
| No mutation | No aliasing issues | Can't update in place |

3. Expressiveness vs Compilation

Choice: Expressiveness First

| Feature | Expressiveness Benefit | Compilation Challenge |
|---|---|---|
| Algebraic effects | Composable side effects | Hard to optimize |
| First-class handlers | Runtime flexibility | Dynamic dispatch |
| Effect polymorphism (planned) | Generic effect code | Complex inference |
| Refinement types (planned) | Precise specifications | SMT solver needed |

4. Comparison Matrix

| Aspect | Lux | Koka | Haskell | Rust | TypeScript |
|---|---|---|---|---|---|
| Execution | Interpreted | Compiled | Compiled | Compiled | JIT |
| Effects | Algebraic | Algebraic | Monads | Traits | Promises |
| Memory | RC + Clone | RC + Reuse | GC | Ownership | GC |
| Mutability | Immutable | Immutable | Immutable | Controlled | Mutable |
| TCO | Trampoline | Native | Native | LLVM | No |
| Typing | HM inference | HM + effects | HM + extensions | Explicit | Structural |

How to Measure Performance

Running Benchmarks

```sh
# Run a specific benchmark
nix develop --command cargo run --release -- benchmarks/fibonacci.lux

# Time a benchmark
time nix develop --command cargo run --release -- benchmarks/fibonacci_tco.lux

# Run with effect tracing (slower but shows effect operations)
# In REPL: :trace on
```

Benchmark Suite

| File | Tests | Expected Time |
|---|---|---|
| fibonacci.lux | Function call overhead | ~35 s (fib 30) |
| fibonacci_tco.lux | Tail call optimization | ~0.5 s (fib 100K) |
| list_operations.lux | Collection performance | ~0.5 s (10K elements) |
| pattern_matching.lux | ADT matching | ~1 s (32K nodes) |
| effects.lux | Effect dispatch | ~0.4 s (10K effects) |
| closures.lux | Closure performance | ~0.5 s (100K closures) |
| strings.lux | String operations | ~0.5 s (1K concats) |

Key Metrics to Measure

  1. Function calls per second: Use recursive fibonacci
  2. Effect operations per second: Use counter effect benchmark
  3. Pattern matches per second: Use tree traversal
  4. Closure creations per second: Use makeAdder benchmark
  5. List operations per second: Use map/filter/fold chain
  6. Memory usage: Monitor with system tools (not built-in yet)
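
Absent built-in profiling, a throughput number can be taken from the host side with `std::time::Instant`; the harness shape below is a suggestion, and the integer-add workload is only a stand-in for a real benchmark body.

```rust
use std::time::Instant;

// Measure ops/sec for any closure-wrapped workload.
fn ops_per_sec(label: &str, iters: u64, mut work: impl FnMut()) -> f64 {
    let start = Instant::now();
    for _ in 0..iters {
        work();
    }
    let secs = start.elapsed().as_secs_f64();
    // Guard against a zero-duration measurement on very fast workloads.
    let rate = iters as f64 / secs.max(f64::MIN_POSITIVE);
    println!("{}: {:.0} ops/sec", label, rate);
    rate
}

fn main() {
    let mut acc = 0u64;
    let rate = ops_per_sec("integer adds", 1_000_000, || acc = acc.wrapping_add(1));
    assert!(rate > 0.0);
    println!("acc = {}", acc);
}
```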

Comparison Benchmarks

To compare with other languages, implement the same algorithms:

Fibonacci (n=30) comparison:

```
Lux (interpreted):     ~35,000 ms
Python 3:              ~2,000 ms
Node.js:               ~50 ms
Haskell (ghci):        ~200 ms
Haskell (compiled):    ~5 ms
Rust:                  ~1 ms
```
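
For reference, the Rust end of that comparison is the direct translation of the naive benchmark:

```rust
use std::time::Instant;

// Naive exponential-time fibonacci, the same algorithm the Lux benchmark uses.
fn fib(n: u64) -> u64 {
    if n < 2 { n } else { fib(n - 1) + fib(n - 2) }
}

fn main() {
    let start = Instant::now();
    let result = fib(30);
    println!("fib(30) = {} in {:?}", result, start.elapsed());
}
```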

Performance Improvement Opportunities

Short-term (Interpreter Improvements)

  1. Bytecode compilation: Convert AST to bytecode for faster dispatch
  2. Value representation: Use NaN-boxing for primitives
  3. Environment optimization: Use flat closure representation
  4. List operations: Avoid cloning by using Rc<Vec<Rc>>
  5. String interning: Deduplicate string values
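
NaN-boxing (item 2) exploits the payload bits of an IEEE-754 quiet NaN to store non-float values inside an ordinary `f64`. A minimal encoder for 32-bit integers might look like the following; the tag layout is illustrative, not a committed design.

```rust
// Quiet-NaN pattern: exponent all ones, quiet bit set.
const QNAN: u64 = 0x7ff8_0000_0000_0000;
// Extra payload bit distinguishing boxed ints from real computed NaNs.
const INT_TAG: u64 = 0x0001_0000_0000_0000;

fn box_int(n: i32) -> f64 {
    // Lower 32 bits carry the integer; the NaN bits make it a non-float.
    f64::from_bits(QNAN | INT_TAG | (n as u32 as u64))
}

fn unbox_int(v: f64) -> Option<i32> {
    let bits = v.to_bits();
    if bits & (QNAN | INT_TAG) == QNAN | INT_TAG {
        Some(bits as u32 as i32)
    } else {
        None // an ordinary float (or a genuine NaN), stored as-is
    }
}

fn main() {
    let boxed = box_int(-42);
    println!("{:?}", unbox_int(boxed));  // Some(-42)
    println!("{:?}", unbox_int(3.14));   // None: plain floats pass through
}
```

The payoff is that every `Value` fits in 8 bytes instead of the current 32+, with pointers boxed under further tags in the same scheme.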

Medium-term (New Backend)

  1. WASM compilation: Target WebAssembly for portable native speed
  2. JavaScript emission: Leverage V8/SpiderMonkey JIT
  3. LLVM backend: Generate native code via LLVM IR

Long-term (Advanced Optimizations)

  1. Effect fusion: Combine adjacent effect operations
  2. Inlining: Inline small functions
  3. Specialization: Generate specialized code for monomorphic calls
  4. Escape analysis: Stack-allocate non-escaping values

Estimated Speedup Potential

| Optimization | Expected Speedup | Effort |
|---|---|---|
| Bytecode VM | 5-10x | Medium |
| NaN-boxing | 1.5-2x | Low |
| Flat closures | 2-3x | Medium |
| WASM backend | 50-100x | High |
| LLVM backend | 100-500x | Very High |

Conclusion

Lux prioritizes expressiveness, safety, and simplicity over raw performance. The current interpreter is suitable for:

  • Prototyping and development
  • Educational purposes
  • Small scripts and tools
  • Testing effect-based designs

For production workloads requiring high performance, a compilation backend would be necessary. The language design is amenable to efficient compilation: algebraic effects can be compiled using CPS transformation or evidence passing, and the pure functional core can benefit from standard optimizations.

The key insight is that Lux's performance ceiling is set by implementation choices (interpreter vs compiler), not fundamental language limitations. Languages like Koka demonstrate that algebraic effects can be compiled efficiently.