
Lux Performance Characteristics and Language Tradeoffs

Executive Summary

Lux is a tree-walking interpreted language with algebraic effects. This document analyzes its performance characteristics, compares it to other languages, and explains the design tradeoffs made.

Key Performance Characteristics:

  • Interpretation overhead: ~100-1000x slower than native compiled languages
  • Tail call optimization: Effective, prevents stack overflow
  • Effect handling: ~10-20% overhead per effect operation
  • Memory: Reference counting for closures, aggressive cloning for collections

Benchmark Results

Test System

Benchmarks run via tree-walking interpreter in release mode.

Results Summary

| Benchmark | Time | Operations | Ops/sec | Notes |
|---|---|---|---|---|
| Fibonacci (naive, n=30) | 34,980 ms | ~1.3M calls | 37K | Exponential recursion |
| Fibonacci (TCO, n=100K) | 498 ms | 100K iterations | 200K | Tail-call optimized |
| List operations (10K) | 461 ms | 30K ops | 65K | map+filter+fold |
| Pattern matching (32K nodes) | 964 ms | 65K matches | 67K | Tree traversal |
| Closures (100K calls) | 538 ms | 100K closures | 186K | Closure creation + calls |
| String ops (1K concat) | 457 ms | 1K concats | 2.2K | String building |

Analysis

Naive Recursion is Expensive:

  • fib(30) takes 35 seconds due to exponential call overhead
  • Each function call involves: environment extension, parameter binding, AST traversal
  • Compare: Python ~2s, JavaScript ~0.05s, Rust ~0.001s

TCO is Effective:

  • fib(100,000) completes in 500ms without stack overflow
  • Linear time, constant stack space
  • The trampoline approach works well

Collection Operations Have Cloning Overhead:

  • List.map/filter/fold clone the entire list to extract from Value enum
  • Pre-allocation in List.map helps but cloning dominates
  • Larger lists will show worse performance

Implementation Details

Evaluation Strategy: Tree-Walking Interpreter

```
Source Code → Lexer → Tokens → Parser → AST → Interpreter → Value
```

Pros:

  • Simple to implement and debug
  • Direct correspondence between AST and execution
  • Easy to add new features

Cons:

  • No optimization passes
  • Repeated AST traversal
  • No instruction caching
  • ~100-1000x slower than bytecode/native

Comparison:

| Language | Strategy | Relative Speed |
|---|---|---|
| Lux | Tree-walking | 1x (baseline) |
| Python | Bytecode VM | 10-50x faster |
| JavaScript (V8) | JIT compiled | 100-500x faster |
| Haskell (GHC) | Native compiled | 500-2000x faster |
| Rust | Native compiled | 1000-5000x faster |
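
To make the tree-walking overhead concrete, here is a minimal standalone evaluator (an illustrative AST, unrelated to Lux's actual node types): every operation is a `match` over heap-allocated nodes plus recursive calls, with no bytecode or optimization pass in between.

```rust
#[derive(Debug)]
enum Expr {
    Lit(i64),
    Add(Box<Expr>, Box<Expr>),
    Mul(Box<Expr>, Box<Expr>),
}

// Walk the AST directly: each node costs a discriminant match and two
// recursive calls; nothing is cached or compiled away.
fn eval(e: &Expr) -> i64 {
    match e {
        Expr::Lit(n) => *n,
        Expr::Add(a, b) => eval(a) + eval(b),
        Expr::Mul(a, b) => eval(a) * eval(b),
    }
}

fn main() {
    // (2 + 3) * 4
    let ast = Expr::Mul(
        Box::new(Expr::Add(Box::new(Expr::Lit(2)), Box::new(Expr::Lit(3)))),
        Box::new(Expr::Lit(4)),
    );
    println!("{}", eval(&ast)); // 20
}
```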

Value Representation

```rust
pub enum Value {
    Int(i64),                    // Unboxed, 8 bytes
    Float(f64),                  // Unboxed, 8 bytes
    Bool(bool),                  // Unboxed, 1 byte
    String(String),              // Heap-allocated, ~24 bytes + data
    List(Vec<Value>),            // Heap-allocated, ~24 bytes + n*size(Value)
    Function(Rc<Closure>),       // Reference-counted, 8 bytes pointer
    Constructor { ... },         // Tagged union
    ...
}
```

Memory Overhead:

  • Each Value is ~40-80 bytes due to enum discriminant + largest variant
  • Lists are Vec<Value>, so each element is a full Value enum
  • No small-value optimization
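
The enum-size claim can be checked directly with `std::mem::size_of` on a simplified stand-in for `Value` (the field layout here is assumed, not copied from the interpreter):

```rust
use std::mem::size_of;
use std::rc::Rc;

#[allow(dead_code)]
enum Value {
    Int(i64),
    Float(f64),
    Bool(bool),
    String(String),
    List(Vec<Value>),
    Function(Rc<()>), // closure payload elided for the sketch
}

fn main() {
    // The enum is as large as its biggest variant (String/Vec: 24 bytes
    // on 64-bit) plus the discriminant, rounded up to alignment.
    println!("size_of::<Value>() = {} bytes", size_of::<Value>());
    // Every list element pays this full cost, even a single Bool.
    println!(
        "a 10,000-element list carries at least {} bytes of Value headers",
        10_000 * size_of::<Value>()
    );
}
```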

Tradeoffs:

| Aspect | Lux Approach | Alternative | Tradeoff |
|---|---|---|---|
| Primitives | Unboxed in enum | NaN-boxing | Simpler code, more memory |
| Strings | Owned `String` | Interned/`Rc` | Simpler, more copying |
| Lists | `Vec` | `Rc<Vec<Rc>>` | Simpler, expensive clone |
| Closures | `Rc` | Owned | Cheap sharing, GC needed |

Closure Capture

```rust
pub struct Closure {
    params: Vec<String>,
    body: Expr,
    env: Env,  // Entire lexical environment
}

pub struct Env {
    bindings: Rc<RefCell<HashMap<String, Value>>>,
    parent: Option<Box<Env>>,
}
```

Characteristics:

  • Closures capture the entire environment chain (lexical scoping)
  • Environment lookup is O(depth) - traverses parent chain
  • Variable access clones the value (expensive for large values)
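
The lookup cost can be sketched as a runnable simplification of the structs above (values reduced to `i64` and error handling omitted): each miss walks one link up the parent chain, and a hit clones the stored value out of the frame.

```rust
use std::cell::RefCell;
use std::collections::HashMap;
use std::rc::Rc;

#[derive(Clone)]
struct Env {
    bindings: Rc<RefCell<HashMap<String, i64>>>,
    parent: Option<Box<Env>>,
}

impl Env {
    fn new(parent: Option<Env>) -> Env {
        Env {
            bindings: Rc::new(RefCell::new(HashMap::new())),
            parent: parent.map(Box::new),
        }
    }

    // O(depth): each frame is checked in turn; the value is cloned out.
    fn lookup(&self, name: &str) -> Option<i64> {
        if let Some(v) = self.bindings.borrow().get(name) {
            return Some(v.clone());
        }
        self.parent.as_ref().and_then(|p| p.lookup(name))
    }
}

fn main() {
    let global = Env::new(None);
    global.bindings.borrow_mut().insert("x".to_string(), 1);
    let inner = Env::new(Some(global));
    inner.bindings.borrow_mut().insert("y".to_string(), 2);
    println!("{:?} {:?}", inner.lookup("x"), inner.lookup("y"));
}
```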

Comparison:

| Language | Capture Strategy | Lookup Cost |
|---|---|---|
| Lux | Scope chain | O(depth) |
| JavaScript | Scope chain | O(depth), optimized |
| Python | Cell references | O(1) after first access |
| Rust | Move/borrow | O(1), compile-time resolved |

Effect Handling

```rust
fn handle_effect(&mut self, request: EffectRequest) -> Result<Value, RuntimeError> {
    // Linear search through handler stack (LIFO)
    for handler in self.handler_stack.iter().rev() {
        if handler.effect == request.effect {
            // Clone handler environment and execute
            ...
        }
    }
    // No matching handler: surface an "unhandled effect" runtime error
    ...
}
```

Overhead per Effect Operation:

  1. Create EffectRequest struct
  2. Linear search through handler stack (typically O(1-5))
  3. Clone handler environment
  4. Execute handler body
  5. Return value
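
The dispatch steps above can be sketched as a standalone example (the names and reduction of handler bodies to plain functions are illustrative, not the interpreter's API): handlers are pushed LIFO and the first match walking backward wins.

```rust
struct Handler {
    effect: &'static str,
    // Step 4's handler body, reduced to a plain function for the sketch.
    body: fn(i64) -> i64,
}

// Step 1: the effect operation is reified as a request value.
struct EffectRequest {
    effect: &'static str,
    payload: i64,
}

// Step 2: linear search from the innermost (most recently pushed) handler.
fn dispatch(stack: &[Handler], req: &EffectRequest) -> Result<i64, String> {
    for handler in stack.iter().rev() {
        if handler.effect == req.effect {
            // Steps 4-5: run the handler body and return its value.
            return Ok((handler.body)(req.payload));
        }
    }
    Err(format!("unhandled effect: {}", req.effect))
}

fn main() {
    let stack = vec![
        Handler { effect: "State", body: |n| n + 1 },
        Handler { effect: "Console", body: |n| { println!("log: {}", n); n } },
        // Inner State handler shadows the outer one (LIFO).
        Handler { effect: "State", body: |n| n * 10 },
    ];
    let req = EffectRequest { effect: "State", payload: 4 };
    println!("{:?}", dispatch(&stack, &req)); // innermost State handler: Ok(40)
}
```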

Comparison with Other Approaches:

| Approach | Overhead | Flexibility |
|---|---|---|
| Lux (runtime handlers) | ~10-20% | High - dynamic dispatch |
| Koka (evidence passing) | ~1-5% | High - optimized |
| Haskell mtl (transformers) | ~5-10% | Medium - static |
| Rust (traits) | 0% | Low - compile-time only |

Tail Call Optimization

```rust
pub enum EvalResult {
    Value(Value),
    Effect(EffectRequest),
    TailCall { func, args, span },  // Trampoline marker
}

// Trampoline loop
loop {
    match result {
        EvalResult::Value(v) => return Ok(v),
        EvalResult::TailCall { func, args, span } => {
            result = self.eval_call(func, args, span)?;
        }
        EvalResult::Effect(request) => {
            // Effect requests are resolved by the handler stack first
            result = EvalResult::Value(self.handle_effect(request)?);
        }
    }
}
```

Characteristics:

  • Explicit tail position tracking via tail: bool parameter
  • TailCall variant prevents stack growth
  • Only function calls in tail position are optimized
  • Arguments are always evaluated eagerly before tail call
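
The trampoline can be demonstrated end to end with a tiny standalone version (the step function is hardcoded to an accumulating fibonacci, unlike the interpreter's general `eval_call`): a tail call is returned as data, and the loop runs in constant stack space.

```rust
// A tail call is reified as a value instead of growing the Rust call stack.
enum Step {
    Done(u64),
    Call { n: u64, a: u64, b: u64 },
}

// One step of tail-recursive fib:
//   fib_iter(n, a, b) = if n == 0 then a else fib_iter(n - 1, b, a + b)
fn fib_step(n: u64, a: u64, b: u64) -> Step {
    if n == 0 {
        Step::Done(a)
    } else {
        Step::Call { n: n - 1, a: b, b: a.wrapping_add(b) }
    }
}

// The trampoline loop: no recursion, so no stack overflow for any n.
fn trampoline(mut n: u64) -> u64 {
    let (mut a, mut b) = (0u64, 1u64);
    loop {
        match fib_step(n, a, b) {
            Step::Done(v) => return v,
            Step::Call { n: n2, a: a2, b: b2 } => {
                n = n2;
                a = a2;
                b = b2;
            }
        }
    }
}

fn main() {
    println!("fib(10) = {}", trampoline(10)); // 55
    // Deep "recursion" is fine: n = 100_000 uses constant stack space.
    let _big = trampoline(100_000);
}
```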

Comparison:

| Language | TCO Support | Mechanism |
|---|---|---|
| Lux | Full | Trampoline |
| Scheme | Full | Required by spec |
| Haskell | Full | Lazy evaluation + STG |
| JavaScript | Safari only | Implementation-dependent |
| Python | None | Explicit recursion limit |
| Rust | Limited | LLVM optimization |

Language Tradeoffs

1. Safety vs Performance

Choice: Safety First

| Decision | Safety Benefit | Performance Cost |
|---|---|---|
| Immutable values | No data races | Clone on every modification |
| Explicit effects | No hidden side effects | Handler lookup overhead |
| Type checking | Catch errors early | Compile-time overhead |
| Exhaustive matching | No missed cases | Runtime pattern matching |

2. Simplicity vs Optimization

Choice: Simplicity First

| Decision | Simplicity Benefit | Lost Optimization |
|---|---|---|
| Tree-walking | Easy to implement | No bytecode caching |
| Value enum | Uniform handling | No NaN-boxing |
| Clone semantics | Predictable memory | No move optimization |
| No mutation | No aliasing issues | Can't update in place |

3. Expressiveness vs Compilation

Choice: Expressiveness First

| Feature | Expressiveness Benefit | Compilation Challenge |
|---|---|---|
| Algebraic effects | Composable side effects | Hard to optimize |
| First-class handlers | Runtime flexibility | Dynamic dispatch |
| Effect polymorphism (planned) | Generic effect code | Complex inference |
| Refinement types (planned) | Precise specifications | SMT solver needed |

4. Comparison Matrix

| Aspect | Lux | Koka | Haskell | Rust | TypeScript |
|---|---|---|---|---|---|
| Execution | Interpreted | Compiled | Compiled | Compiled | JIT |
| Effects | Algebraic | Algebraic | Monads | Traits | Promises |
| Memory | RC + Clone | RC + Reuse | GC | Ownership | GC |
| Mutability | Immutable | Immutable | Immutable | Controlled | Mutable |
| TCO | Trampoline | Native | Native | LLVM | No |
| Typing | HM inference | HM + effects | HM + extensions | Explicit | Structural |

How to Measure Performance

Running Benchmarks

```sh
# Run a specific benchmark
nix develop --command cargo run --release -- benchmarks/fibonacci.lux

# Time a benchmark
time nix develop --command cargo run --release -- benchmarks/fibonacci_tco.lux

# Run with effect tracing (slower but shows effect operations)
# In REPL: :trace on
```

Benchmark Suite

| File | Tests | Expected Time |
|---|---|---|
| fibonacci.lux | Function call overhead | ~35 s (fib 30) |
| fibonacci_tco.lux | Tail call optimization | ~0.5 s (fib 100K) |
| list_operations.lux | Collection performance | ~0.5 s (10K elements) |
| pattern_matching.lux | ADT matching | ~1 s (32K nodes) |
| effects.lux | Effect dispatch | ~0.4 s (10K effects) |
| closures.lux | Closure performance | ~0.5 s (100K closures) |
| strings.lux | String operations | ~0.5 s (1K concats) |

Key Metrics to Measure

  1. Function calls per second: Use recursive fibonacci
  2. Effect operations per second: Use counter effect benchmark
  3. Pattern matches per second: Use tree traversal
  4. Closure creations per second: Use makeAdder benchmark
  5. List operations per second: Use map/filter/fold chain
  6. Memory usage: Monitor with system tools (not built-in yet)
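
Absent built-in profiling, a throughput number can be taken from the host side with `std::time::Instant`; the harness shape below is a suggestion, and the integer-add workload is only a stand-in for a real benchmark body.

```rust
use std::time::Instant;

// Measure ops/sec for any closure-wrapped workload.
fn ops_per_sec(label: &str, iters: u64, mut work: impl FnMut()) -> f64 {
    let start = Instant::now();
    for _ in 0..iters {
        work();
    }
    let secs = start.elapsed().as_secs_f64();
    // Guard against a zero-duration measurement on very fast workloads.
    let rate = iters as f64 / secs.max(f64::MIN_POSITIVE);
    println!("{}: {:.0} ops/sec", label, rate);
    rate
}

fn main() {
    let mut acc = 0u64;
    let rate = ops_per_sec("integer adds", 1_000_000, || acc = acc.wrapping_add(1));
    assert!(rate > 0.0);
    println!("acc = {}", acc);
}
```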

Comparison Benchmarks

To compare with other languages, implement the same algorithms:

Fibonacci (n=30) comparison:

```
Lux (interpreted):     ~35,000 ms
Python 3:              ~2,000 ms
Node.js:               ~50 ms
Haskell (ghci):        ~200 ms
Haskell (compiled):    ~5 ms
Rust:                  ~1 ms
```
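
For reference, the Rust end of that comparison is the direct translation of the naive benchmark:

```rust
use std::time::Instant;

// Naive exponential-time fibonacci, the same algorithm the Lux benchmark uses.
fn fib(n: u64) -> u64 {
    if n < 2 { n } else { fib(n - 1) + fib(n - 2) }
}

fn main() {
    let start = Instant::now();
    let result = fib(30);
    println!("fib(30) = {} in {:?}", result, start.elapsed());
}
```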

Performance Improvement Opportunities

Short-term (Interpreter Improvements)

  1. Bytecode compilation: Convert AST to bytecode for faster dispatch
  2. Value representation: Use NaN-boxing for primitives
  3. Environment optimization: Use flat closure representation
  4. List operations: Avoid cloning by using Rc<Vec<Rc>>
  5. String interning: Deduplicate string values
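
NaN-boxing (item 2) exploits the payload bits of an IEEE-754 quiet NaN to store non-float values inside an ordinary `f64`. A minimal encoder for 32-bit integers might look like the following; the tag layout is illustrative, not a committed design.

```rust
// Quiet-NaN pattern: exponent all ones, quiet bit set.
const QNAN: u64 = 0x7ff8_0000_0000_0000;
// Extra payload bit distinguishing boxed ints from real computed NaNs.
const INT_TAG: u64 = 0x0001_0000_0000_0000;

fn box_int(n: i32) -> f64 {
    // Lower 32 bits carry the integer; the NaN bits make it a non-float.
    f64::from_bits(QNAN | INT_TAG | (n as u32 as u64))
}

fn unbox_int(v: f64) -> Option<i32> {
    let bits = v.to_bits();
    if bits & (QNAN | INT_TAG) == QNAN | INT_TAG {
        Some(bits as u32 as i32)
    } else {
        None // an ordinary float (or a genuine NaN), stored as-is
    }
}

fn main() {
    let boxed = box_int(-42);
    println!("{:?}", unbox_int(boxed));  // Some(-42)
    println!("{:?}", unbox_int(3.14));   // None: plain floats pass through
}
```

The payoff is that every `Value` fits in 8 bytes instead of the current 32+, with pointers boxed under further tags in the same scheme.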

Medium-term (New Backend)

  1. WASM compilation: Target WebAssembly for portable native speed
  2. JavaScript emission: Leverage V8/SpiderMonkey JIT
  3. LLVM backend: Generate native code via LLVM IR

Long-term (Advanced Optimizations)

  1. Effect fusion: Combine adjacent effect operations
  2. Inlining: Inline small functions
  3. Specialization: Generate specialized code for monomorphic calls
  4. Escape analysis: Stack-allocate non-escaping values

Estimated Speedup Potential

| Optimization | Expected Speedup | Effort |
|---|---|---|
| Bytecode VM | 5-10x | Medium |
| NaN-boxing | 1.5-2x | Low |
| Flat closures | 2-3x | Medium |
| WASM backend | 50-100x | High |
| LLVM backend | 100-500x | Very High |

Conclusion

Lux prioritizes expressiveness, safety, and simplicity over raw performance. The current interpreter is suitable for:

  • Prototyping and development
  • Educational purposes
  • Small scripts and tools
  • Testing effect-based designs

For production workloads requiring high performance, a compilation backend would be necessary. The language design is amenable to efficient compilation: algebraic effects can be compiled using CPS transformation or evidence passing, and the pure functional core can benefit from standard optimizations.

The key insight is that Lux's performance ceiling is set by implementation choices (interpreter vs compiler), not fundamental language limitations. Languages like Koka demonstrate that algebraic effects can be compiled efficiently.