# Lux Performance Characteristics and Language Tradeoffs

## Executive Summary

Lux is a tree-walking interpreted language with algebraic effects. This document analyzes its performance characteristics, compares it to other languages, and explains the design tradeoffs made.

Key Performance Characteristics:

- Interpretation overhead: ~100-1000x slower than native compiled languages
- Tail call optimization: effective; prevents stack overflow
- Effect handling: ~10-20% overhead per effect operation
- Memory: reference counting for closures, aggressive cloning for collections
## Benchmark Results

### Test System

Benchmarks run via the tree-walking interpreter in release mode.

### Results Summary
| Benchmark | Time | Operations | Ops/sec | Notes |
|---|---|---|---|---|
| Fibonacci (naive, n=30) | 34,980ms | ~1.3M calls | 37K | Exponential recursion |
| Fibonacci (TCO, n=100K) | 498ms | 100K iterations | 200K | Tail-call optimized |
| List operations (10K) | 461ms | 30K ops | 65K | map+filter+fold |
| Pattern matching (32K nodes) | 964ms | 65K matches | 67K | Tree traversal |
| Closures (100K calls) | 538ms | 100K closures | 186K | Closure creation + calls |
| String ops (1K concat) | 457ms | 1K concats | 2.2K | String building |
### Analysis

Naive recursion is expensive:

- fib(30) takes ~35 seconds due to exponential call overhead
- Each function call involves environment extension, parameter binding, and AST traversal
- Compare: Python ~2s, JavaScript ~0.05s, Rust ~0.001s

TCO is effective:

- fib(100,000) completes in ~500ms without stack overflow
- Linear time, constant stack space
- The trampoline approach works well

Collection operations have cloning overhead:

- List.map/filter/fold clone the entire list to extract it from the Value enum
- Pre-allocation in List.map helps, but cloning dominates
- Larger lists will show worse performance
## Implementation Details

### Evaluation Strategy: Tree-Walking Interpreter

```
Source Code → Lexer → Tokens → Parser → AST → Interpreter → Value
```

Pros:
- Simple to implement and debug
- Direct correspondence between AST and execution
- Easy to add new features

Cons:
- No optimization passes
- Repeated AST traversal
- No instruction caching
- ~100-1000x slower than bytecode/native
Comparison:
| Language | Strategy | Relative Speed |
|---|---|---|
| Lux | Tree-walking | 1x (baseline) |
| Python | Bytecode VM | 10-50x faster |
| JavaScript (V8) | JIT compiled | 100-500x faster |
| Haskell (GHC) | Native compiled | 500-2000x faster |
| Rust | Native compiled | 1000-5000x faster |
Value Representation
pub enum Value {
Int(i64), // Unboxed, 8 bytes
Float(f64), // Unboxed, 8 bytes
Bool(bool), // Unboxed, 1 byte
String(String), // Heap-allocated, ~24 bytes + data
List(Vec<Value>), // Heap-allocated, ~24 bytes + n*size(Value)
Function(Rc<Closure>), // Reference-counted, 8 bytes pointer
Constructor { ... }, // Tagged union
...
}
Memory Overhead:

- Each `Value` is ~40-80 bytes due to the enum discriminant plus the largest variant
- Lists are `Vec<Value>`, so each element is a full `Value` enum
- No small-value optimization
Tradeoffs:
| Aspect | Lux Approach | Alternative | Tradeoff |
|---|---|---|---|
| Primitives | Unboxed in enum | NaN-boxing | Simpler code, more memory |
| Strings | Owned String | Interned/Rc | Simpler, more copying |
| Lists | Vec | Rc<Vec<Rc>> | Simpler, expensive clone |
| Closures | Rc | Owned | Cheap sharing, GC needed |
### Closure Capture

```rust
pub struct Closure {
    params: Vec<String>,
    body: Expr,
    env: Env, // Entire lexical environment
}

pub struct Env {
    bindings: Rc<RefCell<HashMap<String, Value>>>,
    parent: Option<Box<Env>>,
}
```

Characteristics:

- Closures capture the entire environment chain (lexical scoping)
- Environment lookup is O(depth): it traverses the parent chain
- Variable access clones the value (expensive for large values)
Comparison:
| Language | Capture Strategy | Lookup Cost |
|---|---|---|
| Lux | Scope chain | O(depth) |
| JavaScript | Scope chain | O(depth), optimized |
| Python | Cell references | O(1) after first access |
| Rust | Move/borrow | O(1), compile-time resolved |
### Effect Handling

```rust
fn handle_effect(&mut self, request: EffectRequest) -> Result<Value, RuntimeError> {
    // Linear search through the handler stack (LIFO)
    for handler in self.handler_stack.iter().rev() {
        if handler.effect == request.effect {
            // Clone the handler environment and execute
            ...
        }
    }
}
```

Overhead per Effect Operation:

- Create an `EffectRequest` struct
- Linear search through the handler stack (typically O(1-5))
- Clone the handler environment
- Execute the handler body
- Return the value
Comparison with Other Approaches:
| Approach | Overhead | Flexibility |
|---|---|---|
| Lux (runtime handlers) | ~10-20% | High - dynamic dispatch |
| Koka (evidence passing) | ~1-5% | High - optimized |
| Haskell mtl (transformers) | ~5-10% | Medium - static |
| Rust (traits) | 0% | Low - compile-time only |
### Tail Call Optimization

```rust
pub enum EvalResult {
    Value(Value),
    Effect(EffectRequest),
    TailCall { func, args, span }, // Trampoline marker
}

// Trampoline loop
loop {
    match result {
        EvalResult::Value(v) => return Ok(v),
        EvalResult::TailCall { func, args, span } => {
            result = self.eval_call(func, args, span)?;
        }
    }
}
```

Characteristics:

- Explicit tail-position tracking via a `tail: bool` parameter
- The `TailCall` variant prevents stack growth
- Only function calls in tail position are optimized
- Arguments are always evaluated eagerly before the tail call
Comparison:
| Language | TCO Support | Mechanism |
|---|---|---|
| Lux | Full | Trampoline |
| Scheme | Full | Required by spec |
| Haskell | Full | Lazy evaluation + STG |
| JavaScript | Safari only | Implementation-dependent |
| Python | None | Explicit recursion limit |
| Rust | Limited | LLVM optimization |
## Language Tradeoffs

### 1. Safety vs Performance
Choice: Safety First
| Decision | Safety Benefit | Performance Cost |
|---|---|---|
| Immutable values | No data races | Clone on every modification |
| Explicit effects | No hidden side effects | Handler lookup overhead |
| Type checking | Catch errors early | Compile-time overhead |
| Exhaustive matching | No missed cases | Runtime pattern matching |
### 2. Simplicity vs Optimization
Choice: Simplicity First
| Decision | Simplicity Benefit | Lost Optimization |
|---|---|---|
| Tree-walking | Easy to implement | No bytecode caching |
| Value enum | Uniform handling | No NaN-boxing |
| Clone semantics | Predictable memory | No move optimization |
| No mutation | No aliasing issues | Can't update in place |
### 3. Expressiveness vs Compilation
Choice: Expressiveness First
| Feature | Expressiveness Benefit | Compilation Challenge |
|---|---|---|
| Algebraic effects | Composable side effects | Hard to optimize |
| First-class handlers | Runtime flexibility | Dynamic dispatch |
| Effect polymorphism (planned) | Generic effect code | Complex inference |
| Refinement types (planned) | Precise specifications | SMT solver needed |
### 4. Comparison Matrix
| Aspect | Lux | Koka | Haskell | Rust | TypeScript |
|---|---|---|---|---|---|
| Execution | Interpreted | Compiled | Compiled | Compiled | JIT |
| Effects | Algebraic | Algebraic | Monads | Traits | Promises |
| Memory | RC + Clone | RC + Reuse | GC | Ownership | GC |
| Mutability | Immutable | Immutable | Immutable | Controlled | Mutable |
| TCO | Trampoline | Native | Native | LLVM | No |
| Typing | HM Inference | HM + Effects | HM + Extensions | Explicit | Structural |
## How to Measure Performance

### Running Benchmarks

```bash
# Run a specific benchmark
nix develop --command cargo run --release -- benchmarks/fibonacci.lux

# Time a benchmark
time nix develop --command cargo run --release -- benchmarks/fibonacci_tco.lux

# Run with effect tracing (slower, but shows effect operations)
# In the REPL: :trace on
```
### Benchmark Suite

| File | Tests | Expected Time |
|---|---|---|
| fibonacci.lux | Function call overhead | ~35s (fib 30) |
| fibonacci_tco.lux | Tail call optimization | ~0.5s (fib 100K) |
| list_operations.lux | Collection performance | ~0.5s (10K elements) |
| pattern_matching.lux | ADT matching | ~1s (32K nodes) |
| effects.lux | Effect dispatch | ~0.4s (10K effects) |
| closures.lux | Closure performance | ~0.5s (100K closures) |
| strings.lux | String operations | ~0.5s (1K concats) |
### Key Metrics to Measure
- Function calls per second: Use recursive fibonacci
- Effect operations per second: Use counter effect benchmark
- Pattern matches per second: Use tree traversal
- Closure creations per second: Use makeAdder benchmark
- List operations per second: Use map/filter/fold chain
- Memory usage: Monitor with system tools (not built-in yet)
### Comparison Benchmarks

To compare with other languages, implement the same algorithms.

Fibonacci (n=30) comparison:

```
Lux (interpreted):  ~35,000 ms
Python 3:            ~2,000 ms
Node.js:                ~50 ms
Haskell (ghci):        ~200 ms
Haskell (compiled):      ~5 ms
Rust:                    ~1 ms
```
## Performance Improvement Opportunities

### Short-term (Interpreter Improvements)
- Bytecode compilation: Convert AST to bytecode for faster dispatch
- Value representation: Use NaN-boxing for primitives
- Environment optimization: Use flat closure representation
- List operations: Avoid cloning by using Rc<Vec<Rc>>
- String interning: Deduplicate string values
### Medium-term (New Backend)
- WASM compilation: Target WebAssembly for portable native speed
- JavaScript emission: Leverage V8/SpiderMonkey JIT
- LLVM backend: Generate native code via LLVM IR
### Long-term (Advanced Optimizations)
- Effect fusion: Combine adjacent effect operations
- Inlining: Inline small functions
- Specialization: Generate specialized code for monomorphic calls
- Escape analysis: Stack-allocate non-escaping values
### Estimated Speedup Potential
| Optimization | Expected Speedup | Effort |
|---|---|---|
| Bytecode VM | 5-10x | Medium |
| NaN-boxing | 1.5-2x | Low |
| Flat closures | 2-3x | Medium |
| WASM backend | 50-100x | High |
| LLVM backend | 100-500x | Very High |
## Conclusion
Lux prioritizes expressiveness, safety, and simplicity over raw performance. The current interpreter is suitable for:
- Prototyping and development
- Educational purposes
- Small scripts and tools
- Testing effect-based designs
For production workloads requiring high performance, a compilation backend would be necessary. The language design is amenable to efficient compilation: algebraic effects can be compiled using CPS transformation or evidence passing, and the pure functional core can benefit from standard optimizations.
The key insight is that Lux's performance ceiling is set by implementation choices (interpreter vs compiler), not fundamental language limitations. Languages like Koka demonstrate that algebraic effects can be compiled efficiently.