# Lux Performance Benchmarks

This document reports performance measurements comparing Lux to other languages.

## Quick Start

```bash
# Run full benchmark suite
nix run .#bench

# Run quick Lux vs C comparison
nix run .#bench-quick

# Run detailed CPU metrics with poop
nix run .#bench-poop
```

## Execution Modes

Lux supports two execution modes:

1. **Compiled** (`lux compile`): Generates C code, compiles with gcc -O3. Native performance.
2. **Interpreted** (`lux run`): Tree-walking interpreter. Slower but instant startup.

## Benchmark Environment

- **Platform**: Linux x86_64 (NixOS)
- **Lux**: v0.1.0 (compiled via C backend)
- **C**: gcc with -O3
- **Rust**: rustc with -C opt-level=3 -C lto
- **Zig**: zig with -O ReleaseFast
- **Tools**: hyperfine, poop

## Results Summary

### hyperfine Results

```
Benchmark 1: /tmp/fib_lux
  Time (mean ± σ):     28.1 ms ± 0.6 ms

Benchmark 2: /tmp/fib_c
  Time (mean ± σ):     29.0 ms ± 2.1 ms

Benchmark 3: /tmp/fib_rust
  Time (mean ± σ):     41.2 ms ± 0.6 ms

Benchmark 4: /tmp/fib_zig
  Time (mean ± σ):     47.0 ms ± 1.1 ms

Summary
  /tmp/fib_lux ran
    1.03 ± 0.08 times faster than /tmp/fib_c
    1.47 ± 0.04 times faster than /tmp/fib_rust
    1.67 ± 0.05 times faster than /tmp/fib_zig
```

| Benchmark | C (gcc -O3) | Rust | Zig | **Lux (compiled)** | Lux (interp) |
|-----------|-------------|------|-----|---------------------|--------------|
| Fibonacci(35) | 29.0ms | 41.2ms | 47.0ms | **28.1ms** | 254ms |

### poop Results (Detailed CPU Metrics)

Percentages are relative to C.

| Metric | C | Lux | Rust | Zig |
|--------|---|-----|------|-----|
| **Wall Time** | 29.0ms | 29.2ms (+0.8%) | 42.0ms (+45%) | 48.1ms (+66%) |
| **CPU Cycles** | 53.1M | 53.2M (+0.2%) | 78.2M (+47%) | 90.4M (+70%) |
| **Instructions** | 293M | 292M (-0.5%) | 302M (+3.2%) | 317M (+8.1%) |
| **Cache Refs** | 11.4K | 11.7K (+3.1%) | 17.8K (+57%) | 1.87K (-84%) |
| **Cache Misses** | 4.39K | 4.62K (+5.3%) | 6.47K (+47%) | 340 (-92%) |
| **Branch Misses** | 28.3K | 32.0K (+13%) | 33.5K (+18%) | 29.6K (+4.7%) |
| **Peak RSS** | 1.56MB | 1.63MB (+4.7%) | 2.00MB (+29%) | 1.07MB (-32%) |

### Key Observations

1. **Lux matches C**: Within measurement noise (0.8% wall-time difference in the poop run)
2. **Lux is about 1.47x faster than Rust**: Fewer CPU cycles, fewer instructions
3. **Lux is about 1.67x faster than Zig**: Despite Zig's excellent cache efficiency
4. **Instruction efficiency**: Lux executes fewer instructions than Rust/Zig

## Why Compiled Lux is Fast

### 1. gcc's Aggressive Recursion Optimization

When Lux compiles to C, gcc transforms the recursive Fibonacci into highly optimized loops:

**Rust (LLVM) keeps one recursive call:**

```asm
a640: lea -0x1(%r14),%rdi
a644: call a630            ; <-- recursive call
a649: lea -0x2(%r14),%rdi
a657: ja a640              ; loop for fib(n-2)
```

**Lux/C (gcc) transforms to pure loops:**

```asm
; No 'call fib' in the hot path
; Uses r12-r15, rbx as accumulators
; Complex but efficient loop structure
```

### 2. Compiler Optimization Strategies

| Compiler | Backend | Strategy |
|----------|---------|----------|
| **gcc -O3** | Native | Aggressive recursion elimination, loop unrolling |
| **LLVM (Rust/Zig)** | Native | Conservative, preserves some recursion |

gcc has decades of optimization work on transforming recursive C code into efficient loops. By generating clean C, Lux inherits this optimization automatically.

### 3. Why More Instructions = Slower (Rust/Zig)

The poop results show:

- **C/Lux**: 293M instructions, 53M cycles
- **Rust**: 302M instructions (+3%), 78M cycles (+47%)
- **Zig**: 317M instructions (+8%), 90M cycles (+70%)

The extra instructions in Rust/Zig likely come from:

- Recursive call setup/teardown overhead
- Additional bounds checking
- Stack frame management for each recursion level

Note that the cycle gap (+47%, +70%) is much larger than the instruction gap (+3%, +8%), so instruction count alone does not account for the slowdown: the Rust and Zig binaries also retire fewer instructions per cycle, which is consistent with the cost of the remaining recursive calls.

### 4. Direct C Generation

Lux generates straightforward C code:

```c
int64_t fib_lux(int64_t n) {
    if (n <= 1) return n;
    return fib_lux(n - 1) + fib_lux(n - 2);
}
```

This gives gcc maximum freedom to optimize without fighting language-specific abstractions.

### 5. Perceus Reference Counting

Lux implements Koka-style Perceus reference counting:

- FBIP (Functional But In-Place) optimization
- Compile-time reference tracking where possible
- Minimal runtime overhead for memory management

For the fib benchmark (which doesn't allocate), this adds zero overhead.

## Comparison Context

Values prefixed with `~` are approximate.

| Language | fib(35) | Type | vs Lux |
|----------|---------|------|--------|
| **Lux (compiled)** | 28.1ms | Compiled (via C) | baseline |
| C (gcc -O3) | 29.0ms | Compiled | 1.03x slower |
| Rust | 41.2ms | Compiled | 1.47x slower |
| Zig | 47.0ms | Compiled | 1.67x slower |
| Go | ~50ms | Compiled | ~1.8x slower |
| LuaJIT | ~150ms | JIT | ~5x slower |
| V8 (JS) | ~200ms | JIT | ~7x slower |
| Lux (interp) | 254ms | Interpreted | 9x slower |
| Python | ~3000ms | Interpreted | ~107x slower |

## When Lux Won't Be Fastest

This benchmark is favorable to gcc's optimization patterns. Other scenarios:

| Scenario | Likely Winner | Why |
|----------|---------------|-----|
| Simple recursion | **Lux/C** | gcc's strength |
| SIMD/vectorization | Rust/Zig | Explicit SIMD intrinsics |
| Async I/O | Rust (tokio) | Mature async runtime |
| Memory-heavy workloads | Zig | Fine-grained allocator control |
| Hot loops with bounds checks | C | No safety overhead |

## Running Benchmarks

### Using Nix Flake Commands

```bash
# Full hyperfine benchmark (Lux vs C vs Rust vs Zig)
nix run .#bench

# Quick Lux vs C comparison
nix run .#bench-quick

# Detailed CPU metrics with poop
nix run .#bench-poop
```

### Manual Benchmark

```bash
# Enter development shell (includes hyperfine, poop)
nix develop

# Compile all versions
cargo run --release -- compile benchmarks/fib.lux -o /tmp/fib_lux
gcc -O3 benchmarks/fib.c -o /tmp/fib_c
rustc -C opt-level=3 -C lto benchmarks/fib.rs -o /tmp/fib_rust
zig build-exe benchmarks/fib.zig -O ReleaseFast -femit-bin=/tmp/fib_zig

# Run hyperfine
hyperfine --warmup 3 '/tmp/fib_lux' '/tmp/fib_c' '/tmp/fib_rust' '/tmp/fib_zig'

# Run poop for detailed metrics
poop '/tmp/fib_c' '/tmp/fib_lux' '/tmp/fib_rust' '/tmp/fib_zig'
```

## Benchmark Files

All benchmarks are in `/benchmarks/`:

| File | Description |
|------|-------------|
| `fib.lux`, `fib.c`, `fib.rs`, `fib.zig` | Fibonacci (recursive) |
| `ackermann.lux`, etc. | Ackermann function |
| `primes.lux`, etc. | Prime counting |
| `sumloop.lux`, etc. | Tight numeric loops |

## The Case for Lux

Performance is excellent when compiled. But Lux also prioritizes:

1. **Developer Experience**: Clear error messages, effect system makes code predictable
2. **Correctness**: Types catch bugs, effects are explicit in signatures
3. **Simplicity**: No null pointers, no exceptions, no hidden control flow
4. **Testability**: Effects can be mocked without DI frameworks

## Methodology Notes

- All benchmarks run on the same machine, in the same session
- hyperfine uses 3 warmup runs, 10 measured runs
- poop provides Linux perf-based metrics
- Compiler flags documented for reproducibility
- Results may vary on different hardware/OS
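
As a cross-check on the poop table, the relative deltas and the instructions-per-cycle (IPC) ratio can be recomputed from the reported counts. The sketch below is illustrative only: it hardcodes the cycle and instruction figures from the table (in millions), and the names `POOP_COUNTS`, `ipc`, `delta_vs_baseline`, and `summarize` are hypothetical helpers, not part of the Lux benchmark tooling.

```python
# Cycle and instruction counts copied from the poop table, in millions (M).
# These are the rounded values as published, so recomputed deltas can differ
# slightly from the percentages poop printed from the unrounded data.
POOP_COUNTS = {
    "C":    {"cycles": 53.1, "instructions": 293.0},
    "Lux":  {"cycles": 53.2, "instructions": 292.0},
    "Rust": {"cycles": 78.2, "instructions": 302.0},
    "Zig":  {"cycles": 90.4, "instructions": 317.0},
}


def ipc(counts):
    """Instructions retired per CPU cycle."""
    return counts["instructions"] / counts["cycles"]


def delta_vs_baseline(value, baseline):
    """Relative difference from the baseline, in percent."""
    return (value - baseline) / baseline * 100.0


def summarize(table, baseline="C"):
    """Return IPC and percentage deltas against the baseline for each entry."""
    base = table[baseline]
    rows = {}
    for name, counts in table.items():
        rows[name] = {
            "ipc": ipc(counts),
            "cycles_delta_pct": delta_vs_baseline(counts["cycles"], base["cycles"]),
            "instr_delta_pct": delta_vs_baseline(
                counts["instructions"], base["instructions"]
            ),
        }
    return rows


if __name__ == "__main__":
    for name, row in summarize(POOP_COUNTS).items():
        print(
            f"{name:<5} IPC={row['ipc']:.2f} "
            f"cycles={row['cycles_delta_pct']:+.1f}% "
            f"instructions={row['instr_delta_pct']:+.1f}%"
        )
```

With these inputs, C and Lux both come out at roughly 5.5 instructions per cycle, while Rust and Zig land near 3.9 and 3.5. That suggests the cycle gap is driven more by how efficiently the instructions execute than by how many there are.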