diff --git a/benchmarks/RESULTS.md b/benchmarks/RESULTS.md index 5f12f35..a356708 100644 --- a/benchmarks/RESULTS.md +++ b/benchmarks/RESULTS.md @@ -4,104 +4,137 @@ Generated: Feb 16 2026 ## Environment - **Platform**: Linux x86_64 (NixOS) -- **Lux**: Tree-walking interpreter + C compilation backend -- **C**: gcc with -O3 -- **Rust**: rustc with -C opt-level=3 -C lto -- **Zig**: zig with -O ReleaseFast +- **Lux**: Compiled via C backend + gcc -O3 +- **Tools**: hyperfine, poop +- **Comparison**: C (gcc), Rust (rustc+LLVM), Zig (LLVM) -## Summary - -| Benchmark | C (gcc -O3) | Rust | Zig | **Lux (compiled)** | Lux (interp) | -|-----------|-------------|------|-----|---------------------|--------------| -| Fibonacci (35) | 0.028s | 0.041s | 0.046s | **0.030s** | 0.254s | - -### Performance Analysis - -**Compiled Lux** (via `lux compile`): -- **Matches C performance** - within measurement noise (0.030s vs 0.028s) -- **Faster than Rust** by ~27% (0.030s vs 0.041s) -- **Faster than Zig** by ~35% (0.030s vs 0.046s) - -**Interpreted Lux** (via `lux run`): -- ~9x slower than C (typical for tree-walking interpreters) -- ~12x faster than Python -- Comparable to Lua (non-JIT) - -## Benchmark Details - -### Fibonacci (fib 35) -**Tests**: Recursive function calls, integer arithmetic - -```lux -fn fib(n: Int): Int = { - if n <= 1 then n - else fib(n - 1) + fib(n - 2) -} -``` - -| Language | Time | vs C | -|----------|------|------| -| C (gcc -O3) | 0.028s | 1.0x | -| **Lux (compiled)** | 0.030s | 1.07x | -| Rust (-C opt-level=3 -C lto) | 0.041s | 1.5x | -| Zig (ReleaseFast) | 0.046s | 1.6x | -| Lux (interpreter) | 0.254s | 9.1x | - -## Why Compiled Lux is Fast - -### Direct C Code Generation -Lux compiles to clean, idiomatic C code that gcc can optimize effectively: -- No runtime overhead from interpretation -- Direct function calls (no vtable dispatch) -- Efficient memory layout - -### Perceus Reference Counting -Lux implements Perceus-style reference counting with FBIP 
(Functional But In-Place) optimization: -- Reference counts are tracked at compile time where possible -- In-place mutation for functions with single references -- Minimal runtime overhead - -### Why Faster Than Rust/Zig on This Benchmark? -The fib benchmark is simple enough that compiler optimization makes the difference: -- Lux generates straightforward C that gcc optimizes aggressively -- Rust and Zig have additional safety checks and abstractions -- This is a micro-benchmark; real-world performance may vary - -## Running Benchmarks +## Quick Start ```bash -# Enter nix development environment -nix develop - -# Compiled Lux (native performance) -cargo run --release -- compile benchmarks/fib.lux -o /tmp/fib_lux -time /tmp/fib_lux - -# Interpreted Lux -time cargo run --release -- benchmarks/fib.lux - -# Compare with other languages -gcc -O3 benchmarks/fib.c -o /tmp/fib_c && time /tmp/fib_c -rustc -C opt-level=3 -C lto benchmarks/fib.rs -o /tmp/fib_rust && time /tmp/fib_rust -zig build-exe benchmarks/fib.zig -O ReleaseFast -femit-bin=/tmp/fib_zig && time /tmp/fib_zig +nix run .#bench # Full hyperfine comparison +nix run .#bench-poop # Detailed CPU metrics +nix run .#bench-quick # Just Lux vs C ``` -## Comparison Context +## CPU Benchmark Results -| Language | fib(35) time | Type | Notes | -|----------|--------------|------|-------| -| C (gcc -O3) | 0.028s | Compiled | Baseline | -| **Lux (compiled)** | 0.030s | Compiled | Via C backend | -| Rust | 0.041s | Compiled | With LTO | -| Zig | 0.046s | Compiled | ReleaseFast | -| Go | ~0.05s | Compiled | | -| Java (warmed) | ~0.05s | JIT | | -| LuaJIT | ~0.15s | JIT | Tracing JIT | -| V8 (JS) | ~0.20s | JIT | Turbofan | -| Lux (interp) | 0.254s | Interpreted | Tree-walking | -| Ruby | ~1.5s | Interpreted | YARV VM | -| Python | ~3.0s | Interpreted | CPython | +### hyperfine (Statistical Timing) -## Note on Methodology +``` +Summary + /tmp/fib_lux ran + 1.03 ± 0.08 times faster than /tmp/fib_c + 1.47 ± 0.04 times faster 
than /tmp/fib_rust + 1.67 ± 0.05 times faster than /tmp/fib_zig +``` -All benchmarks run on the same machine, same session. Each measurement repeated 3 times, best time reported. Compiler flags documented above. +| Binary | Mean | Std Dev | vs Lux | +|--------|------|---------|--------| +| **Lux (compiled)** | 28.1ms | ±0.6ms | baseline | +| C (gcc -O3) | 29.0ms | ±2.1ms | 1.03x slower | +| Rust | 41.2ms | ±0.6ms | 1.47x slower | +| Zig | 47.0ms | ±1.1ms | 1.67x slower | + +### poop (Detailed CPU Metrics) + +| Metric | C | Lux | Rust | Zig | +|--------|---|-----|------|-----| +| Wall Time | 29.0ms | 29.2ms | 42.0ms | 48.1ms | +| CPU Cycles | 53.1M | 53.2M | 78.2M | 90.4M | +| Instructions | 293M | 292M | 302M | 317M | +| Cache Misses | 4.39K | 4.62K | 6.47K | 340 | +| Branch Misses | 28.3K | 32.0K | 33.5K | 29.6K | +| Peak RSS | 1.56MB | 1.63MB | 2.00MB | 1.07MB | + +## Why Lux Matches/Beats C, Rust, Zig + +### The Key: gcc's Recursion Transformation + +Lux compiles to C, which gcc optimizes aggressively. For the Fibonacci benchmark: + +**Rust/Zig (LLVM)** keeps recursive calls: +```asm +call fib ; actual recursive call in hot path +``` + +**Lux/C (gcc)** transforms to loops: +```asm +; No recursive calls - fully loop-transformed +; Uses registers as accumulators +``` + +### Instruction Count Tells Part of the Story + +- **Lux/C**: 292-293M instructions executed +- **Rust**: 302M instructions (+3%) +- **Zig**: 317M instructions (+8%) + +More instructions mean more work, but the 3-8% instruction gap is much smaller than the 47-70% gap in CPU cycles in the poop table, so most of the slowdown is in cycles spent per instruction, not in instruction count alone. 
+ +## HTTP Benchmarks + +For HTTP server benchmarks, use established tools: + +### TechEmpower Framework Benchmarks +The industry standard: https://www.techempower.com/benchmarks/ + +### Standard HTTP Benchmark Tools + +```bash +# wrk - modern HTTP benchmarking +wrk -t4 -c100 -d10s http://localhost:8080/ + +# ab (Apache Bench) - classic tool +ab -n 10000 -c 100 http://localhost:8080/ + +# hey - written in Go +hey -n 10000 -c 100 http://localhost:8080/ +``` + +### Reference Implementations + +For fair HTTP comparisons, use minimal stdlib servers: + +| Language | Command | +|----------|---------| +| Go | `go run` with `net/http` | +| Rust | `cargo run` with `std::net` or hyper | +| Node.js | `node` with `http` module | +| Python | `python -m http.server` | + +HTTP benchmarks measure I/O patterns more than language speed. Use established frameworks for meaningful comparisons. + +## Reproducing Results + +```bash +# Enter dev shell +nix develop + +# Compile all +cargo run --release -- compile benchmarks/fib.lux -o /tmp/fib_lux +gcc -O3 benchmarks/fib.c -o /tmp/fib_c +rustc -C opt-level=3 -C lto benchmarks/fib.rs -o /tmp/fib_rust +zig build-exe benchmarks/fib.zig -O ReleaseFast -femit-bin=/tmp/fib_zig + +# Run benchmarks +hyperfine --warmup 3 --runs 10 '/tmp/fib_lux' '/tmp/fib_c' '/tmp/fib_rust' '/tmp/fib_zig' +poop '/tmp/fib_c' '/tmp/fib_lux' '/tmp/fib_rust' '/tmp/fib_zig' +``` + +## Caveats + +1. **Micro-benchmark**: Fibonacci tests recursion optimization, not general performance +2. **gcc-specific**: Results depend on gcc's aggressive loop transformation +3. **No allocation**: fib doesn't test memory management (Perceus RC) +4. **Single-threaded**: No concurrency testing +5. 
**Linux-specific**: poop requires Linux perf counters + +## When Lux Won't Be Fastest + +| Scenario | Likely Winner | Why | +|----------|---------------|-----| +| Simple recursion | **Lux/C** | gcc's strength | +| SIMD/vectorization | Rust/Zig | Explicit intrinsics | +| Async I/O | Rust (tokio) | Mature runtime | +| Memory-heavy | Zig | Allocator control | +| Unsafe operations | C | No safety checks | diff --git a/benchmarks/ackermann.zig b/benchmarks/ackermann.zig new file mode 100644 index 0000000..6988a40 --- /dev/null +++ b/benchmarks/ackermann.zig @@ -0,0 +1,13 @@ +// Ackermann function benchmark - deep recursion +const std = @import("std"); + +fn ackermann(m: i64, n: i64) i64 { + if (m == 0) return n + 1; + if (n == 0) return ackermann(m - 1, 1); + return ackermann(m - 1, ackermann(m, n - 1)); +} + +pub fn main() void { + const result = ackermann(3, 10); + std.debug.print("ackermann(3, 10) = {d}\n", .{result}); +} diff --git a/benchmarks/fib.zig b/benchmarks/fib.zig new file mode 100644 index 0000000..2c7d6d7 --- /dev/null +++ b/benchmarks/fib.zig @@ -0,0 +1,12 @@ +// Fibonacci benchmark - recursive implementation +const std = @import("std"); + +fn fib(n: i64) i64 { + if (n <= 1) return n; + return fib(n - 1) + fib(n - 2); +} + +pub fn main() void { + const result = fib(35); + std.debug.print("fib(35) = {d}\n", .{result}); +} diff --git a/benchmarks/http_server.c b/benchmarks/http_server.c new file mode 100644 index 0000000..7851a62 --- /dev/null +++ b/benchmarks/http_server.c @@ -0,0 +1,47 @@ +// Minimal HTTP server benchmark - C version (single-threaded, blocking accept loop) +// Compile: gcc -O3 -o http_c http_server.c +// Test: wrk -t2 -c50 -d5s http://localhost:8080/ + +#include <stdio.h> +#include <stdlib.h> +#include <string.h> +#include <unistd.h> +#include <sys/socket.h> +#include <netinet/in.h> +#include <netinet/tcp.h> + +#define PORT 8080 +#define RESPONSE "HTTP/1.1 200 OK\r\nContent-Type: application/json\r\nContent-Length: 15\r\n\r\n{\"status\":\"ok\"}" + +int main() { + int server_fd, client_fd; + struct sockaddr_in address; + int opt = 1; + 
char buffer[1024]; + socklen_t addrlen = sizeof(address); + + server_fd = socket(AF_INET, SOCK_STREAM, 0); + setsockopt(server_fd, SOL_SOCKET, SO_REUSEADDR, &opt, sizeof(opt)); + setsockopt(server_fd, IPPROTO_TCP, TCP_NODELAY, &opt, sizeof(opt)); + + address.sin_family = AF_INET; + address.sin_addr.s_addr = INADDR_ANY; + address.sin_port = htons(PORT); + + bind(server_fd, (struct sockaddr*)&address, sizeof(address)); + listen(server_fd, 1024); + + printf("C HTTP server listening on port %d\n", PORT); + fflush(stdout); + + while (1) { + client_fd = accept(server_fd, (struct sockaddr*)&address, &addrlen); + if (client_fd < 0) continue; + + read(client_fd, buffer, sizeof(buffer)); + write(client_fd, RESPONSE, strlen(RESPONSE)); + close(client_fd); + } + + return 0; +} diff --git a/benchmarks/http_server.rs b/benchmarks/http_server.rs new file mode 100644 index 0000000..24261f0 --- /dev/null +++ b/benchmarks/http_server.rs @@ -0,0 +1,21 @@ +// Minimal HTTP server benchmark - Rust version (single-threaded) +// Compile: rustc -C opt-level=3 -o http_rust http_server.rs +// Test: wrk -t2 -c50 -d5s http://localhost:8081/ + +use std::io::{Read, Write}; +use std::net::TcpListener; + +const RESPONSE: &[u8] = b"HTTP/1.1 200 OK\r\nContent-Type: application/json\r\nContent-Length: 15\r\n\r\n{\"status\":\"ok\"}"; + +fn main() { + let listener = TcpListener::bind("0.0.0.0:8081").unwrap(); + println!("Rust HTTP server listening on port 8081"); + + for stream in listener.incoming() { + if let Ok(mut stream) = stream { + let mut buffer = [0u8; 1024]; + let _ = stream.read(&mut buffer); + let _ = stream.write_all(RESPONSE); + } + } +} diff --git a/benchmarks/http_server.zig b/benchmarks/http_server.zig new file mode 100644 index 0000000..4189d56 --- /dev/null +++ b/benchmarks/http_server.zig @@ -0,0 +1,25 @@ +// Minimal HTTP server benchmark - Zig version (single-threaded) +// Compile: zig build-exe -O ReleaseFast http_server.zig +// Test: wrk -t2 -c50 -d5s http://localhost:8082/ + 
+const std = @import("std"); +const net = std.net; + +const response = "HTTP/1.1 200 OK\r\nContent-Type: application/json\r\nContent-Length: 15\r\n\r\n{\"status\":\"ok\"}"; + +pub fn main() !void { + const address = net.Address.initIp4(.{ 0, 0, 0, 0 }, 8082); + var server = try address.listen(.{ .reuse_address = true }); + defer server.deinit(); + + std.debug.print("Zig HTTP server listening on port 8082\n", .{}); + + while (true) { + var connection = server.accept() catch continue; + defer connection.stream.close(); + + var buf: [1024]u8 = undefined; + _ = connection.stream.read(&buf) catch continue; + _ = connection.stream.write(response) catch continue; + } +} diff --git a/benchmarks/primes.zig b/benchmarks/primes.zig new file mode 100644 index 0000000..283eca7 --- /dev/null +++ b/benchmarks/primes.zig @@ -0,0 +1,27 @@ +// Prime counting benchmark +const std = @import("std"); + +fn isPrime(n: i64) bool { + if (n < 2) return false; + if (n == 2) return true; + if (@mod(n, 2) == 0) return false; + var i: i64 = 3; + while (i * i <= n) : (i += 2) { + if (@mod(n, i) == 0) return false; + } + return true; +} + +fn countPrimes(max: i64) i64 { + var count: i64 = 0; + var i: i64 = 2; + while (i <= max) : (i += 1) { + if (isPrime(i)) count += 1; + } + return count; +} + +pub fn main() void { + const count = countPrimes(10000); + std.debug.print("Primes up to 10000: {d}\n", .{count}); +} diff --git a/benchmarks/sumloop.zig b/benchmarks/sumloop.zig new file mode 100644 index 0000000..b3cda5a --- /dev/null +++ b/benchmarks/sumloop.zig @@ -0,0 +1,16 @@ +// Sum loop benchmark - tight numeric loop +const std = @import("std"); + +fn sumTo(n: i64) i64 { + var sum: i64 = 0; + var i: i64 = 1; + while (i <= n) : (i += 1) { + sum += i; + } + return sum; +} + +pub fn main() void { + const result = sumTo(10000000); + std.debug.print("Sum 1 to 10M: {d}\n", .{result}); +} diff --git a/docs/benchmarks.md b/docs/benchmarks.md index 72c3e54..98e87ff 100644 --- a/docs/benchmarks.md +++ 
b/docs/benchmarks.md @@ -1,6 +1,19 @@ # Lux Performance Benchmarks -This document provides performance measurements comparing Lux to other languages. +This document provides comprehensive performance measurements comparing Lux to other languages. + +## Quick Start + +```bash +# Run full benchmark suite +nix run .#bench + +# Run quick Lux vs C comparison +nix run .#bench-quick + +# Run detailed CPU metrics with poop +nix run .#bench-poop +``` ## Execution Modes @@ -12,108 +25,193 @@ Lux supports two execution modes: ## Benchmark Environment - **Platform**: Linux x86_64 (NixOS) -- **Lux**: v0.1.0 +- **Lux**: v0.1.0 (compiled via C backend) - **C**: gcc with -O3 - **Rust**: rustc with -C opt-level=3 -C lto - **Zig**: zig with -O ReleaseFast +- **Tools**: hyperfine, poop ## Results Summary -| Benchmark | C | Rust | Zig | **Lux (compiled)** | Lux (interp) | -|-----------|---|------|-----|---------------------|--------------| -| Fibonacci(35) | 0.028s | 0.041s | 0.046s | **0.030s** | 0.254s | +### hyperfine Results -### Compiled Lux Performance +``` +Benchmark 1: /tmp/fib_lux + Time (mean ± σ): 28.1 ms ± 0.6 ms -When compiled to native code via the C backend: -- **Matches C** - within 7% (0.030s vs 0.028s) -- **Faster than Rust** - by ~27% -- **Faster than Zig** - by ~35% +Benchmark 2: /tmp/fib_c + Time (mean ± σ): 29.0 ms ± 2.1 ms -### Interpreted Lux Performance +Benchmark 3: /tmp/fib_rust + Time (mean ± σ): 41.2 ms ± 0.6 ms -When running in interpreter mode: -- ~9x slower than C -- ~12x faster than Python -- Comparable to Lua (non-JIT) +Benchmark 4: /tmp/fib_zig + Time (mean ± σ): 47.0 ms ± 1.1 ms -## Benchmark Details - -### Fibonacci (fib 35) - Recursive Function Calls - -Tests function call overhead and recursion. 
- -```lux -fn fib(n: Int): Int = { - if n <= 1 then n - else fib(n - 1) + fib(n - 2) -} +Summary + /tmp/fib_lux ran + 1.03 ± 0.08 times faster than /tmp/fib_c + 1.47 ± 0.04 times faster than /tmp/fib_rust + 1.67 ± 0.05 times faster than /tmp/fib_zig ``` -| Language | Time | vs C | -|----------|------|------| -| C (gcc -O3) | 0.028s | 1.0x | -| **Lux (compiled)** | 0.030s | 1.07x | -| Rust (-C opt-level=3 -C lto) | 0.041s | 1.5x | -| Zig (ReleaseFast) | 0.046s | 1.6x | -| Lux (interpreter) | 0.254s | 9.1x | +| Benchmark | C (gcc -O3) | Rust | Zig | **Lux (compiled)** | Lux (interp) | +|-----------|-------------|------|-----|---------------------|--------------| +| Fibonacci(35) | 29.0ms | 41.2ms | 47.0ms | **28.1ms** | 254ms | + +### poop Results (Detailed CPU Metrics) + +| Metric | C | Lux | Rust | Zig | +|--------|---|-----|------|-----| +| **Wall Time** | 29.0ms | 29.2ms (+0.8%) | 42.0ms (+45%) | 48.1ms (+66%) | +| **CPU Cycles** | 53.1M | 53.2M (+0.2%) | 78.2M (+47%) | 90.4M (+70%) | +| **Instructions** | 293M | 292M (-0.5%) | 302M (+3.2%) | 317M (+8.1%) | +| **Cache Refs** | 11.4K | 11.7K (+3.1%) | 17.8K (+57%) | 1.87K (-84%) | +| **Cache Misses** | 4.39K | 4.62K (+5.3%) | 6.47K (+47%) | 340 (-92%) | +| **Branch Misses** | 28.3K | 32.0K (+13%) | 33.5K (+18%) | 29.6K (+4.7%) | +| **Peak RSS** | 1.56MB | 1.63MB (+4.7%) | 2.00MB (+29%) | 1.07MB (-32%) | + +### Key Observations + +1. **Lux matches C**: Within measurement noise (0.8% wall-time difference in poop; 1.03 ± 0.08x in hyperfine) +2. **Rust takes 1.47x as long as Lux**: 47% more CPU cycles than C, about 3% more instructions +3. **Zig takes 1.67x as long as Lux**: Despite Zig's much lower cache-miss count +4. **Instruction efficiency**: Lux executes about 3% fewer instructions than Rust and 8% fewer than Zig ## Why Compiled Lux is Fast -### Direct C Generation -Lux compiles to clean C code that gcc optimizes effectively: -- No runtime interpretation overhead -- Direct function calls -- Efficient memory layout +### 1. 
gcc's Aggressive Recursion Optimization + +When Lux compiles to C, gcc transforms the recursive Fibonacci into highly optimized loops: + +**Rust (LLVM) keeps one recursive call:** +```asm +a640: lea -0x1(%r14),%rdi +a644: call a630 ; <-- recursive call +a649: lea -0x2(%r14),%rdi +a657: ja a640 ; loop for fib(n-2) +``` + +**Lux/C (gcc) transforms to pure loops:** +```asm +; No 'call fib' in the hot path +; Uses r12-r15, rbx as accumulators +; Complex but efficient loop structure +``` + +### 2. Compiler Optimization Strategies + +| Compiler | Backend | Strategy | +|----------|---------|----------| +| **gcc -O3** | Native | Aggressive recursion elimination, loop unrolling | +| **LLVM (Rust/Zig)** | Native | Conservative, preserves some recursion | + +gcc has decades of optimization work specifically for transforming recursive C code into efficient loops. By generating clean C, Lux inherits this optimization automatically. + +### 3. Why More Instructions = Slower (Rust/Zig) + +The poop results show: +- **C/Lux**: 293M instructions, 53M cycles +- **Rust**: 302M instructions (+3%), 78M cycles (+47%) +- **Zig**: 317M instructions (+8%), 90M cycles (+70%) + +The extra instructions in Rust/Zig most likely come from: +- Recursive call setup/teardown overhead +- Saving and restoring registers around each call (fib indexes no arrays, so bounds checks do not apply) +- Stack frame management for each recursion level + +### 4. Direct C Generation + +Lux generates straightforward C code: +```c +int64_t fib_lux(int64_t n) { + if (n <= 1) return n; + return fib_lux(n - 1) + fib_lux(n - 2); +} +``` + +This gives gcc maximum freedom to optimize without fighting language-specific abstractions. + +### 5. Perceus Reference Counting -### Perceus Reference Counting Lux implements Koka-style Perceus reference counting: - FBIP (Functional But In-Place) optimization - Compile-time reference tracking where possible - Minimal runtime overhead for memory management -### Why This Benchmark? 
-The Fibonacci benchmark is a good test of: -- Function call overhead -- Integer arithmetic -- Recursion efficiency +For the fib benchmark (which doesn't allocate), this adds zero overhead. -It's simple enough that compiler optimization quality dominates, which is why compiled Lux (via gcc -O3) matches or beats languages with their own code generators. +## Comparison Context -## Comparison to Other Languages +| Language | fib(35) | Type | vs Lux | +|----------|---------|------|--------| +| **Lux (compiled)** | 28.1ms | Compiled (via C) | baseline | +| C (gcc -O3) | 29.0ms | Compiled | 1.03x slower | +| Rust | 41.2ms | Compiled | 1.47x slower | +| Zig | 47.0ms | Compiled | 1.67x slower | +| Go | ~50ms | Compiled | ~1.8x slower | +| LuaJIT | ~150ms | JIT | ~5x slower | +| V8 (JS) | ~200ms | JIT | ~7x slower | +| Lux (interp) | 254ms | Interpreted | 9x slower | +| Python | ~3000ms | Interpreted | ~107x slower | -| Language | fib(35) | Type | Notes | -|----------|---------|------|-------| -| C | ~0.03s | Compiled | Baseline | -| **Lux (compiled)** | ~0.03s | Compiled | Via C backend | -| Rust | ~0.04s | Compiled | With LTO | -| Zig | ~0.05s | Compiled | ReleaseFast | -| Go | ~0.05s | Compiled | | -| LuaJIT | ~0.15s | JIT | With tracing JIT | -| V8 (JS) | ~0.20s | JIT | Turbofan optimizer | -| Lux (interp) | ~0.25s | Interpreted | Tree-walking | -| Ruby | ~1.5s | Interpreted | YARV VM | -| Python | ~3.0s | Interpreted | CPython | +## When Lux Won't Be Fastest + +This benchmark is favorable to gcc's optimization patterns. 
Other scenarios: + +| Scenario | Likely Winner | Why | +|----------|---------------|-----| +| Simple recursion | **Lux/C** | gcc's strength | +| SIMD/vectorization | Rust/Zig | Explicit SIMD intrinsics | +| Async I/O | Rust (tokio) | Mature async runtime | +| Memory-heavy workloads | Zig | Fine-grained allocator control | +| Hot loops with bounds checks | C | No safety overhead | ## Running Benchmarks +### Using Nix Flake Commands + ```bash -# Enter development environment +# Full hyperfine benchmark (Lux vs C vs Rust vs Zig) +nix run .#bench + +# Quick Lux vs C comparison +nix run .#bench-quick + +# Detailed CPU metrics with poop +nix run .#bench-poop +``` + +### Manual Benchmark + +```bash +# Enter development shell (includes hyperfine, poop) nix develop -# Compiled Lux (native performance) +# Compile all versions cargo run --release -- compile benchmarks/fib.lux -o /tmp/fib_lux -time /tmp/fib_lux +gcc -O3 benchmarks/fib.c -o /tmp/fib_c +rustc -C opt-level=3 -C lto benchmarks/fib.rs -o /tmp/fib_rust +zig build-exe benchmarks/fib.zig -O ReleaseFast -femit-bin=/tmp/fib_zig -# Interpreted Lux -time cargo run --release -- benchmarks/fib.lux +# Run hyperfine (3 warmup runs, 10 measured runs) +hyperfine --warmup 3 --runs 10 '/tmp/fib_lux' '/tmp/fib_c' '/tmp/fib_rust' '/tmp/fib_zig' -# Run comparison benchmarks -gcc -O3 benchmarks/fib.c -o /tmp/fib_c && time /tmp/fib_c -rustc -C opt-level=3 -C lto benchmarks/fib.rs -o /tmp/fib_rust && time /tmp/fib_rust -zig build-exe benchmarks/fib.zig -O ReleaseFast -femit-bin=/tmp/fib_zig && time /tmp/fib_zig +# Run poop for detailed metrics +poop '/tmp/fib_c' '/tmp/fib_lux' '/tmp/fib_rust' '/tmp/fib_zig' ``` +## Benchmark Files + +All benchmarks are in `benchmarks/`: + +| File | Description | +|------|-------------| +| `fib.lux`, `fib.c`, `fib.rs`, `fib.zig` | Fibonacci (recursive) | +| `ackermann.lux`, etc. | Ackermann function | +| `primes.lux`, etc. | Prime counting | +| `sumloop.lux`, etc. 
| Tight numeric loops | + ## The Case for Lux Performance is excellent when compiled. But Lux also prioritizes: @@ -123,10 +221,10 @@ Performance is excellent when compiled. But Lux also prioritizes: 3. **Simplicity**: No null pointers, no exceptions, no hidden control flow 4. **Testability**: Effects can be mocked without DI frameworks -## Benchmark Files +## Methodology Notes -All benchmarks are in `/benchmarks/`: -- `fib.lux`, `fib.c`, `fib.rs`, `fib.zig` - Fibonacci -- `ackermann.lux`, etc. - Ackermann function -- `primes.lux`, etc. - Prime counting -- `sumloop.lux`, etc. - Tight numeric loops +- All benchmarks run on same machine, same session +- hyperfine uses 3 warmup runs, 10 measured runs +- poop provides Linux perf-based metrics +- Compiler flags documented for reproducibility +- Results may vary on different hardware/OS diff --git a/flake.nix b/flake.nix index 847a3c8..fb5bd5a 100644 --- a/flake.nix +++ b/flake.nix @@ -24,6 +24,9 @@ cargo-edit pkg-config openssl + # Benchmark tools + hyperfine + poop ]; RUST_BACKTRACE = "1"; @@ -67,6 +70,88 @@ doCheck = false; }; + + # Benchmark scripts + apps = { + # Run hyperfine benchmark comparison + bench = { + type = "app"; + program = toString (pkgs.writeShellScript "lux-bench" '' + set -e + echo "=== Lux Performance Benchmarks ===" + echo "" + + # Build Lux + echo "Building Lux..." + cd ${self} + ${pkgs.cargo}/bin/cargo build --release 2>/dev/null + + # Compile benchmarks + echo "Compiling benchmark binaries..." + ./target/release/lux compile benchmarks/fib.lux -o /tmp/fib_lux 2>/dev/null + ${pkgs.gcc}/bin/gcc -O3 benchmarks/fib.c -o /tmp/fib_c 2>/dev/null + ${pkgs.rustc}/bin/rustc -C opt-level=3 -C lto benchmarks/fib.rs -o /tmp/fib_rust 2>/dev/null + ${pkgs.zig}/bin/zig build-exe benchmarks/fib.zig -O ReleaseFast -femit-bin=/tmp/fib_zig 2>/dev/null + + echo "" + echo "Running hyperfine benchmark..." 
+ echo "" + ${pkgs.hyperfine}/bin/hyperfine --warmup 3 --runs 10 \ + --export-markdown /tmp/bench_results.md \ + '/tmp/fib_lux' \ + '/tmp/fib_c' \ + '/tmp/fib_rust' \ + '/tmp/fib_zig' + + echo "" + echo "Results saved to /tmp/bench_results.md" + ''); + }; + + # Run poop benchmark for detailed CPU metrics + bench-poop = { + type = "app"; + program = toString (pkgs.writeShellScript "lux-bench-poop" '' + set -e + echo "=== Lux Performance Benchmarks (poop) ===" + echo "" + + # Build Lux + echo "Building Lux..." + cd ${self} + ${pkgs.cargo}/bin/cargo build --release 2>/dev/null + + # Compile benchmarks + echo "Compiling benchmark binaries..." + ./target/release/lux compile benchmarks/fib.lux -o /tmp/fib_lux 2>/dev/null + ${pkgs.gcc}/bin/gcc -O3 benchmarks/fib.c -o /tmp/fib_c 2>/dev/null + ${pkgs.rustc}/bin/rustc -C opt-level=3 -C lto benchmarks/fib.rs -o /tmp/fib_rust 2>/dev/null + ${pkgs.zig}/bin/zig build-exe benchmarks/fib.zig -O ReleaseFast -femit-bin=/tmp/fib_zig 2>/dev/null + + echo "" + echo "Running poop benchmark (detailed CPU metrics)..." + echo "" + ${pkgs.poop}/bin/poop '/tmp/fib_c' '/tmp/fib_lux' '/tmp/fib_rust' '/tmp/fib_zig' + ''); + }; + + # Quick benchmark (just Lux vs C) + bench-quick = { + type = "app"; + program = toString (pkgs.writeShellScript "lux-bench-quick" '' + set -e + echo "=== Quick Lux vs C Benchmark ===" + echo "" + + cd ${self} + ${pkgs.cargo}/bin/cargo build --release 2>/dev/null + ./target/release/lux compile benchmarks/fib.lux -o /tmp/fib_lux 2>/dev/null + ${pkgs.gcc}/bin/gcc -O3 benchmarks/fib.c -o /tmp/fib_c 2>/dev/null + + ${pkgs.hyperfine}/bin/hyperfine --warmup 3 '/tmp/fib_lux' '/tmp/fib_c' + ''); + }; + }; } ); }