# Compiler Optimizations from Behavioral Types

This document describes optimization opportunities enabled by Lux's behavioral type system. When functions are annotated with properties like `is pure`, `is total`, `is idempotent`, `is deterministic`, or `is commutative`, the compiler gains knowledge that enables aggressive optimizations.

## Overview

| Property | Key Optimizations |
|----------|-------------------|
| `is pure` | Memoization, CSE, dead code elimination, auto-parallelization |
| `is total` | No exception handling, aggressive inlining, loop unrolling |
| `is deterministic` | Result caching, test reproducibility, parallel execution |
| `is idempotent` | Duplicate call elimination, retry optimization |
| `is commutative` | Argument reordering, parallel reduction, algebraic simplification |

## Pure Function Optimizations

When a function is marked `is pure`:

### 1. Memoization (Automatic Caching)

```lux
fn fib(n: Int): Int is pure =
  if n <= 1 then n
  else fib(n - 1) + fib(n - 2)
```

**Optimization**: The compiler can automatically memoize results. Since `fib` is pure, `fib(10)` always returns the same value, so the result can be cached.

**Implementation approach**:
- Maintain a hash map of argument → result mappings
- Before computing, check whether a cached result exists
- Store results after computation
- Use LRU eviction for memory management

**Impact**: Reduces the exponential number of recursive calls to linear time.

### 2. Common Subexpression Elimination (CSE)

```lux
fn compute(x: Int): Int is pure =
  expensive(x) + expensive(x) // Same call twice
```

**Optimization**: The compiler recognizes that both calls are identical and computes `expensive(x)` only once.

**Transformed to**:

```lux
fn compute(x: Int): Int is pure = {
  let temp = expensive(x)
  temp + temp
}
```

**Impact**: Eliminates redundant computation.

### 3. Dead Code Elimination

```lux
fn example(): Int is pure = {
  let unused = expensiveComputation() // Result not used
  42
}
```

**Optimization**: Since `expensiveComputation` is pure (no side effects) and its result is unused, the entire call can be eliminated.

**Impact**: Removes unnecessary work.

### 4. Auto-Parallelization

```lux
fn processAll(items: List): List is pure =
  List.map(items, processItem) // processItem is pure
```

**Optimization**: Since `processItem` is pure, each invocation is independent. The compiler can automatically parallelize the map operation.

**Implementation approach**:
- Detect pure functions in map/filter/fold operations
- Split work across available cores
- Merge results (order-preserving for map)

**Impact**: Near-linear speedup with core count for CPU-bound operations.

### 5. Speculative Execution

```lux
fn decide(cond: Bool, a: Int, b: Int): Int is pure =
  if cond then computeA(a) else computeB(b)
```

**Optimization**: Both branches can be computed in parallel before the condition is known, since neither has side effects.

**Impact**: Reduced latency when condition evaluation is slow.

## Total Function Optimizations

When a function is marked `is total`:

### 1. Exception Handling Elimination

```lux
fn safeCompute(x: Int): Int is total =
  complexCalculation(x)
```

**Optimization**: No try/catch blocks are needed around calls to `safeCompute`. The compiler knows it will never throw or fail.

**Generated code difference**:

```c
// Without is total - needs error checking
Result result = safeCompute(x);
if (result.is_error) { handle_error(); }

// With is total - direct call
int result = safeCompute(x);
```

**Impact**: Reduced code size, better branch prediction.

### 2. Aggressive Inlining

```lux
fn square(x: Int): Int is total = x * x

fn sumOfSquares(a: Int, b: Int): Int is total =
  square(a) + square(b)
```

**Optimization**: Total functions are safe to inline aggressively because:
- They won't change control flow unexpectedly
- They won't introduce exception handling complexity
- Their termination is guaranteed

**Impact**: Eliminates function call overhead, enables further optimizations.

### 3. Loop Unrolling

```lux
fn sumList(xs: List): Int is total =
  List.fold(xs, 0, fn(acc: Int, x: Int): Int is total => acc + x)
```

**Optimization**: When the list size is known at compile time and the fold function is total, the loop can be fully unrolled.

**Impact**: Eliminates loop overhead, enables vectorization.

### 4. Termination Assumptions

```lux
fn processRecursive(data: Tree): Result is total =
  match data {
    Leaf(v) => Result.single(v),
    Node(left, right) => {
      let l = processRecursive(left)
      let r = processRecursive(right)
      Result.merge(l, r)
    }
  }
```

**Optimization**: The compiler can assume this recursion terminates, allowing optimizations such as:
- Converting recursion to iteration
- Allocating fixed stack space
- Tail call optimization

**Impact**: Stack safety, predictable memory usage.

## Deterministic Function Optimizations

When a function is marked `is deterministic`:

### 1. Compile-Time Evaluation

```lux
fn hashConstant(s: String): Int is deterministic =
  computeHash(s)

let key = hashConstant("api_key") // Constant input
```

**Optimization**: Since the input is a compile-time constant and the function is deterministic, the result can be computed at compile time.

**Transformed to**:

```lux
let key = 7823491 // Pre-computed
```

**Impact**: Zero runtime cost for constant computations.

### 2. Result Caching Across Runs

```lux
fn parseConfig(path: String): Config is deterministic with {File} =
  Json.parse(File.read(path))
```

**Optimization**: Results can be cached persistently. If the file hasn't changed, the cached result is still valid.
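To make persistent result caching concrete, here is a minimal Python sketch (not Lux, and not the actual compiler runtime; the `cache_key`/`cached_parse_config` helpers and the `.lux_cache` location are hypothetical). The key idea: hash the arguments together with the file contents, so a changed file automatically misses the cache.

```python
import hashlib
import json
import os

CACHE_DIR = ".lux_cache"  # hypothetical on-disk cache location

def cache_key(fn_name, args, file_contents):
    # Key on the function name, its arguments, and the raw file bytes,
    # so any change to the file produces a different key.
    h = hashlib.sha256()
    h.update(fn_name.encode())
    h.update(repr(args).encode())
    h.update(file_contents)
    return h.hexdigest()

def cached_parse_config(path):
    with open(path, "rb") as f:
        contents = f.read()
    key = cache_key("parseConfig", (path,), contents)
    cache_file = os.path.join(CACHE_DIR, key + ".json")
    if os.path.exists(cache_file):
        with open(cache_file) as f:   # cache hit: reuse last run's result
            return json.load(f)
    result = json.loads(contents)     # the deterministic computation itself
    os.makedirs(CACHE_DIR, exist_ok=True)
    with open(cache_file, "w") as f:
        json.dump(result, f)          # persist for future runs
    return result
```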
**Implementation approach**:
- Hash inputs (including file contents)
- Store results in persistent cache
- Validate cache on next run

**Impact**: Faster startup times, reduced I/O.

### 3. Reproducible Parallel Execution

```lux
fn renderImages(images: List): List is deterministic =
  List.map(images, render)
```

**Optimization**: Deterministic parallel execution guarantees the same results regardless of scheduling order. This enables:
- Work stealing without synchronization concerns
- Speculative execution without rollback complexity
- Distributed computation across machines

**Impact**: Easier parallelization, simpler distributed systems.

## Idempotent Function Optimizations

When a function is marked `is idempotent`:

### 1. Duplicate Call Elimination

```lux
fn setFlag(config: Config, flag: Bool): Config is idempotent =
  { ...config, enabled: flag }

fn configure(c: Config): Config is idempotent =
  c |> setFlag(true) |> setFlag(true) |> setFlag(true)
```

**Optimization**: Multiple consecutive calls with the same arguments can be collapsed to one.

**Transformed to**:

```lux
fn configure(c: Config): Config is idempotent =
  setFlag(c, true)
```

**Impact**: Eliminates redundant operations.

### 2. Retry Optimization

```lux
fn sendRequest(data: Request): Response is idempotent with {Http} =
  Http.put("/api/resource", data)

fn reliableSend(data: Request): Response with {Http} =
  retry(3, fn(): Response => sendRequest(data))
```

**Optimization**: The retry mechanism knows the operation is safe to retry without side effects accumulating.

**Implementation approach**:
- No need for transaction logs
- No need for "already processed" checks
- Simple retry loop

**Impact**: Simpler error recovery, reduced complexity.

### 3. Convergent Computation

```lux
fn normalize(value: Float): Float is idempotent =
  clamp(round(value, 2), 0.0, 1.0)
```

**Optimization**: In iterative algorithms, the compiler can detect when a value has converged (applying the function no longer changes it).
```lux
// Can terminate early when values stop changing
fn iterateUntilStable(values: List): List =
  let normalized = List.map(values, normalize)
  if normalized == values then values
  else iterateUntilStable(normalized)
```

**Impact**: Early termination of iterative algorithms.

## Commutative Function Optimizations

When a function is marked `is commutative`:

### 1. Argument Reordering

```lux
fn multiply(a: Int, b: Int): Int is commutative = a * b

// In a computation
multiply(expensiveA(), cheapB())
```

**Optimization**: Because argument order doesn't affect the result, the compiler is free to evaluate arguments in whichever order yields better instruction scheduling or register allocation.

**Impact**: Improved instruction scheduling.

### 2. Parallel Reduction

```lux
fn add(a: Int, b: Int): Int is commutative = a + b

fn sum(xs: List): Int =
  List.fold(xs, 0, add)
```

**Optimization**: Since `add` is commutative (and associative), the fold can be parallelized:

```
[1, 2, 3, 4, 5, 6, 7, 8]
        ↓ parallel reduce
[(1+2), (3+4), (5+6), (7+8)]  =  [3, 7, 11, 15]
        ↓ parallel reduce
[(3+7), (11+15)]              =  [10, 26]
        ↓ parallel reduce
[(10+26)]                     =  [36]
```

**Impact**: O(log n) parallel reduction depth instead of O(n) sequential steps.

### 3. Algebraic Simplification

```lux
fn add(a: Int, b: Int): Int is commutative = a + b

// Expression: add(x, add(y, z))
```

**Optimization**: Commutative operations (together with associativity) can be reordered for simplification:
- `add(x, 0)` → `x`
- `add(add(x, 1), add(y, 1))` → `add(add(x, y), 2)`

**Impact**: Constant folding, strength reduction.
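The tree-shaped reduction above can be sketched in executable form. This is an illustrative Python sketch of what the compiler's generated parallel code would do (the `parallel_reduce` helper is hypothetical); it is only valid when the operator is both associative and commutative, as the diagram assumes.

```python
from concurrent.futures import ThreadPoolExecutor

def parallel_reduce(xs, op, identity):
    """Tree-shaped reduction: each level combines adjacent pairs in
    parallel, halving the list until one value remains. Correct only
    when `op` is associative and commutative."""
    if not xs:
        return identity
    with ThreadPoolExecutor() as pool:
        level = list(xs)
        while len(level) > 1:
            pairs = [(level[i], level[i + 1])
                     for i in range(0, len(level) - 1, 2)]
            reduced = list(pool.map(lambda p: op(*p), pairs))
            if len(level) % 2:          # odd element carries to the next level
                reduced.append(level[-1])
            level = reduced
        return level[0]
```

With eight elements this performs three parallel levels (4, 2, then 1 combination), matching the O(log n) depth claimed above.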
## Combined Property Optimizations

Properties can be combined for even more powerful optimizations:

### Pure + Deterministic + Total

```lux
fn computeKey(data: String): Int is pure is deterministic is total = {
  // Hash computation
  List.fold(String.chars(data), 0,
    fn(acc: Int, c: Char): Int => acc * 31 + Char.code(c))
}
```

**Enabled optimizations**:
- Compile-time evaluation for constants
- Automatic memoization at runtime
- Parallel execution in batch operations
- No exception handling needed
- Safe to inline anywhere

### Idempotent + Commutative

```lux
fn setUnionItem(set: Set<T>, item: T): Set<T> is idempotent is commutative = {
  Set.add(set, item)
}
```

**Enabled optimizations**:
- Parallel set building (order doesn't matter)
- Duplicate insertions are free (idempotent)
- Reorder insertions for cache locality

## Implementation Status

| Optimization | Status |
|--------------|--------|
| Pure: CSE | Planned |
| Pure: Dead code elimination | Partial (basic) |
| Pure: Auto-parallelization | Planned |
| Total: Exception elimination | Planned |
| Total: Aggressive inlining | Partial |
| Deterministic: Compile-time eval | Planned |
| Idempotent: Duplicate elimination | Planned |
| Commutative: Parallel reduction | Planned |

## Adding New Optimizations

When implementing new optimizations based on behavioral types:

1. **Verify the property is correct**: The optimization is only valid if the property actually holds
2. **Consider combinations**: Multiple properties together enable more optimizations
3. **Measure impact**: Profile before and after to ensure a real benefit
4. **Handle `assume`**: Functions using `assume` bypass verification but still enable optimizations (the risk is on the programmer)

## Future Work

1. **Inter-procedural analysis**: Track properties across function boundaries
2. **Automatic property inference**: Derive properties when not explicitly stated
3. **Profile-guided optimization**: Use runtime data to decide when to apply optimizations
4. **LLVM integration**: Pass behavioral hints to LLVM for backend optimizations
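As a sketch of the LLVM-integration idea, the snippet below shows one plausible way to choose LLVM function attributes from a function's behavioral properties. The attribute names (`readnone`, `nosync`, `willreturn`, `nounwind`, `speculatable`) are real LLVM attributes; the mapping itself is a hypothetical design sketch, not part of the current implementation.

```python
def llvm_attrs(properties):
    """Pick LLVM function attributes implied by Lux behavioral properties.
    Hypothetical mapping; attribute names are standard LLVM attributes."""
    attrs = []
    if "is pure" in properties:
        attrs += ["readnone", "nosync"]      # no memory effects, no synchronization
    if "is total" in properties:
        attrs += ["willreturn", "nounwind"]  # guaranteed to return, never unwinds
    if "is pure" in properties and "is total" in properties:
        # speculatable requires BOTH no side effects and guaranteed
        # termination, so only the combination justifies it
        attrs.append("speculatable")
    return attrs
```

A design point worth noting: `speculatable` is deliberately gated on the combination, since emitting it for a pure-but-possibly-nonterminating function would be unsound.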