# Compiler Optimizations from Behavioral Types
This document describes optimization opportunities enabled by Lux's behavioral type system. When functions are annotated with properties like `is pure`, `is total`, `is idempotent`, `is deterministic`, or `is commutative`, the compiler gains knowledge that enables aggressive optimizations.
## Overview
| Property | Key Optimizations |
|----------|-------------------|
| `is pure` | Memoization, CSE, dead code elimination, auto-parallelization |
| `is total` | No exception handling, aggressive inlining, loop unrolling |
| `is deterministic` | Result caching, test reproducibility, parallel execution |
| `is idempotent` | Duplicate call elimination, retry optimization |
| `is commutative` | Argument reordering, parallel reduction, algebraic simplification |
## Pure Function Optimizations
When a function is marked `is pure`:
### 1. Memoization (Automatic Caching)
```lux
fn fib(n: Int): Int is pure =
  if n <= 1 then n else fib(n - 1) + fib(n - 2)
```
**Optimization**: The compiler can automatically memoize results. Since `fib` is pure, `fib(10)` will always return the same value, so we can cache it.
**Implementation approach**:
- Maintain a hash map of argument → result mappings
- Before computing, check if result exists
- Store results after computation
- Use LRU eviction for memory management
**Impact**: Reduces exponential recursive calls to linear time.
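The caching scheme above can be sketched in Python as an illustrative host-language model of the runtime support (the `memoize` wrapper and the 1024-entry cap are assumptions for illustration, not Lux APIs):

```python
from collections import OrderedDict

def memoize(fn, max_entries=1024):
    """Wrap a pure function with an argument -> result cache, LRU eviction."""
    cache = OrderedDict()

    def wrapper(*args):
        if args in cache:
            cache.move_to_end(args)        # mark as recently used
            return cache[args]
        result = fn(*args)
        cache[args] = result               # store after computing
        if len(cache) > max_entries:
            cache.popitem(last=False)      # evict least recently used
        return result
    return wrapper

@memoize
def fib(n):
    return n if n <= 1 else fib(n - 1) + fib(n - 2)
```

Because the recursive calls go through the wrapper, each distinct argument is computed once, turning the exponential recursion into a linear one.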
### 2. Common Subexpression Elimination (CSE)
```lux
fn compute(x: Int): Int is pure =
  expensive(x) + expensive(x) // Same call twice
```
**Optimization**: The compiler recognizes both calls are identical and computes `expensive(x)` only once.
**Transformed to**:
```lux
fn compute(x: Int): Int is pure =
  let temp = expensive(x)
  temp + temp
```
**Impact**: Eliminates redundant computation.
### 3. Dead Code Elimination
```lux
fn example(): Int is pure = {
  let unused = expensiveComputation() // Result not used
  42
}
```
**Optimization**: Since `expensiveComputation` is pure (no side effects), and its result is unused, the entire call can be eliminated.
**Impact**: Removes unnecessary work.
### 4. Auto-Parallelization
```lux
fn processAll(items: List<Item>): List<Result> is pure =
  List.map(items, processItem) // processItem is pure
```
**Optimization**: Since `processItem` is pure, each invocation is independent. The compiler can automatically parallelize the map operation.
**Implementation approach**:
- Detect pure functions in map/filter/fold operations
- Split work across available cores
- Merge results (order-preserving for map)
**Impact**: Linear speedup with core count for CPU-bound operations.
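A minimal order-preserving parallel map, sketched in Python as a stand-in for the generated runtime code (the worker count is an arbitrary assumption):

```python
from concurrent.futures import ThreadPoolExecutor

def parallel_map(fn, items, workers=4):
    """Order-preserving parallel map; safe only because fn is pure."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fn, items))   # Executor.map preserves input order
```

Purity is what makes this transformation sound: no invocation can observe or disturb another, so the scheduler is free to run them in any order while the merge step restores the original ordering.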
### 5. Speculative Execution
```lux
fn decide(cond: Bool, a: Int, b: Int): Int is pure =
  if cond then computeA(a) else computeB(b)
```
**Optimization**: Both branches can be computed in parallel before the condition is known, since neither has side effects.
**Impact**: Reduced latency when condition evaluation is slow.
## Total Function Optimizations
When a function is marked `is total`:
### 1. Exception Handling Elimination
```lux
fn safeCompute(x: Int): Int is total =
  complexCalculation(x)
```
**Optimization**: No try/catch blocks needed around calls to `safeCompute`. The compiler knows it will never throw or fail.
**Generated code difference**:
```c
// Without is total - needs error checking
Result result = safeCompute(x);
if (result.is_error) { handle_error(); }

// With is total - direct call
int result = safeCompute(x);
```
**Impact**: Reduced code size, better branch prediction.
### 2. Aggressive Inlining
```lux
fn square(x: Int): Int is total = x * x
fn sumOfSquares(a: Int, b: Int): Int is total =
  square(a) + square(b)
```
**Optimization**: Total functions are safe to inline aggressively because:
- They won't change control flow unexpectedly
- They won't introduce exception handling complexity
- Their termination is guaranteed
**Impact**: Eliminates function call overhead, enables further optimizations.
### 3. Loop Unrolling
```lux
fn sumList(xs: List<Int>): Int is total =
  List.fold(xs, 0, fn(acc: Int, x: Int): Int is total => acc + x)
```
**Optimization**: When the list size is known at compile time and the fold function is total, the loop can be fully unrolled.
**Impact**: Eliminates loop overhead, enables vectorization.
### 4. Termination Assumptions
```lux
fn processRecursive(data: Tree): Result is total =
  match data {
    Leaf(v) => Result.single(v),
    Node(left, right) => {
      let l = processRecursive(left)
      let r = processRecursive(right)
      Result.merge(l, r)
    }
  }
```
**Optimization**: The compiler can assume this recursion terminates, allowing optimizations like:
- Converting recursion to iteration
- Allocating fixed stack space
- Tail call optimization
**Impact**: Stack safety, predictable memory usage.
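The recursion-to-iteration rewrite can be illustrated in Python with a hypothetical tree encoding (leaves as plain values, nodes as pairs; the encoding and function names are assumptions for illustration). The explicit work stack is what gives the fixed, predictable memory usage mentioned above:

```python
def sum_tree_recursive(node):
    # node is either a leaf value or a (left, right) tuple
    if not isinstance(node, tuple):
        return node
    left, right = node
    return sum_tree_recursive(left) + sum_tree_recursive(right)

def sum_tree_iterative(node):
    """Same traversal with an explicit work stack: constant per-item
    bookkeeping and no risk of overflowing the call stack."""
    total, stack = 0, [node]
    while stack:
        n = stack.pop()
        if isinstance(n, tuple):
            stack.extend(n)   # push both children for later processing
        else:
            total += n        # leaf: accumulate
    return total
```

Because totality guarantees the recursion terminates, the compiler can apply this rewrite without worrying about turning a diverging computation into an infinite loop with different observable behavior.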
## Deterministic Function Optimizations
When a function is marked `is deterministic`:
### 1. Compile-Time Evaluation
```lux
fn hashConstant(s: String): Int is deterministic = computeHash(s)
let key = hashConstant("api_key") // Constant input
```
**Optimization**: Since the input is a compile-time constant and the function is deterministic, the result can be computed at compile time.
**Transformed to**:
```lux
let key = 7823491 // Pre-computed
```
**Impact**: Zero runtime cost for constant computations.
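A sketch of the folding pass itself, in Python over a hypothetical `(name, args)` call node and a property environment (both IR shapes are assumptions for illustration):

```python
def const_fold(call, env):
    """Fold a call node when the callee is deterministic and all
    arguments are compile-time integer constants."""
    fn, args = call
    if env[fn]["deterministic"] and all(isinstance(a, int) for a in args):
        return env[fn]["impl"](*args)   # evaluate now, emit a literal
    return call                          # otherwise leave the call for runtime
```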
### 2. Result Caching Across Runs
```lux
fn parseConfig(path: String): Config is deterministic with {File} =
  Json.parse(File.read(path))
```
**Optimization**: Results can be cached persistently. If the file hasn't changed, the cached result is valid.
**Implementation approach**:
- Hash inputs (including file contents)
- Store results in persistent cache
- Validate cache on next run
**Impact**: Faster startup times, reduced I/O.
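The caching steps above, sketched in Python (the `.lux_cache` directory name and the JSON result encoding are assumptions for illustration):

```python
import hashlib, json, os

CACHE_DIR = ".lux_cache"   # hypothetical on-disk cache location

def cached_parse(path, parse):
    """Persistently cache parse(contents); the cached entry stays valid
    for as long as the file's content hash is unchanged."""
    with open(path, "rb") as f:
        contents = f.read()
    key = hashlib.sha256(contents).hexdigest()   # hash the inputs
    cache_file = os.path.join(CACHE_DIR, key + ".json")
    if os.path.exists(cache_file):
        with open(cache_file) as f:
            return json.load(f)                  # cache hit: skip re-parsing
    result = parse(contents)
    os.makedirs(CACHE_DIR, exist_ok=True)
    with open(cache_file, "w") as f:
        json.dump(result, f)                     # store for the next run
    return result
```

Keying the cache by a hash of the file contents (rather than the path alone) is what makes the cached result trustworthy: determinism guarantees the same inputs always yield the same output.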
### 3. Reproducible Parallel Execution
```lux
fn renderImages(images: List<Image>): List<Bitmap> is deterministic =
  List.map(images, render)
```
**Optimization**: Determinism guarantees the same results regardless of scheduling order. This enables:
- Work stealing without synchronization concerns
- Speculative execution without rollback complexity
- Distributed computation across machines
**Impact**: Easier parallelization, simpler distributed systems.
## Idempotent Function Optimizations
When a function is marked `is idempotent`:
### 1. Duplicate Call Elimination
```lux
fn setFlag(config: Config, flag: Bool): Config is idempotent =
  { ...config, enabled: flag }
fn configure(c: Config): Config is idempotent =
  c |> setFlag(true) |> setFlag(true) |> setFlag(true)
```
**Optimization**: Multiple consecutive calls with the same arguments can be collapsed to one.
**Transformed to**:
```lux
fn configure(c: Config): Config is idempotent =
  setFlag(c, true)
```
**Impact**: Eliminates redundant operations.
### 2. Retry Optimization
```lux
fn sendRequest(data: Request): Response is idempotent with {Http} =
  Http.put("/api/resource", data)
fn reliableSend(data: Request): Response with {Http} =
  retry(3, fn(): Response => sendRequest(data))
```
**Optimization**: The retry mechanism knows the operation is safe to repeat: re-running it leaves the system in the same state, so no duplicate-effect bookkeeping is required.
**Implementation approach**:
- No need for transaction logs
- No need for "already processed" checks
- Simple retry loop
**Impact**: Simpler error recovery, reduced complexity.
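The simple retry loop the property enables, sketched in Python (the signature mirrors the Lux example above; the fixed backoff delay is an assumption):

```python
import time

def retry(attempts, op, delay=0.1):
    """Naive retry loop: safe without transaction logs or dedup checks
    only because op is idempotent."""
    for i in range(attempts):
        try:
            return op()
        except Exception:
            if i == attempts - 1:
                raise                # out of attempts: propagate the error
            time.sleep(delay)        # brief pause before retrying
```

Without idempotence, a retry wrapper would need to distinguish "the request failed" from "the request succeeded but the acknowledgment was lost", which is exactly the bookkeeping this property eliminates.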
### 3. Convergent Computation
```lux
fn normalize(value: Float): Float is idempotent =
  clamp(round(value, 2), 0.0, 1.0)
```
**Optimization**: In iterative algorithms, the compiler can detect when a value has converged (applying the function no longer changes it).
```lux
// Can terminate early when values stop changing
fn iterateUntilStable(values: List<Float>): List<Float> =
let normalized = List.map(values, normalize)
if normalized == values then values
else iterateUntilStable(normalized)
```
**Impact**: Early termination of iterative algorithms.
## Commutative Function Optimizations
When a function is marked `is commutative`:
### 1. Argument Reordering
```lux
fn multiply(a: Int, b: Int): Int is commutative = a * b
// In a computation
multiply(expensiveA(), cheapB())
```
**Optimization**: Because the result does not depend on argument order, the compiler can evaluate the cheaper argument first, improving register allocation and instruction scheduling (and, when the cheap operand turns out to be an absorbing element such as 0, allowing the expensive one to be skipped entirely).
**Impact**: Improved instruction scheduling.
### 2. Parallel Reduction
```lux
fn add(a: Int, b: Int): Int is commutative = a + b
fn sum(xs: List<Int>): Int =
  List.fold(xs, 0, add)
```
**Optimization**: Since `add` is commutative (and associative), the fold can be parallelized:
```
[1, 2, 3, 4, 5, 6, 7, 8]
↓ parallel reduce
[(1+2), (3+4), (5+6), (7+8)]
↓ parallel reduce
[(3+7), (11+15)]
↓ parallel reduce
[36]
```
**Impact**: O(log n) parallel reduction instead of O(n) sequential.
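The pairwise reduction tree above can be modeled in Python. Each level's pairwise applications are independent and could run in parallel; here they run sequentially for clarity:

```python
def tree_reduce(op, xs):
    """Pairwise (tree-shaped) reduction: O(log n) levels, where the ops
    within one level are independent of each other."""
    if not xs:
        raise ValueError("empty input")
    level = list(xs)
    while len(level) > 1:
        nxt = [op(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])   # odd element: carry it up a level
        level = nxt
    return level[0]
```

For `[1, 2, 3, 4, 5, 6, 7, 8]` with `add`, the levels are exactly the stages in the diagram above, ending at `36`.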
### 3. Algebraic Simplification
```lux
fn add(a: Int, b: Int): Int is commutative = a + b
// Expression: add(x, add(y, z))
```
**Optimization**: Commutative operations can be reordered for simplification:
- `add(x, 0)` → `x`
- `add(add(x, 1), add(y, 1))` → `add(add(x, y), 2)`
**Impact**: Constant folding, strength reduction.
## Combined Property Optimizations
Properties can be combined for even more powerful optimizations:
### Pure + Deterministic + Total
```lux
fn computeKey(data: String): Int
  is pure
  is deterministic
  is total = {
  // Hash computation
  List.fold(String.chars(data), 0, fn(acc: Int, c: Char): Int =>
    acc * 31 + Char.code(c))
}
```
**Enabled optimizations**:
- Compile-time evaluation for constants
- Automatic memoization at runtime
- Parallel execution in batch operations
- No exception handling needed
- Safe to inline anywhere
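For reference, a direct host-language equivalent of the hash fold (sketched in Python), whose value for a constant argument is exactly what compile-time evaluation would bake into the binary:

```python
def compute_key(data):
    """31-based polynomial string hash, mirroring the Lux fold above."""
    acc = 0
    for c in data:
        acc = acc * 31 + ord(c)   # fold step: acc * 31 + Char.code(c)
    return acc
```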
### Idempotent + Commutative
```lux
fn setUnionItem<T>(set: Set<T>, item: T): Set<T>
  is idempotent
  is commutative = {
  Set.add(set, item)
}
```
**Enabled optimizations**:
- Parallel set building (order doesn't matter)
- Duplicate insertions are free (idempotent)
- Reorder insertions for cache locality
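The parallel set-building idea can be modeled in Python: partial sets built from arbitrary chunks of the input merge to the same result in any order (the chunking scheme is an assumption for illustration):

```python
def parallel_build_set(items, chunks=2):
    """Split work into chunks, build partial sets (conceptually on
    separate workers), then merge in arbitrary order — valid because
    insertion is idempotent and commutative."""
    size = (len(items) + chunks - 1) // chunks
    partials = [set(items[i:i + size]) for i in range(0, len(items), size)]
    result = set()
    for p in reversed(partials):   # merge order deliberately doesn't matter
        result |= p
    return result
```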
## Implementation Status
| Optimization | Status |
|--------------|--------|
| Pure: CSE | Planned |
| Pure: Dead code elimination | Partial (basic) |
| Pure: Auto-parallelization | Planned |
| Total: Exception elimination | Planned |
| Total: Aggressive inlining | Partial |
| Deterministic: Compile-time eval | Planned |
| Idempotent: Duplicate elimination | Planned |
| Commutative: Parallel reduction | Planned |
## Adding New Optimizations
When implementing new optimizations based on behavioral types:
1. **Verify the property is correct**: The optimization is only valid if the property holds
2. **Consider combinations**: Multiple properties together enable more optimizations
3. **Measure impact**: Profile before and after to ensure benefit
4. **Handle `assume`**: Functions using `assume` bypass verification but still enable optimizations (risk is on the programmer)
## Future Work
1. **Inter-procedural analysis**: Track properties across function boundaries
2. **Automatic property inference**: Derive properties when not explicitly stated
3. **Profile-guided optimization**: Use runtime data to decide when to apply optimizations
4. **LLVM integration**: Pass behavioral hints to LLVM for backend optimizations