What is built, what is being built, and what comes next.
Only capabilities backed by executable tests are documented as completed. Future directions are non-binding and subject to the same standard: observable behavior, reproducible tests, stability under reality.
| Version | What it closed | Key result |
|---|---|---|
| v13–v19 | Execution scaffolding — policy layer, guards, contracts, memory telemetry, sensor-to-decision pipeline | Live hardware feedback loop |
| M4–M4.6 | Safetensors loader, TinyLlama end-to-end, Llama-family expansion (4 models F64-validated) | 4,096×–9,692× closer to F64 truth than PyTorch BF16 |
| M4.7 | Beyond-VRAM execution — Llama 2 13B on 8 GB VRAM + 32 GB RAM + NVMe | [PASS] ✓ argmax bit-exact pre/post spill |
| M4.8 | AVX2+FMA matmul dispatcher — SIMD + matrixmultiply | 49.5× on production shape · 13B: 18.75 min → 5.38 min |
| M4.9 · M5 | Public CLI · tokenizer + KV cache + autoregressive generation | atenia run · atenia generate — coherent output |
| M6–M7 | Tier-aware GPU loader — VRAM → RAM → NVMe automatic placement | 1.46× on 7B · 13B on 32 GB without BSOD |
| M8 | BF16-resident VRAM kernels — double effective VRAM capacity | 1.31× on 7B · 1.36× on 13B · ADR-004 preserved |
| M8.7 | Disk → GPU JIT streaming pipeline — async NVMe read + PCIe upload + GPU matmul | 154 weights/forward · 98.7 % prefetch hit rate |
| Multi-family | Llama, Qwen, Gemma, Phi, Mistral, SmolLM and Falcon3 validated end-to-end | 7 families · safetensors + GGUF |
| Adapter Toolkit v2 | Declarative YAML adapter specs — describe a model without writing Rust | load · inspect · debug · v1 compatibility preserved |
| CLI | Human errors, stable exit codes, logging levels, host/model diagnostics, interactive chat | generate · chat · doctor · diagnose · capabilities |
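
The tier-aware placement in the M6–M7 row (VRAM → RAM → NVMe) can be pictured as a first-fit walk over the weights. The sketch below is only an illustration under stated assumptions: the `Tier` enum, the `place` function and the byte budgets are hypothetical names chosen here, not Atenia's actual loader API, and the real placement policy may weigh more signals than size alone.

```rust
/// Hypothetical storage tiers, ordered fastest to slowest.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Tier {
    Vram,
    Ram,
    Nvme,
}

/// Greedy first-fit placement (an assumed policy, for illustration only):
/// each tensor goes to the fastest tier that still has room, and NVMe is
/// treated as the unbounded fallback.
fn place(sizes: &[u64], vram_budget: u64, ram_budget: u64) -> Vec<Tier> {
    let mut vram_left = vram_budget;
    let mut ram_left = ram_budget;
    sizes
        .iter()
        .map(|&size| {
            if size <= vram_left {
                vram_left -= size;
                Tier::Vram
            } else if size <= ram_left {
                ram_left -= size;
                Tier::Ram
            } else {
                Tier::Nvme
            }
        })
        .collect()
}
```

With five equally sized tensors and room for two in each of the first two tiers, the last tensor spills to NVMe.
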
M8 — BF16 VRAM kernels (detail)
Storing weights as BF16 in VRAM (2 bytes per element instead of 4) doubles effective VRAM capacity: 82 projection weights fit in 8 GB instead of 38. The first implementation failed F64 validation because truncating both weights and activations to BF16 cascaded drift across 40 layers. The fix was to upcast each weight from BF16 to F32 transiently per matmul while keeping activations in F32, which dropped drift from 2.33 to 7.31e-4 (a 3,190× improvement on SmolLM2).
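A minimal CPU-side sketch of that fix, assuming BF16 values are held as raw `u16` bit patterns: the weight is widened to F32 only for the duration of the multiply, and the activation is never narrowed. The function names are hypothetical and this is not the engine's GPU kernel, only the arithmetic idea.

```rust
/// Narrow an f32 to BF16 bits using round-to-nearest-even on the upper 16 bits.
fn f32_to_bf16(x: f32) -> u16 {
    let bits = x.to_bits();
    if x.is_nan() {
        // Keep NaN a NaN by forcing a mantissa bit that survives the shift.
        return ((bits >> 16) as u16) | 0x0040;
    }
    let lsb = (bits >> 16) & 1;
    (bits.wrapping_add(0x7FFF + lsb) >> 16) as u16
}

/// Widen BF16 bits to f32. This direction is exact: BF16 is the top half of f32.
fn bf16_to_f32(h: u16) -> f32 {
    f32::from_bits((h as u32) << 16)
}

/// y = W x with W stored as BF16 (row-major, rows x cols) and x, y kept in F32.
/// Each weight is upcast transiently inside the inner product.
fn matvec_bf16_weights(w: &[u16], rows: usize, cols: usize, x: &[f32]) -> Vec<f32> {
    assert_eq!(w.len(), rows * cols);
    assert_eq!(x.len(), cols);
    (0..rows)
        .map(|r| {
            w[r * cols..(r + 1) * cols]
                .iter()
                .zip(x)
                .map(|(&wb, &xv)| bf16_to_f32(wb) * xv)
                .sum::<f32>()
        })
        .collect()
}
```

An activation such as 1 + 2⁻¹⁰ shows why the activation side matters: BF16 keeps only 7 mantissa bits, so narrowing it would round it to 1.0, whereas the F32 path above carries it through unchanged.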
Multi-family support (detail)
Each model family — Llama, Qwen, Gemma, Phi, Mistral, SmolLM, Falcon3 — lives in its own adapter module. Family-specific behaviour (Phi-3’s LongRoPE, Gemma 2’s dual-norm and soft-cap, Qwen’s QKV biases, GGUF’s fused-weight conventions) stays contained there; the execution core never learns which family it is running. Adding a new family is a contained change, not a core modification.
Adapter Toolkit v2 takes this one step further: a model can be described in a small YAML file and validated with atenia load, with no Rust and no recompilation. Classic Falcon, mixture-of-experts and multimodal models are explicitly out of scope and fail loudly rather than producing wrong output.
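The separation between adapters and the execution core described above could look roughly like the following. This is a sketch under assumptions: the `FamilyAdapter` trait, its methods and the cap value of 30.0 are hypothetical illustrations, not the engine's real interface. The point is only that the core calls through the trait and never branches on a family name.

```rust
/// Hypothetical adapter interface: each family reports its quirks as data.
trait FamilyAdapter {
    fn family(&self) -> &'static str;
    /// Whether Q/K/V projections carry bias terms (false unless overridden).
    fn qkv_bias(&self) -> bool {
        false
    }
    /// Optional soft-cap applied to final logits (None unless overridden).
    fn logit_soft_cap(&self) -> Option<f32> {
        None
    }
}

struct Llama;
struct Qwen;
struct Gemma2;

impl FamilyAdapter for Llama {
    fn family(&self) -> &'static str {
        "llama"
    }
}

impl FamilyAdapter for Qwen {
    fn family(&self) -> &'static str {
        "qwen"
    }
    fn qkv_bias(&self) -> bool {
        true
    }
}

impl FamilyAdapter for Gemma2 {
    fn family(&self) -> &'static str {
        "gemma2"
    }
    fn logit_soft_cap(&self) -> Option<f32> {
        Some(30.0) // illustrative value, assumed for this sketch
    }
}

/// Core-side step: it sees only the trait, never the concrete family.
fn finalize_logits(adapter: &dyn FamilyAdapter, logits: &mut [f32]) {
    if let Some(cap) = adapter.logit_soft_cap() {
        for l in logits.iter_mut() {
            *l = cap * (*l / cap).tanh();
        }
    }
}
```

Under this shape, adding a family means adding one more `impl` block; `finalize_logits` stays untouched.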
Try it yourself
```shell
git clone https://github.com/AteniaEngine/ateniaengine.git
cd ateniaengine
cargo install --path .

# Check your machine is ready
atenia doctor

# Download a small model and chat with it
huggingface-cli download meta-llama/Llama-3.2-1B-Instruct \
  --local-dir ./models/llama-3.2-1b-instruct
atenia chat --model ./models/llama-3.2-1b-instruct
```
- v21 Production execution guards. Adaptive memory-pressure thresholds calibrated against real workload envelopes. Verdict stability under noisy signals. Structured logging and replay harnesses.
- v22 Multi-vendor backend foundation. Vendor-neutral abstraction for hardware probes and kernel compilation. First target: NVIDIA discrete + Intel iGPU coexistence — a common laptop configuration other runtimes treat as single-backend.
- v23 AMD ROCm backend. Substantial differences in driver model, memory management, and sync primitives — scoped as its own milestone.
- v24 Apple Metal backend. Unified memory model, Metal Shading Language, Xcode-centric toolchain.
- v25 Distributed execution. Multi-host execution reasoning. Out of scope until single-host execution is mature across vendors.
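
One common way to get the "verdict stability under noisy signals" named in v21 is hysteresis: separate enter and exit thresholds, so a reading that jitters around a single cutoff does not flip the verdict on every sample. The sketch below is an assumption about what such a guard might look like, not a description of the planned design; `Guard`, `Verdict` and the threshold values are hypothetical.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Verdict {
    Normal,
    Pressure,
}

/// Hypothetical memory-pressure guard with hysteresis.
struct Guard {
    enter: f32, // used-memory fraction at which Pressure is declared
    exit: f32,  // lower fraction required before returning to Normal
    state: Verdict,
}

impl Guard {
    fn new(enter: f32, exit: f32) -> Self {
        assert!(exit < enter, "exit threshold must sit below enter threshold");
        Guard { enter, exit, state: Verdict::Normal }
    }

    /// Feed one sample; the verdict changes only when the sample crosses
    /// the threshold that applies to the current state.
    fn observe(&mut self, used_fraction: f32) -> Verdict {
        self.state = match self.state {
            Verdict::Normal if used_fraction >= self.enter => Verdict::Pressure,
            Verdict::Pressure if used_fraction <= self.exit => Verdict::Normal,
            unchanged => unchanged,
        };
        self.state
    }
}
```

With thresholds of 0.90 and 0.80, samples bouncing between 0.88 and 0.92 produce a single transition into `Pressure` rather than a flip on each reading.
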
These are not missing features. They are deliberate boundaries.
- Embedding machine learning into the execution control path
- Modifying model semantics, numerical results, or training dynamics
- Competing as a replacement for major ML frameworks
- Performance-at-all-costs optimization that sacrifices stability
- Opaque, black-box adaptation mechanisms
- Mixture-of-experts, multimodal and encoder-decoder architectures
The capabilities listed as completed are backed by executable tests in the public repository. Any future direction must meet the same standard: observable execution behavior, reproducible tests, and stability under reality.