Run models on hardware
where other engines give up.
Atenia Engine is a from-scratch LLM inference runtime written in Rust. It runs Llama 2 13B Chat on a laptop with 8 GB of VRAM and 32 GB of RAM — spreading the model automatically across VRAM, RAM, and NVMe so it simply fits, with nothing for you to configure.
Seven model families. One interactive command. Every certified model measured against an F64 reference.
Try it
Talk to a model in one command.
Download any supported checkpoint, point Atenia at the folder, and start chatting. No Python environment, no inference server, no configuration — just the model and a prompt.
atenia chat --model ./models/llama-3.2-1b-instruct
An interactive, multi-turn chat in your terminal. Prefer a single
answer? Use atenia generate --prompt "...". Not sure your
machine is ready? Run atenia doctor first — it checks your
CPU, RAM, GPU and build in one line.
Where the project stands
| Milestone | What it delivered | Result |
|---|---|---|
| M4.7 | Beyond-VRAM execution — Llama 2 13B on 8 GB VRAM | [PASS] ✓ bit-exact pre/post spill |
| M5 | Tokenizer + KV cache + text generation | atenia generate — coherent chat output |
| M6–M8 | Tier-aware loader + BF16-resident VRAM kernels | 1.31×–1.46× faster, bit-identical output |
| M8.7 | Disk → GPU JIT streaming pipeline | 154 weights/forward · 98.7 % prefetch hit rate |
| Multi-family | Llama, Qwen, Gemma, Phi, Mistral, SmolLM, Falcon3 | 7 families validated |
| Adapter Toolkit v2 | Describe a model in a small YAML file — no Rust required | load · inspect · debug |
| CLI | Human-readable errors, logging, diagnostics, interactive chat | generate · chat · doctor · diagnose |
The demo that started it
Llama 2 13B Chat on hardware where it barely fits.
The model is 26 GB. The VRAM is 8 GB. The RAM is 32 GB. Most engines stop here. Atenia moves parameters between tiers as execution proceeds — no intervention required.
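The sizes above can be checked with simple arithmetic, and the idea of spreading a model across tiers can be sketched in a few lines. The following is a minimal illustration, not Atenia's placement logic: the `Tier` enum, `checkpoint_bytes`, `place`, the greedy first-fit policy, and every budget figure are assumptions made for this example. It also assumes two bytes per parameter, which is consistent with the 26 GB figure for a 13-billion-parameter checkpoint.

```rust
// Illustrative sketch only: all names here are hypothetical, not Atenia's API.

/// Storage tiers, listed fastest first.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Tier {
    Vram,
    Ram,
    Nvme,
}

/// Checkpoint size in bytes for a parameter count and a per-parameter width.
fn checkpoint_bytes(params: u64, bytes_per_param: u64) -> u64 {
    params * bytes_per_param
}

/// Greedy first-fit placement: each block goes to the fastest tier that
/// still has room. `budgets` lists (tier, free capacity), fastest first,
/// in the same unit as `sizes`. Returns `None` if a block fits nowhere.
fn place(sizes: &[u64], budgets: &[(Tier, u64)]) -> Option<Vec<Tier>> {
    let mut free = budgets.to_vec();
    let mut placement = Vec::with_capacity(sizes.len());
    for &size in sizes {
        // First tier with enough remaining capacity for this block.
        let slot = free.iter_mut().find(|slot| slot.1 >= size)?;
        slot.1 -= size;
        placement.push(slot.0);
    }
    Some(placement)
}
```

As a usage example, take forty equal blocks of 650 MB (26,000 MB in total) and assumed usable budgets of 6,000 MB of VRAM, 18,000 MB of RAM, and 100,000 MB of NVMe. `place` then puts 9 blocks in VRAM, 27 in RAM, and the remaining 4 on NVMe. A real runtime would also have to decide when blocks move between tiers during execution, which this static sketch does not attempt.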
git clone https://github.com/AteniaEngine/ateniaengine.git
cd ateniaengine
cargo install --path .
huggingface-cli download meta-llama/Llama-2-13b-chat-hf \
    --local-dir ./models/llama-2-13b-chat
atenia run --mode c \
    --model ./models/llama-2-13b-chat \
    --cache-dir ./atenia-cache
Expected output: [PASS] ✓ argmax(pre-spill) == argmax(post-spill) bit-exactly,
after approximately 7 minutes on a machine with 32 GB of RAM and 8 GB of VRAM.
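The kind of check named in the [PASS] line can be sketched as follows. This is an assumption about what such a comparison might look like, not the code Atenia runs: `argmax` and `bit_identical` are hypothetical helpers, and the logits would come from forward passes taken before and after parameters are spilled to a slower tier.

```rust
// Illustrative sketch only: helper names are hypothetical, not Atenia's API.

/// Index of the largest logit. Ties resolve to the first occurrence, and a
/// NaN never replaces an existing best value. Returns `None` for empty input.
fn argmax(logits: &[f32]) -> Option<usize> {
    let mut best: Option<(usize, f32)> = None;
    for (i, &v) in logits.iter().enumerate() {
        match best {
            // Keep the current best when the new value is not strictly larger.
            Some((_, b)) if v <= b || v.is_nan() => {}
            _ => best = Some((i, v)),
        }
    }
    best.map(|(i, _)| i)
}

/// True when both buffers hold identical bit patterns. This is stricter
/// than `==`, which treats 0.0 and -0.0 as equal.
fn bit_identical(a: &[f32], b: &[f32]) -> bool {
    a.len() == b.len() && a.iter().zip(b).all(|(x, y)| x.to_bits() == y.to_bits())
}
```

Comparing `argmax(&pre)` with `argmax(&post)` confirms that the predicted token is unchanged, while `bit_identical(&pre, &post)` is the stronger test that every logit survived the spill untouched.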
The models you already use
Bring your own checkpoint — Atenia probably already runs it.
Seven model families have been validated end-to-end: they load, they generate coherent text, and they stop cleanly. Both HuggingFace safetensors and quantised GGUF files work through the same path.
| Family | Examples | Status |
|---|---|---|
| Llama | TinyLlama · Llama 2 · Llama 3.1 / 3.2 | validated |
| Qwen | Qwen 2.5 · Qwen3 | validated |
| Gemma | Gemma 2 · Gemma 3 (text) | validated |
| Phi | Phi-3 · Phi-3.5 · Phi-4 mini | validated |
| Mistral | Mistral 7B v0.2 / v0.3 | validated |
| SmolLM | SmolLM · SmolLM2 | validated |
| Falcon3 | Falcon3 1B / 3B / 7B | validated |
Not sure a specific model will work? atenia diagnose --model ./your-model
checks it in seconds, before you ever load it.
Mathematically correct by design
Validated against exact math — not against other frameworks.
Most inference engines run in BF16 by default. It is fast and memory-efficient — and it drifts from the true answer in ways that are rarely measured. Atenia measures every certified model against an F64 (double-precision) reference and publishes the result with the model.
Same checkpoints, same weights, same inputs. Every number below is
reproducible on your own machine with one cargo test command.
| Model | Atenia F32 drift | PyTorch BF16 drift | BF16 vs. Atenia F32 |
|---|---|---|---|
| TinyLlama 1.1B | 0.000141 | 0.732 | 5,198× worse |
| SmolLM2 1.7B | 0.001446 | 14.01 | 9,692× worse |
| Qwen 2.5 1.5B | 0.000346 | 1.531 | 4,420× worse |
| Llama 3.2 1B | 0.000132 | 0.539 | 4,096× worse |
Lower is closer to the true answer. Each certified checkpoint ships a
signed numcert.json that records its measured drift — no
other inference runtime publishes this.
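To make the idea of drift against an F64 reference concrete, here is a minimal sketch. It is not Atenia's certification code, and the page does not say which drift metric is used: the maximum-absolute-difference metric, `max_abs_drift`, and `to_bf16` are assumptions for illustration. `to_bf16` simulates BF16 storage by rounding an `f32` to its top 16 bits with round-to-nearest-even, and it ignores NaN handling.

```rust
// Illustrative sketch only: the metric and helper names are assumptions.

/// Round an f32 to the nearest BF16 value (ties to even) and return it
/// widened back to f32. NaN inputs are not treated specially.
fn to_bf16(x: f32) -> f32 {
    let bits = x.to_bits();
    // Adding 0x7FFF plus the lowest kept bit rounds the 16 discarded bits
    // to nearest, with ties going to the even result.
    let bias = 0x7FFF + ((bits >> 16) & 1);
    f32::from_bits(bits.wrapping_add(bias) & 0xFFFF_0000)
}

/// Largest absolute difference between a candidate output and the
/// double-precision reference, computed in f64.
fn max_abs_drift(reference: &[f64], candidate: &[f32]) -> f64 {
    reference
        .iter()
        .zip(candidate)
        .map(|(r, c)| (r - f64::from(*c)).abs())
        .fold(0.0, f64::max)
}
```

For a single reference value of 1.001, storing it as `f32` leaves a drift below 1e-7, while rounding to BF16 collapses it to 1.0 and leaves a drift of about 0.001. That gap comes from BF16 keeping only 7 explicit mantissa bits against 23 for `f32`.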
Nothing here is a promise
Every claim on this page is backed by a test you can run.
The repository ships a large suite of executable tests covering tensor math, model loading, generation, numeric drift and the CLI itself — not benchmarks on ideal hardware, not cherry-picked prompts. Clone the repo and check for yourself:
cargo test --lib
What Atenia Engine is not
- Not a machine learning framework
- Not a compiler or graph optimizer
- Not a performance-at-all-costs system
- Not a heuristic tuning layer
Atenia does not modify model semantics. It makes smarter decisions about where parameters live and when they move — and keeps those decisions visible and auditable.
References
Project, research and authorship
Filed December 16, 2025
GAAIA LABS