emperoresearch lab
Open weights01/06
EMPERO/00 — INDEXINDEPENDENT AI RESEARCH LAB · OPEN BY DEFAULT · BUILT IN GERMANY2026.06.24
Independent AI research lab

Small models,
trained in the open.

An independent AI research lab. We build small, efficient language models you can own and run yourself — and release the weights, code and datasets in the open. This release wave ships Abacus, our Rust terminal coding agent; refreshes the Qwythos GGUFs with v2 runtime fixes and MTP variants; and announces Qwythos-27B as the next larger Mythos model. Claire remains our in-house language model in training.

Abacus
Coding agent
released today · Rust TUI
GGUF v2
Qwythos local builds
fixed templates · MTP · vision
27B
Qwythos announced
larger Mythos tier
100%
Open by default
weights · code · datasets
EMPERO — INDEPENDENT AI RESEARCH LABABACUS · CODING AGENT · RELEASED TODAYQWYTHOS-9B · CLAUDE MYTHOS 5 DISTILL · 1M CONTEXTQWYTHOS GGUF V2 · FIXED TEMPLATES · MTP VARIANTS · VISIONQWYTHOS-27B · ANNOUNCEDCLAIRE · IN-HOUSE MODEL · 6B-A500M · IN TRAININGMICROVERSE · AUTOMATED ARCHITECTURE DISCOVERYOPEN WEIGHTS · OPEN CODE · OPEN DATASETSEMPERO — INDEPENDENT AI RESEARCH LABABACUS · CODING AGENT · RELEASED TODAYQWYTHOS-9B · CLAUDE MYTHOS 5 DISTILL · 1M CONTEXTQWYTHOS GGUF V2 · FIXED TEMPLATES · MTP VARIANTS · VISIONQWYTHOS-27B · ANNOUNCEDCLAIRE · IN-HOUSE MODEL · 6B-A500M · IN TRAININGMICROVERSE · AUTOMATED ARCHITECTURE DISCOVERYOPEN WEIGHTS · OPEN CODE · OPEN DATASETS
01Flagship · Qwythos-9BQWEN3.5-9B · GGUF V2 · 1M CONTEXT · APACHE-2.0
CURRENT FLAGSHIP · GGUF V2 LIVE · QWYTHOS-27B ANNOUNCED

A 9B that checks
its own work.

Qwythos-9B-Claude-Mythos-5-1M is our shipped flagship — a full-parameter reasoning model distilled from Claude Mythos 5 and Claude Fable traces, with a 1M-token context, native tool use, and a +34.3 MMLU lift over its base. The v2 GGUFs are live now: fixed Qwen3.5 runtime metadata, stronger embedded chat/tool templates, normal replacement files, explicit -MTP- variants for draft speculation, and a vision projector. Qwythos-27B is announced as the next larger Mythos model.

Model card on HF GGUF v2 for llama.cpp / Ollama Full eval transcripts
+34.3
MMLU vs base
0.232 → 0.575
1M
token context
YaRN · default
v2
GGUF refresh
templates · MTP · vision
27B
announced next
larger Mythos tier

Matched lm-evaluation-harness run (limit 100). Not every benchmark moved up — gpqa-diamond −0.05, arc-challenge roughly flat. Full table on the model card.

Reason → tool → verify · real transcripts, condensedfrom the published eval set
PROMPT
How many prime numbers are there below 100,000? Verify with python.
TOOL CALLS
python_executor → wrote and ran a primality test → stdout: 9592
<THINK> · REASONING · 2 rounds · 10.5s
This is a counting problem over a fixed range — the kind of thing I should verify rather than recall. I'll write a primality test, count every prime from 2 through 99,999, and run it in the python_executor rather than trusting a memorised figure.
FINAL ANSWER
There are 9,592 prime numbers below 100,000.

I verified this by computing all primes from 2 through 99,999 using a primality test and counting them.

Reaches for a tool to verify rather than recalling a number. Writes the primality test, runs it, reports the checked result.

02Published models6 ON HUGGING FACE · OPEN WEIGHTS
FIG. 02 — Qwythos-9B-Claude-Mythos-5-1MApache-2.0
Qwythos-9B-Claude-Mythos-5-1M

Our shipped flagship. A full-parameter reasoning model on a deeply uncensored Qwen3.5-9B base, post-trained on 500M+ tokens of Claude Mythos and Claude Fable traces with in-house chain-of-thought. Ships with a 1,048,576-token (1M) context via YaRN by default and native function calling — and self-corrects with tools (7/7 on hard factual prompts spanning math, cybersecurity, pharmacology and biochem). Versus the base, under a matched harness: +34.3 MMLU, +30 gsm8k-strict, +19 gsm8k-flex. GGUF v2 adds fixed runtime metadata, MTP variants and vision-projector support; Qwythos-27B is announced as the next larger Mythos member.

Benchmarks
MMLU
57.5
GSM8K (strict)
81
GSM8K (flex)
86
03Research systemsDISCOVER → DATA → MODEL

We build the whole stack ourselves — the harness that finds better architectures, the pipeline that builds the training data, and the model they produce.

STEP 01ARCHITECTURE DISCOVERY
microverse
Internal · in use

An automated, LLM-in-the-loop harness for discovering novel attention mechanisms and transformer blocks. It proposes candidate architectures, sandbox-trains each one, benchmarks them against a synthetic gauntlet plus a mini language-model tier, and ranks the ideas worth promoting to real scale.

feeds
STEP 02DATA PIPELINE
SFTSuite
Internal · in use

Our supervised-fine-tuning data factory. It generates conversation traces with a teacher model and assembles them into a staged, curriculum-by-position corpus — short and simple first, long and multi-turn last — validated against Claire's real tokenizer before training.

feeds
STEP 03LANGUAGE MODEL
Claire
Pretraining · 6B-A500M

Our in-house language model, trained from scratch on a custom architecture and training recipe. A 6B mixture-of-experts with roughly 500M parameters active per token (6B-A500M) — frontier-style sparsity at a size you can actually own and run, built for reasoning, code and grounded tool use.

output
04Open Source8 PUBLIC REPOS · GITHUB
01 / 08TERMINAL · CODING AGENT · Rust
abacus

A fast, local-first terminal coding agent written in Rust. Bring your own OpenAI-compatible endpoint — local Ollama, llama.cpp, vLLM or hosted providers — then inspect approval-gated diffs, sessions, goals, skills, MCP tools, subagents and scheduled jobs from one focused TUI.

released today · Apache-2.0
02 / 08CHAT · MULTI-MODEL · Python
fusionchat

Multi-model fusion chat: a master model asks up to three independent fusion models the same task, then synthesizes one coherent answer from their replies. TUI plus a web UI, every turn logged.

public · Python
03 / 08TRAINING · DASHBOARD · Python
runmonitor

A lean, local experiment tracker with a live web dashboard. Import and go — loss curves, arbitrary metrics, artifact saving and run comparison update in real time. No servers, no API keys, no cloud.

public · MIT
04 / 08TRAINING · STORAGE · Python
offside-checkpoints

Automatically offload PyTorch checkpoints to (S)FTP as they are written, then delete the local copy so long runs never fill the disk. Resume by run name and checkpoint name.

public · MIT
05 / 08DATASETS · SFT · Rust
taskgen

A fast, concurrent task generator for distillation data, written in native Rust. Dozens of domains across math, code, science, creative writing and conversation. OpenAI-compatible; builds the public tasklist-* corpora.

active · Rust
06 / 08INFRA · PROXY · Python
claude-code-proxy

A high-fidelity proxy that translates the Anthropic Messages API to any OpenAI-compatible backend: extended thinking, document blocks, streaming tool use and cache control preserved.

v0.1 · MIT
07 / 08RESEARCH · Python
nanoAttnRes

A nanochat fork with Block Attention Residuals: learned, input-dependent attention over previous block outputs in place of fixed additive residuals. A small, readable testbed for an architectural idea.

experimental · MIT
08 / 08TRIBUTE · Python
TEMPLE2

A tiny ~63M GPT-2 trained from scratch on public-domain scripture — a spare-compute tribute to Terry A. Davis. Not a serious model. That is the point.

side project · MIT
All projects
05MissionWHAT WE BUILD AND WHY
01

Efficient models, from scratch

Claire is our clean-sheet language model — a custom architecture and training recipe, built as a 6B mixture-of-experts with only ~500M parameters active per token. We are convinced a carefully designed sparse model punches well above its active-parameter weight: efficient enough to own and run yourself, capable enough to reason, write code and call tools.

02

Automated architecture discovery

microverse is our LLM-in-the-loop harness for finding better building blocks. It proposes attention mechanisms and transformer blocks, sandbox-trains each against a synthetic gauntlet, and surfaces the few structural ideas worth promoting to a real training run — before we spend the compute.

03

Data as curriculum

SFTSuite and taskgen treat data as a first-class part of the model. Conversation traces are generated, validated against the real tokenizer, and ordered so that position in the corpus is the curriculum — simple and short first, long and multi-turn last. The raw datasets are published openly.

04

Open by default

Weights, code, datasets and tools ship in the open: Abacus in your terminal, runmonitor for live training, offside-checkpoints for storage, plus our published models on Hugging Face. No waitlists. Built because we needed them; open because someone else might too.

06 — Dispatch

Follow the build.

An occasional dispatch from the lab — progress on Qwythos and Claire, what we found with microverse, new Abacus releases and the one thing we got wrong that week. No hype, no roadmap theatre. Cancel from any line.

22 readers · we never share addresses