EMPERO/00 — INDEXINDEPENDENT AI RESEARCH LAB · OPEN BY DEFAULT · BUILT IN GERMANY2026.06.24

Independent AI research lab

Small models,
trained in the open.

An independent AI research lab. We build small, efficient language models you can own and run yourself — and release the weights, code and datasets in the open. This release wave ships Abacus, our Rust terminal coding agent; refreshes the Qwythos GGUFs with v2 runtime fixes and MTP variants; and announces Qwythos-27B as the next larger Mythos model. Claire remains our in-house language model in training.

See the models →Abacus on GitHub ↗The research →

Abacus

Coding agent

released today · Rust TUI

GGUF v2

Qwythos local builds

fixed templates · MTP · vision

27B

Qwythos announced

larger Mythos tier

100%

Open by default

weights · code · datasets

EMPERO — INDEPENDENT AI RESEARCH LAB✱ABACUS · CODING AGENT · RELEASED TODAY✱QWYTHOS-9B · CLAUDE MYTHOS 5 DISTILL · 1M CONTEXT✱QWYTHOS GGUF V2 · FIXED TEMPLATES · MTP VARIANTS · VISION✱QWYTHOS-27B · ANNOUNCED✱CLAIRE · IN-HOUSE MODEL · 6B-A500M · IN TRAINING✱MICROVERSE · AUTOMATED ARCHITECTURE DISCOVERY✱OPEN WEIGHTS · OPEN CODE · OPEN DATASETS✱EMPERO — INDEPENDENT AI RESEARCH LAB✱ABACUS · CODING AGENT · RELEASED TODAY✱QWYTHOS-9B · CLAUDE MYTHOS 5 DISTILL · 1M CONTEXT✱QWYTHOS GGUF V2 · FIXED TEMPLATES · MTP VARIANTS · VISION✱QWYTHOS-27B · ANNOUNCED✱CLAIRE · IN-HOUSE MODEL · 6B-A500M · IN TRAINING✱MICROVERSE · AUTOMATED ARCHITECTURE DISCOVERY✱OPEN WEIGHTS · OPEN CODE · OPEN DATASETS✱

Now shipping

Abacus, Qwythos GGUF v2, and Qwythos-27B.

June 22, 2026

ABACUS · RELEASED TODAY

Abacus is the coding agent.

A fast, local-first terminal agent in Rust for setup, search, edits, review, sessions and scripting. Bring your own model endpoint; every mutation is approval-gated and shown first as a per-file diff.

Rust · local/open-weight friendlyOpen on GitHub ↗

QWYTHOS GGUF · V2

Redownload the Qwythos GGUFs.

v2 replaces the original normal files, fixes tokenizer and embedded chat/tool-template metadata for Qwen3.5 GGUF runtimes, adds -MTP- variants, and smoke-tests Q4/Q8 tool calling, 1M context and vision.

Q4_K_M through BF16 · MTP + mmprojOpen GGUF v2 ↗

QWYTHOS-27B · ANNOUNCED

The next Qwythos size is on deck.

Qwythos-27B is announced as the larger next member of the Mythos line. The shipped 9B remains the current open-weight flagship while 27B moves the family toward a heavier local-reasoning tier.

Announced · details to followFollow on HF ↗

01Flagship · Qwythos-9BQWEN3.5-9B · GGUF V2 · 1M CONTEXT · APACHE-2.0

CURRENT FLAGSHIP · GGUF V2 LIVE · QWYTHOS-27B ANNOUNCED

A 9B that checks
its own work.

Qwythos-9B-Claude-Mythos-5-1M is our shipped flagship — a full-parameter reasoning model distilled from Claude Mythos 5 and Claude Fable traces, with a 1M-token context, native tool use, and a +34.3 MMLU lift over its base. The v2 GGUFs are live now: fixed Qwen3.5 runtime metadata, stronger embedded chat/tool templates, normal replacement files, explicit -MTP- variants for draft speculation, and a vision projector. Qwythos-27B is announced as the next larger Mythos model.

Model card on HF ↗GGUF v2 for llama.cpp / Ollama ↗Full eval transcripts ↗

+34.3

MMLU vs base

0.232 → 0.575

token context

YaRN · default

GGUF refresh

templates · MTP · vision

27B

announced next

larger Mythos tier

Matched lm-evaluation-harness run (limit 100). Not every benchmark moved up — gpqa-diamond −0.05, arc-challenge roughly flat. Full table on the model card.

Reason → tool → verify · real transcripts, condensedfrom the published eval set

PROMPT

How many prime numbers are there below 100,000? Verify with python.

TOOL CALLS

›python_executor → wrote and ran a primality test → stdout: 9592

<THINK> · REASONING · 2 rounds · 10.5s

This is a counting problem over a fixed range — the kind of thing I should verify rather than recall. I'll write a primality test, count every prime from 2 through 99,999, and run it in the python_executor rather than trusting a memorised figure.

FINAL ANSWER

There are 9,592 prime numbers below 100,000.

I verified this by computing all primes from 2 through 99,999 using a primality test and counting them.

→ Reaches for a tool to verify rather than recalling a number. Writes the primality test, runs it, reports the checked result.

02Published models6 ON HUGGING FACE · OPEN WEIGHTS

FIG. 02 — Qwythos-9B-Claude-Mythos-5-1MApache-2.0

Qwythos-9B-Claude-Mythos-5-1M

Our shipped flagship. A full-parameter reasoning model on a deeply uncensored Qwen3.5-9B base, post-trained on 500M+ tokens of Claude Mythos and Claude Fable traces with in-house chain-of-thought. Ships with a 1,048,576-token (1M) context via YaRN by default and native function calling — and self-corrects with tools (7/7 on hard factual prompts spanning math, cybersecurity, pharmacology and biochem). Versus the base, under a matched harness: +34.3 MMLU, +30 gsm8k-strict, +19 gsm8k-flex. GGUF v2 adds fixed runtime metadata, MTP variants and vision-projector support; Qwythos-27B is announced as the next larger Mythos member.

Benchmarks

MMLU

57.5

GSM8K (strict)

GSM8K (flex)

Open model card →

03Research systemsDISCOVER → DATA → MODEL

We build the whole stack ourselves — the harness that finds better architectures, the pipeline that builds the training data, and the model they produce.

STEP 01ARCHITECTURE DISCOVERY

microverse

Internal · in use

An automated, LLM-in-the-loop harness for discovering novel attention mechanisms and transformer blocks. It proposes candidate architectures, sandbox-trains each one, benchmarks them against a synthetic gauntlet plus a mini language-model tier, and ranks the ideas worth promoting to real scale.

feeds→

STEP 02DATA PIPELINE

SFTSuite

Internal · in use

Our supervised-fine-tuning data factory. It generates conversation traces with a teacher model and assembles them into a staged, curriculum-by-position corpus — short and simple first, long and multi-turn last — validated against Claire's real tokenizer before training.

feeds→

STEP 03LANGUAGE MODEL

Claire

Pretraining · 6B-A500M

Our in-house language model, trained from scratch on a custom architecture and training recipe. A 6B mixture-of-experts with roughly 500M parameters active per token (6B-A500M) — frontier-style sparsity at a size you can actually own and run, built for reasoning, code and grounded tool use.

output◆

04Open Source8 PUBLIC REPOS · GITHUB

01 / 08TERMINAL · CODING AGENT · Rust

abacus

A fast, local-first terminal coding agent written in Rust. Bring your own OpenAI-compatible endpoint — local Ollama, llama.cpp, vLLM or hosted providers — then inspect approval-gated diffs, sessions, goals, skills, MCP tools, subagents and scheduled jobs from one focused TUI.

released today · Apache-2.0↗

02 / 08CHAT · MULTI-MODEL · Python

fusionchat

Multi-model fusion chat: a master model asks up to three independent fusion models the same task, then synthesizes one coherent answer from their replies. TUI plus a web UI, every turn logged.

public · Python↗

03 / 08TRAINING · DASHBOARD · Python

runmonitor

A lean, local experiment tracker with a live web dashboard. Import and go — loss curves, arbitrary metrics, artifact saving and run comparison update in real time. No servers, no API keys, no cloud.

public · MIT↗

04 / 08TRAINING · STORAGE · Python

offside-checkpoints

Automatically offload PyTorch checkpoints to (S)FTP as they are written, then delete the local copy so long runs never fill the disk. Resume by run name and checkpoint name.

public · MIT↗

05 / 08DATASETS · SFT · Rust

taskgen

A fast, concurrent task generator for distillation data, written in native Rust. Dozens of domains across math, code, science, creative writing and conversation. OpenAI-compatible; builds the public tasklist-* corpora.

active · Rust↗

06 / 08INFRA · PROXY · Python

claude-code-proxy

A high-fidelity proxy that translates the Anthropic Messages API to any OpenAI-compatible backend: extended thinking, document blocks, streaming tool use and cache control preserved.

v0.1 · MIT↗

07 / 08RESEARCH · Python

nanoAttnRes

A nanochat fork with Block Attention Residuals: learned, input-dependent attention over previous block outputs in place of fixed additive residuals. A small, readable testbed for an architectural idea.

experimental · MIT↗

08 / 08TRIBUTE · Python

TEMPLE2

A tiny ~63M GPT-2 trained from scratch on public-domain scripture — a spare-compute tribute to Terry A. Davis. Not a serious model. That is the point.

side project · MIT↗

All projects →

05MissionWHAT WE BUILD AND WHY

Efficient models, from scratch

Claire is our clean-sheet language model — a custom architecture and training recipe, built as a 6B mixture-of-experts with only ~500M parameters active per token. We are convinced a carefully designed sparse model punches well above its active-parameter weight: efficient enough to own and run yourself, capable enough to reason, write code and call tools.

Automated architecture discovery

microverse is our LLM-in-the-loop harness for finding better building blocks. It proposes attention mechanisms and transformer blocks, sandbox-trains each against a synthetic gauntlet, and surfaces the few structural ideas worth promoting to a real training run — before we spend the compute.

Data as curriculum

SFTSuite and taskgen treat data as a first-class part of the model. Conversation traces are generated, validated against the real tokenizer, and ordered so that position in the corpus is the curriculum — simple and short first, long and multi-turn last. The raw datasets are published openly.

Open by default

Weights, code, datasets and tools ship in the open: Abacus in your terminal, runmonitor for live training, offside-checkpoints for storage, plus our published models on Hugging Face. No waitlists. Built because we needed them; open because someone else might too.

06Writing1 POSTS · UPDATED 19.06.26

★ FEATURED19.06.26 · 7 min

Qwythos-9B: a 9B that checks its own work

Our biggest open-weights release yet — a full-parameter reasoning model distilled from Claude Mythos 5, with a 1M-token context, native tool use, and a +34-point MMLU jump over its base. Here's what's in it, the honest benchmark table, and how to run it.

kodee · ResearchRead →

All writing →

06 — Dispatch

Follow the build.

An occasional dispatch from the lab — progress on Qwythos and Claire, what we found with microverse, new Abacus releases and the one thing we got wrong that week. No hype, no roadmap theatre. Cancel from any line.

Small models,trained in the open.

Abacus, Qwythos GGUF v2, and Qwythos-27B.

Abacus is the coding agent.

Redownload the Qwythos GGUFs.

The next Qwythos size is on deck.

A 9B that checksits own work.

Efficient models, from scratch

Automated architecture discovery

Data as curriculum

Open by default

Qwythos-9B: a 9B that checks its own work

Follow the build.

Small models,
trained in the open.

A 9B that checks
its own work.