Inside the ashao orchestrator

How a single chat message becomes a job, finds a stranger's GPU, and streams back to you token by token — with a cloud safety net that you should almost never notice.

Dovid Reyes, Systems lead
7 min read

The hardest part of a decentralized inference network is not running the model. It is the choreography: matching unpredictable demand to a fleet of volunteer machines that can appear and vanish mid-request, while the person waiting for an answer experiences none of that uncertainty. The orchestrator is the piece that makes the messy middle feel boring. This post walks the full path of a single message.

A message becomes a job

When you hit send, the client posts your conversation to /api/chat/send. The server does three things in quick succession: it persists the message, it deducts the tier's credit cost from your balance, and it enqueues a job. A job is a small record — an id, the tier you chose, the model, a status, and a nullable claimedBy field that starts empty. The conversation itself is written to a separate key so the job stays cheap to pass around.

That job id is pushed onto a per-tier Redis list, which serves as the queue, and the id is handed straight back to your browser. The request does nothing but write: no model has run yet, and nothing has decided who will do the work. That decision is deliberately deferred.
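As a rough illustration of that write path, here is a minimal in-memory sketch. It is not the production handler: the class name ChatStore, the per-tier credit costs, the "queued" status value, and the insufficient-credits check are all assumptions made for the example, and plain maps and arrays stand in for the Redis keys and lists.

```typescript
type Tier = "pro" | "max";
type JobStatus = "queued" | "running";

interface Job {
  id: string;
  tier: Tier;
  model: string;
  status: JobStatus;
  claimedBy: string | null; // stays null until a worker claims the job
}

// Hypothetical credit costs; the post does not state the real values.
const TIER_COST: Record<Tier, number> = { pro: 1, max: 5 };

// In-memory stand-in for the Redis-backed store (assumption for illustration).
class ChatStore {
  balances = new Map<string, number>();
  jobs = new Map<string, Job>();
  conversations = new Map<string, string[]>(); // kept apart from the job record
  queues: Record<Tier, string[]> = { pro: [], max: [] };
  private nextId = 1;

  send(userId: string, tier: Tier, model: string, conversation: string[]): string {
    const balance = this.balances.get(userId) ?? 0;
    const cost = TIER_COST[tier];
    // Assumed behaviour: reject the request when the balance cannot cover the cost.
    if (balance < cost) throw new Error("insufficient credits");

    const id = `job-${this.nextId++}`;
    // 1. Persist the message: the conversation goes under its own key.
    this.conversations.set(id, [...conversation]);
    // 2. Deduct the tier's credit cost from the balance.
    this.balances.set(userId, balance - cost);
    // 3. Enqueue: store the small job record and push only its id onto the tier queue.
    this.jobs.set(id, { id, tier, model, status: "queued", claimedBy: null });
    this.queues[tier].push(id);
    return id; // handed straight back to the browser
  }
}
```

Only the id travels through the queue, which is what keeps the job cheap to pass around.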

Two tiers, two queues
Pro jobs (Browser workers · WebGPU) and Max jobs (Cloud + native workers) ride separate queues. Workers subscribe to the tier they are equipped for, so a phone running a tiny model never gets handed a job it cannot finish.
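One way to picture the subscription rule is a small lookup from worker kind to the queues it may pull from. This is only a sketch: the mapping follows the callout literally, and the queue key format and the function names are hypothetical.

```typescript
type Tier = "pro" | "max";
type WorkerKind = "browser" | "native" | "cloud";

// Which worker kinds serve which tier, read literally from the callout.
const TIER_WORKERS: Record<Tier, WorkerKind[]> = {
  pro: ["browser"],
  max: ["cloud", "native"],
};

// Hypothetical key naming for the per-tier queue.
function queueKey(tier: Tier): string {
  return `queue:${tier}`;
}

// Returns the queue keys a worker of the given kind should subscribe to.
function queuesFor(kind: WorkerKind): string[] {
  const tiers: Tier[] = ["pro", "max"];
  return tiers
    .filter((tier) => TIER_WORKERS[tier].indexOf(kind) !== -1)
    .map(queueKey);
}
```

Under this mapping a browser worker only ever sees the Pro queue, so it is never offered a Max job.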

A worker claims it

Meanwhile, somewhere else entirely, a worker is asking the network for something to do. It calls a claim endpoint, which runs a blocking pop against the queue. The instant a job id is available, exactly one worker wins it — Redis guarantees the atomicity, so two workers can never pick up the same prompt. The orchestrator stamps the job with claimedBy and flips its status to running, then hands the worker the conversation to generate against.
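The claim step can be sketched in a few lines. The version below is an in-memory, non-blocking stand-in: where the real endpoint would wait on a Redis blocking pop, this one simply returns null when the queue is empty. The class and method names are assumptions for illustration.

```typescript
interface Job {
  id: string;
  status: "queued" | "running";
  claimedBy: string | null;
}

// In-memory stand-in for one tier's queue plus its job records.
class TierQueue {
  private ids: string[] = [];
  private jobs = new Map<string, Job>();

  enqueue(id: string): void {
    this.jobs.set(id, { id, status: "queued", claimedBy: null });
    this.ids.push(id);
  }

  // Removes the oldest job id from the queue, so no later caller can receive it.
  // Returns null when nothing is waiting (the real pop would block instead).
  claim(workerId: string): Job | null {
    const id = this.ids.shift();
    if (id === undefined) return null;
    const job = this.jobs.get(id);
    if (job === undefined) return null;
    job.claimedBy = workerId; // stamp the winner
    job.status = "running";   // flip the status
    return job;
  }
}
```

Because the pop removes the id in one step, a second worker asking a moment later finds the queue empty rather than a duplicate of the same job.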

From here the worker streams. Each chunk the model produces is published to a per-job channel — think of it as a private radio frequency named after the job id. The worker is not talking to your browser directly; it is broadcasting, and the orchestrator is listening.
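To make the radio-frequency picture concrete, here is a small in-memory publish/subscribe sketch. It stands in for Redis pub/sub; the TokenBus class and the channel naming scheme are hypothetical.

```typescript
type Listener = (chunk: string) => void;

// Hypothetical channel naming: one channel per job id.
const channelFor = (jobId: string): string => `tokens:${jobId}`;

// In-memory stand-in for a pub/sub broker.
class TokenBus {
  private listeners = new Map<string, Listener[]>();

  // Registers a listener and returns a function that removes it again.
  subscribe(channel: string, listener: Listener): () => void {
    const list = this.listeners.get(channel) ?? [];
    list.push(listener);
    this.listeners.set(channel, list);
    return () => {
      const current = this.listeners.get(channel) ?? [];
      this.listeners.set(channel, current.filter((l) => l !== listener));
    };
  }

  // Delivers the chunk to every listener on the channel; returns how many received it.
  publish(channel: string, chunk: string): number {
    const list = this.listeners.get(channel) ?? [];
    list.forEach((listener) => listener(chunk));
    return list.length;
  }
}
```

The worker only needs the job id to know where to publish, and it never learns who, if anyone, is listening.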

The tokens find their way home

Your browser, having received the job id, immediately opened a Server-Sent Events stream to /api/chat/stream. The server subscribes to that job's token channel and relays every event down to you as it arrives — token, token, token, then a done marker carrying the final count. To you it looks like the answer is being typed. Under the hood, it is being relayed from a machine you will never meet.
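For readers unfamiliar with Server-Sent Events, the relay amounts to wrapping each event in a small text frame. The sketch below shows one plausible framing; the event names, the JSON payload shape, and the reading of "final count" as a token count are assumptions, not the actual wire format.

```typescript
type StreamEvent =
  | { type: "token"; text: string }
  | { type: "done"; count: number };

// SSE framing: an "event:" line, a "data:" line, then a blank line to end the frame.
function toSseFrame(event: StreamEvent): string {
  return `event: ${event.type}\ndata: ${JSON.stringify(event)}\n\n`;
}

// Turns a finished list of tokens into the frames a client would receive:
// one frame per token, followed by a done marker carrying the count.
function relayFrames(tokens: string[]): string[] {
  const frames = tokens.map((text) => toSseFrame({ type: "token", text }));
  frames.push(toSseFrame({ type: "done", count: tokens.length }));
  return frames;
}
```

In a live stream the frames would be written one at a time as tokens arrive; the list form here just makes the sequence easy to inspect.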

The user opens one stream and watches an answer appear. Everything that makes that possible — the queue, the claim, the channel — stays below the waterline.

The grace period and the safety net

Volunteer networks have a failure mode: what if nobody claims the job? Maybe it is three in the morning and the fleet is thin. The orchestrator handles this with a grace period. After enqueuing, the streaming route watches the job for a short, configurable window to see if a worker stamps claimedBy.

  • If a worker claims it inside the window, the server relays that worker's tokens and stays out of the way.
  • If the window passes unclaimed, the server quietly serves the job itself — picking a cloud provider, or a deterministic simulated provider when no API key is configured — and publishes those tokens to the very same channel.

The elegance is that your browser cannot tell the difference. It is subscribed to one channel and the tokens arrive on it either way. Browser workers are preferred; the cloud is an automatic fallback, not a default. The network leans on its people first and only reaches for a data center when the people are asleep.
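The decision at the heart of the grace period can be sketched as a small function. This version assumes the server polls the job record at a fixed interval, and it uses simulated elapsed time instead of real timers so the logic is easy to follow. The function name, the polling approach, and the parameter names are all assumptions.

```typescript
type ServedBy = "worker" | "cloud" | "simulated";

interface JobView {
  claimedBy: string | null;
}

// Checks the job at each poll tick until it is claimed or the window has passed.
// readJob receives the elapsed milliseconds and returns the job as seen at that moment.
function resolveServedBy(
  readJob: (elapsedMs: number) => JobView,
  graceMs: number,
  pollMs: number,
  hasApiKey: boolean,
): ServedBy {
  if (pollMs <= 0) throw new Error("pollMs must be positive");
  for (let elapsed = 0; elapsed <= graceMs; elapsed += pollMs) {
    // A worker stamped claimedBy inside the window: relay its tokens.
    if (readJob(elapsed).claimedBy !== null) return "worker";
  }
  // Window passed unclaimed: the server serves the job itself.
  return hasApiKey ? "cloud" : "simulated";
}
```

Whichever branch wins, the tokens go to the same per-job channel, which is why the client needs no special handling for the fallback.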

Why a queue and not a socket

We could have wired browsers straight to workers over peer connections. We chose a queue and a pub/sub channel because it makes the system legible. Every job has a durable record; every token has a place to land even if the listener reconnects; and the fallback path is a few lines, not a distributed-systems research project. A network that strangers run has to fail in plain, recoverable ways. The orchestrator's job is to be unglamorous, and it is very good at it.

ashao is live. Connect a wallet and the network is yours to use.
