Getting started
How it works
Follow a single prompt from your browser to a worker and back, token by token.
Every ashao answer is the result of a small relay race: your browser hands a job to the network, a worker claims it, and tokens stream back to you as they are produced. Here is the whole trip.
1. You send a message
When you submit a prompt, the chat client posts it to /api/chat/send along with the tier you chose (Pro or Max). The server persists the conversation, deducts the tier's credit cost if you're signed in, and enqueues a job onto the network. It returns a jobId immediately — before a single token has been generated.
const res = await fetch("/api/chat/send", {
  method: "POST",
  headers: { "content-type": "application/json" },
  body: JSON.stringify({ tier: "pro", content, conversationId }),
});
// Renamed on destructuring so it does not redeclare the conversationId sent above.
const { jobId, conversationId: savedConversationId } = await res.json();
2. The job joins the queue
The orchestrator writes the job to Redis: a hash describing the job, the messages stored as JSON, and the job id pushed onto the queue for its tier. Browser and native workers are long-polling that queue, waiting to claim work.
// HSET ashao:job:{id} -> job metadata
// SET ashao:job:{id}:messages -> ChatMessage[] as JSON
// LPUSH ashao:queue:{tier} -> id (workers BRPOP this)
await enqueueJob(job, messages);
3. A worker claims it
A worker calls claimNextJob(tier, workerId), which blocking-pops the queue, stamps the job as running and records claimedBy. The worker pulls the messages, runs them through its local model, and posts each chunk back to the network as it's decoded.
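To make the claim step concrete, here is a minimal in-memory sketch of the queue, not the actual orchestrator code. The InMemoryQueue class, the "queued" status, and the ChatMessage fields are assumptions made for illustration; only claimNextJob, enqueueJob, the running status, and claimedBy come from the description above. A real worker would block on Redis (BRPOP) instead of getting null back from an empty queue.

```typescript
// Hypothetical in-memory stand-in for the Redis structures described above.
type Tier = "pro" | "max";
type JobStatus = "queued" | "running"; // "queued" is an assumed name

interface Job {
  id: string;
  tier: Tier;
  status: JobStatus;
  claimedBy?: string;
}

// Assumed message shape; the real ChatMessage type may differ.
interface ChatMessage {
  role: "user" | "assistant" | "system";
  content: string;
}

class InMemoryQueue {
  private jobs = new Map<string, Job>();
  private messages = new Map<string, string>();
  private queues: Record<Tier, string[]> = { pro: [], max: [] };

  enqueueJob(job: Job, messages: ChatMessage[]): void {
    this.jobs.set(job.id, job);                          // stands in for HSET
    this.messages.set(job.id, JSON.stringify(messages)); // stands in for SET
    this.queues[job.tier].unshift(job.id);               // stands in for LPUSH
  }

  // Pops the oldest job for the tier and stamps it as claimed.
  // Returns null on an empty queue, where BRPOP would block instead.
  claimNextJob(tier: Tier, workerId: string): { job: Job; messages: ChatMessage[] } | null {
    const id = this.queues[tier].pop();
    if (id === undefined) return null;
    const job = this.jobs.get(id);
    const raw = this.messages.get(id);
    if (!job || raw === undefined) return null;
    job.status = "running";
    job.claimedBy = workerId;
    return { job, messages: JSON.parse(raw) as ChatMessage[] };
  }
}
```

Pushing on one end of the list and popping from the other gives first-in, first-out order, so the oldest job in a tier is claimed first. From the worker's side, the same exchange looks like this: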
const { job, messages } = await claim(workerId, "pro");
for await (const chunk of engine.chat(messages)) {
  await post("/api/worker/token", { jobId: job.id, value: chunk });
}
await post("/api/worker/complete", { jobId: job.id, workerId, tokens });
4. Tokens stream back to you
As soon as the client receives a jobId, it opens a Server-Sent Events connection to /api/chat/stream. That endpoint subscribes to the job's token channel and relays each token to your browser as it arrives, followed by a done event.
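On the server side, relaying amounts to writing each event as an SSE frame. The sketch below shows one plausible shape for that step; toSseFrame, relay, and the message field on error events are hypothetical names, while the type and value fields follow the event shape the client parses. It assumes each event is sent as a single data: line terminated by a blank line, which is the standard SSE framing.

```typescript
// Event shape mirrored from the client; the "message" field is an assumption.
type StreamEvent =
  | { type: "token"; value: string }
  | { type: "done" }
  | { type: "error"; message: string };

// Serialize one event as an SSE frame: a "data:" line plus a blank line.
// JSON.stringify escapes newlines, so the payload always fits on one line.
function toSseFrame(evt: StreamEvent): string {
  return `data: ${JSON.stringify(evt)}\n\n`;
}

// Write frames for each event, stopping after the first terminal event
// ("done" or "error") so nothing is sent once the response has ended.
function relay(events: Iterable<StreamEvent>, write: (chunk: string) => void): void {
  for (const evt of events) {
    write(toSseFrame(evt));
    if (evt.type !== "token") return;
  }
}
```

In a real endpoint, write would be the HTTP response's write method and the events would come from the job's token channel rather than an in-memory iterable. The browser consumes those frames with EventSource: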
const es = new EventSource(`/api/chat/stream?jobId=${jobId}`);
es.onmessage = (e) => {
  const evt = JSON.parse(e.data); // { type: "token" | "done" | "error", ... }
  if (evt.type === "token") append(evt.value);
  if (evt.type === "done") es.close();
};
What if no worker is around?
Browser workers are preferred, but the network never leaves you waiting. The streaming endpoint watches the job for a claim for a short grace window. If no worker takes it in time, the server self-serves the job through a cloud provider and publishes the tokens on the same channel — so from your side, the experience is identical.
- Claimed in time → tokens are relayed straight from the worker that produced them.
- Grace window passes → the orchestrator falls back to a cloud provider, or a built-in simulated provider if none is configured, so a response always streams.
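The two outcomes above can be sketched as a small decision function that the streaming endpoint might evaluate each time it checks the job. This is an illustration under assumptions, not the orchestrator's actual logic: pickSource, JobState, enqueuedAt, and graceMs are hypothetical names, and the length of the grace window is left as a parameter because it is not stated here.

```typescript
type Source = "worker" | "cloud" | "simulated";

// Hypothetical view of a job; only claimedBy is named in the text above.
interface JobState {
  enqueuedAt: number; // ms timestamp, assumed field
  claimedBy?: string;
}

// Decide who should produce tokens for a job at time `now`.
// Returns "wait" while the job is unclaimed and still inside the grace window.
function pickSource(
  job: JobState,
  now: number,
  graceMs: number,
  cloudConfigured: boolean,
): Source | "wait" {
  if (job.claimedBy !== undefined) return "worker";   // claimed in time
  if (now - job.enqueuedAt < graceMs) return "wait";  // keep watching
  return cloudConfigured ? "cloud" : "simulated";     // grace window passed
}
```

Whichever branch is taken, tokens are published on the same channel, so the client code does not need to know which source answered. The sketch does not cover a worker claiming the job after the fallback has already started; a real implementation would need to guard against that.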
Why SSE and not a websocket?
A single response is a one-way stream of tokens from server to client. Server-Sent Events give us automatic reconnection and a dead-simple protocol with none of the bidirectional overhead a socket would add.
Keep reading
The Architecture page goes deeper on the orchestrator and where privacy is enforced, and Run a worker walks through being on the other side of this relay.