How it works

Follow a single prompt from your browser to a worker and back, token by token.

Every ashao answer is the result of a small relay race: your browser hands a job to the network, a worker claims it, and tokens stream back to you as they are produced. Here is the whole trip.

1. You send a message

When you submit a prompt, the chat client posts it to /api/chat/send along with the tier you chose (Pro or Max). The server persists the conversation, deducts the tier's credit cost if you're signed in, and enqueues a job onto the network. It returns a jobId immediately — before a single token has been generated.

send a message
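// `content` (the prompt text) and `conversationId` are assumed to be in scope.
const res = await fetch("/api/chat/send", {
  method: "POST",
  headers: { "content-type": "application/json" },
  body: JSON.stringify({ tier: "pro", content, conversationId }),
});
if (!res.ok) throw new Error(`send failed: ${res.status}`);

// Renamed on destructure so it doesn't redeclare the `conversationId` sent above.
const { jobId, conversationId: savedConversationId } = await res.json();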
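
On the server side, the order of operations described above (persist, deduct, enqueue, respond) could look roughly like the sketch below. It is an illustration only: handleSend, the deps object and every function on it are hypothetical names, and the "max" tier string is assumed by analogy with "pro".

```javascript
// Sketch of the send handler. Every dependency in `deps` is hypothetical.
function handleSend(body, user, deps) {
  // "pro" appears in the client snippet; "max" is an assumed spelling.
  if (body.tier !== "pro" && body.tier !== "max") {
    return { status: 400, body: { error: "unknown tier" } };
  }
  // Persist the message; a new conversation gets an id from the store.
  const conversationId = deps.saveMessage(body.conversationId, body.content);
  // Credits are only deducted for signed-in users.
  if (user) deps.deductCredits(user.id, deps.costOf(body.tier));
  const jobId = deps.enqueue({ tier: body.tier, conversationId });
  // Respond right away; generation happens later on a worker.
  return { status: 200, body: { jobId, conversationId } };
}
```

The point of the shape is that nothing in the handler waits on a model: the response carries only identifiers, and the tokens travel separately.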

2. The job joins the queue

The orchestrator writes the job to Redis: a hash describing the job, the messages stored as JSON, and the job id pushed onto the queue for its tier. Browser and native workers are long-polling that queue, waiting to claim work.

orchestrator
// HSET ashao:job:{id}            -> job metadata
// SET  ashao:job:{id}:messages   -> ChatMessage[] as JSON
// LPUSH ashao:queue:{tier}       -> id (workers BRPOP this)
await enqueueJob(job, messages);
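
To make the three writes concrete, here is a minimal sketch of what enqueueJob might do, run against a tiny in-memory stand-in for Redis so it works on its own. The FakeRedis class, the explicit redis parameter, and the status field are assumptions for illustration; only the three key names come from the comments above.

```javascript
// Hypothetical in-memory stand-in for the Redis commands used here.
class FakeRedis {
  constructor() {
    this.hashes = new Map();
    this.strings = new Map();
    this.lists = new Map();
  }
  hset(key, fields) {
    this.hashes.set(key, { ...(this.hashes.get(key) ?? {}), ...fields });
  }
  set(key, value) {
    this.strings.set(key, value);
  }
  lpush(key, value) {
    const list = this.lists.get(key) ?? [];
    list.unshift(value); // LPUSH adds at the head
    this.lists.set(key, list);
  }
  rpop(key) {
    const list = this.lists.get(key) ?? [];
    return list.pop() ?? null; // RPOP takes from the tail
  }
}

// Sketch of enqueueJob: metadata hash, messages as JSON, id onto the tier queue.
function enqueueJob(redis, job, messages) {
  redis.hset(`ashao:job:${job.id}`, { tier: job.tier, status: "queued" });
  redis.set(`ashao:job:${job.id}:messages`, JSON.stringify(messages));
  redis.lpush(`ashao:queue:${job.tier}`, job.id);
}
```

Pushing at the head and popping from the tail makes the list behave as a first-in, first-out queue, so the oldest job for a tier is claimed first.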

3. A worker claims it

A worker calls claimNextJob(tier, workerId), which blocking-pops the queue for that tier, marks the job as running, and records the worker's id in claimedBy. The worker then pulls the messages, runs them through its local model, and posts each chunk back to the network as it is decoded.

worker loop (browser)
const { job, messages } = await claim(workerId, "pro");
let tokens = 0; // chunks posted so far (assumed to be what `tokens` reports)
for await (const chunk of engine.chat(messages)) {
  await post("/api/worker/token", { jobId: job.id, value: chunk });
  tokens += 1;
}
await post("/api/worker/complete", { jobId: job.id, workerId, tokens });
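
The claim step itself can be sketched as follows. This is a simplified, non-blocking version over a hypothetical in-memory store; the real call is described as a blocking pop, and the store layout and the "queued" status value are assumptions.

```javascript
// Hypothetical in-memory store standing in for Redis.
const store = {
  // Each queue is listed head first; the oldest job sits at the tail.
  queues: new Map([["pro", ["job-2", "job-1"]]]),
  jobs: new Map([
    ["job-1", { status: "queued" }],
    ["job-2", { status: "queued" }],
  ]),
};

// Non-blocking sketch of claimNextJob: pop the oldest id, mark it running.
function claimNextJob(tier, workerId) {
  const id = (store.queues.get(tier) ?? []).pop();
  if (id === undefined) return null; // a blocking pop would wait here instead
  const job = store.jobs.get(id);
  job.status = "running";
  job.claimedBy = workerId;
  return { id, ...job };
}
```

Because the pop removes the id from the queue, two workers asking at the same time can never receive the same job.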

4. Tokens stream back to you

As soon as it has a jobId, the client opens a Server-Sent Events connection to /api/chat/stream. That endpoint subscribes to the job's token channel and relays each token to your browser as it arrives, followed by a done event.

receive the stream
const es = new EventSource(`/api/chat/stream?jobId=${encodeURIComponent(jobId)}`);
es.onmessage = (e) => {
  const evt = JSON.parse(e.data); // { type: "token" | "done" | "error", ... }
  if (evt.type === "token") append(evt.value);
  // Close on either terminal event so the browser doesn't auto-reconnect.
  if (evt.type === "done" || evt.type === "error") es.close();
};
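
For the other end of that connection, here is a rough sketch of how a streaming endpoint could frame events for the browser. The "data: ...", blank-line framing is the standard SSE wire format and the event shapes match the ones parsed above, but formatSseEvent and relay are hypothetical helper names, not the endpoint's actual code.

```javascript
// Serialises one event as an SSE frame: a "data:" line plus a blank line.
// JSON.stringify never emits raw newlines, so one data line is enough.
function formatSseEvent(evt) {
  return `data: ${JSON.stringify(evt)}\n\n`;
}

// Yields one frame per token, then a final "done" frame.
function* relay(tokens) {
  for (const value of tokens) {
    yield formatSseEvent({ type: "token", value });
  }
  yield formatSseEvent({ type: "done" });
}
```

Each yielded string would be written to the open HTTP response as soon as it is available, which is what lets the client's onmessage handler fire once per token.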

What if no worker is around?

Browser workers are preferred, but the network never leaves you waiting. The streaming endpoint gives workers a short grace window to claim the job. If none does in time, the server serves the job itself through a cloud provider and publishes the tokens on the same channel, so the experience on your side is identical.

  • Claimed in time → tokens are relayed straight from the worker that produced them.
  • Grace window passes → the orchestrator falls back to a cloud provider, or a built-in simulated provider if none is configured, so a response always streams.
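
The two outcomes above amount to a race between a claim and a timer. A minimal sketch, assuming the claim is exposed as a promise: waitForClaim, pickFallbackProvider, simulatedProvider and the graceMs parameter are all hypothetical names, and the length of the real grace window is not stated here.

```javascript
// Resolves to "worker" if `claimed` resolves within graceMs, else "fallback".
function waitForClaim(claimed, graceMs) {
  let timer;
  const timeout = new Promise((resolve) => {
    timer = setTimeout(() => resolve("fallback"), graceMs);
  });
  return Promise.race([claimed.then(() => "worker"), timeout]).finally(() =>
    clearTimeout(timer)
  );
}

// Built-in stand-in used when no cloud provider is configured (hypothetical).
const simulatedProvider = { name: "simulated" };

// Chooses who generates the tokens once the grace window has passed.
function pickFallbackProvider(cloudProvider) {
  return cloudProvider ?? simulatedProvider;
}
```

Whichever side wins, the tokens would be published on the same channel, so the client code never needs to know which path was taken.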

Why SSE and not a WebSocket?

A single response is a one-way stream of tokens from server to client. Server-Sent Events give us automatic reconnection and a simple text protocol over plain HTTP, with none of the bidirectional overhead a WebSocket would add.
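
To give a sense of how small the protocol is, the sketch below parses a chunk of SSE text by hand. It is deliberately simplified: it handles only data fields and "\n" line endings, and ignores the event, id and retry fields and comment lines that the full format also allows. In the browser, EventSource does all of this for you; parseSse is a hypothetical helper for illustration.

```javascript
// Simplified SSE parser: handles only "data:" fields and "\n" line endings.
function parseSse(text) {
  const events = [];
  // Events are separated by a blank line.
  for (const block of text.split("\n\n")) {
    const data = block
      .split("\n")
      .filter((line) => line.startsWith("data:"))
      .map((line) => line.slice(5).replace(/^ /, "")) // drop one leading space
      .join("\n"); // multiple data lines are joined with newlines
    if (data !== "") events.push(data);
  }
  return events;
}
```

Feeding it the text of two frames, one token and one done event, yields two JSON strings ready for JSON.parse.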

Keep reading

The Architecture page goes deeper on the orchestrator and where privacy is enforced, and Run a worker walks through being on the other side of this relay.