Architecture
The orchestrator, the queue, browser and cloud workers, and where privacy lives.
ashao is a thin coordination layer over a pool of independent workers. The server never holds a model in the hot path if a worker can do the job — it holds the queue, the stream, and the rules. Here is how the pieces fit.
The components
| Layer | Responsibility |
|---|---|
| Client | Collects the prompt, posts the job, and reads the SSE stream. Also where browser workers run, in a separate tab or session. |
| Orchestrator | Enqueues jobs, hands them to workers, and publishes tokens. Pure Redis coordination — no model weights. |
| Redis | Holds the per-tier queues, job hashes, message payloads, and the pub/sub channel each job streams on. |
| Workers | Browser (WebGPU) and native machines that claim jobs, run inference, and post tokens back. |
| Cloud fallback | A managed provider the server uses to self-serve a job, and only when no worker claims it within the grace window. |
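To make the Redis row concrete, here is a minimal sketch of the key namespace it implies. Only the per-tier queue pattern (ashao:queue:{tier}), the token channel pattern (ashao:job:{id}:tokens), and the helper name rkey.jobStream appear on this page; the other helper names and the hash and messages key names are assumptions made for illustration.

```typescript
// Hypothetical key helpers. The queue and token-channel patterns come from
// this page; the job-hash and messages key names are assumed.
type Tier = "pro" | "max";

const rkey = {
  // Per-tier queue that newly created jobs are pushed onto.
  queue: (tier: Tier): string => `ashao:queue:${tier}`,
  // Job hash holding status, claimedBy, and similar fields (assumed name).
  job: (id: string): string => `ashao:job:${id}`,
  // Message payload for the job (assumed name).
  jobMessages: (id: string): string => `ashao:job:${id}:messages`,
  // Pub/sub channel the job's tokens stream on.
  jobStream: (id: string): string => `ashao:job:${id}:tokens`,
};

console.log(rkey.queue("pro"));    // ashao:queue:pro
console.log(rkey.jobStream("j1")); // ashao:job:j1:tokens
```

Deriving every key from the one job id is what lets the hash, the messages, and the stream be found from a single identifier.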
The job lifecycle
A job is a small, serializable record. It moves through four states and is identified by a single id used for its hash, its messages, and its token channel.
export type Job = {
  id: string;
  userId: string | null;
  tier: "pro" | "max";
  model: string;
  status: "queued" | "running" | "done" | "failed";
  claimedBy: string | null;
  createdAt: number;
};

- queued: written to Redis and pushed onto ashao:queue:{tier}.
- running: a worker claimed it; claimedBy is set.
- done: the producer published a done event with the final token count.
- failed: the producer published an error; callers can retry.
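The four states can be read as a small state machine. The sketch below encodes one plausible transition table and a guard around it. The table is inferred from the state descriptions, not taken from the ashao source, and the names NEXT, canTransition, and advance are hypothetical. In particular, this page does not say whether a cloud-served job is marked running first, nor whether a retry reuses the failed job or creates a new one; the sketch treats both end states as terminal.

```typescript
type Status = "queued" | "running" | "done" | "failed";

// Allowed moves, inferred from the state list (an assumption, not the
// project's actual rules). done and failed are treated as terminal.
const NEXT: Record<Status, readonly Status[]> = {
  queued: ["running"],
  running: ["done", "failed"],
  done: [],
  failed: [],
};

function canTransition(from: Status, to: Status): boolean {
  return NEXT[from].includes(to);
}

// Returns a copy of the job in the new state, or throws on an illegal move.
function advance<T extends { status: Status }>(job: T, to: Status): T {
  if (!canTransition(job.status, to)) {
    throw new Error(`illegal transition: ${job.status} -> ${to}`);
  }
  return { ...job, status: to };
}

const queued = { id: "j1", status: "queued" as Status };
const running = advance(queued, "running");
console.log(running.status);                   // running
console.log(canTransition("done", "running")); // false
```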
Streaming and fallback
The streaming endpoint is the broker between producers and your browser. It subscribes to the job's token channel and, in parallel, polls the job hash for a claim. Browser workers are the preferred producer; cloud is the automatic safety net.
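The claim watch described above is, in essence, a polling loop bounded by the grace window. Here is a minimal sketch of that idea. pollForClaim is a hypothetical stand-in for the real helper: it takes a reader function instead of a job id so that it runs without Redis, and the 50 ms poll interval is an arbitrary assumption.

```typescript
// Hypothetical helper: resolves true as soon as a claim is visible, or false
// once the grace window has elapsed with no claim.
async function pollForClaim(
  readClaimedBy: () => Promise<string | null>, // e.g. reads claimedBy from the job hash
  graceMs: number,
  pollMs = 50, // assumed interval, not taken from the source
): Promise<boolean> {
  const deadline = Date.now() + graceMs;
  while (true) {
    if ((await readClaimedBy()) !== null) return true;
    const remaining = deadline - Date.now();
    if (remaining <= 0) return false;
    // Sleep for one poll interval, but never past the deadline.
    await new Promise<void>((resolve) =>
      setTimeout(resolve, Math.min(pollMs, remaining)),
    );
  }
}

// Usage: a fake worker claims the job 30 ms after polling begins.
let claimedBy: string | null = null;
setTimeout(() => { claimedBy = "worker-1"; }, 30);
pollForClaim(async () => claimedBy, 500, 10).then((claimed) => {
  console.log("claimed within grace window:", claimed);
});
```

Checking for a claim before the first sleep means an already-claimed job returns immediately, so the grace window only costs latency when no worker is available.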
// Subscribe first so no token is missed.
sub.subscribe(rkey.jobStream(jobId));

// Watch for a worker claim during the grace window.
const claimed = await waitForClaim(jobId, env.workerGraceMs);

if (!claimed) {
  // No worker took it, so the server self-serves and publishes on the same channel.
  const provider = pickCloudProvider(tier); // gemini if available, else simulated
  await provider.stream({
    messages,
    onToken: (t) => publishToken(jobId, t),
  });
  await publishDone(jobId, tokens);
}

// Either way, relay every channel event to the client as SSE.

One channel, two producers
Whether a worker or the cloud fallback generates the answer, both publish to the same ashao:job:{id}:tokens channel. The client never needs to know which one it got.

Where privacy lives
Privacy isn't a setting — it's a property of how jobs are shaped. Three rules hold across the system:
- Workers see text, not people. A claimed job carries the messages to continue and nothing else — no user id, wallet, or session is ever attached to the payload a worker receives.
- Prompts aren't retained on the network. Job messages live in Redis only long enough to be served, and the transient job keys are short-lived. Your own conversation history is yours, stored against your account.
- Identity is decoupled from inference. The accounting that pays a worker and the inference it performed are linked by job id, not by who you are.
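One way to picture the first and third rules is as allow-list projections of the job record. The sketch below is illustrative only. The Message shape, the WorkerPayload and WorkerCredit types, and both functions are hypothetical. It also assumes a worker needs the job id (to publish tokens) and the model name alongside the messages, which this page does not spell out.

```typescript
type Message = { role: "system" | "user" | "assistant"; content: string }; // assumed shape

type Job = {
  id: string;
  userId: string | null;
  tier: "pro" | "max";
  model: string;
  status: "queued" | "running" | "done" | "failed";
  claimedBy: string | null;
  createdAt: number;
};

// What a worker would receive in this sketch: text plus routing, no identity.
type WorkerPayload = { jobId: string; model: string; messages: Message[] };

// Accounting record keyed by job id rather than by user (hypothetical).
type WorkerCredit = { jobId: string; workerId: string; tokens: number };

// Build the payload field by field, so a field added to Job later cannot
// leak to workers by accident.
function toWorkerPayload(job: Job, messages: Message[]): WorkerPayload {
  return {
    jobId: job.id,
    model: job.model,
    messages: messages.map((m) => ({ role: m.role, content: m.content })),
  };
}

function toWorkerCredit(job: Job, tokens: number): WorkerCredit | null {
  // Only worker-served jobs earn a credit in this sketch.
  return job.claimedBy === null
    ? null
    : { jobId: job.id, workerId: job.claimedBy, tokens };
}
```

Copying named fields instead of deleting sensitive ones keeps the rule safe by default: anything not listed never leaves the server.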
Provider abstraction
Cloud and simulated providers implement one interface, so the orchestrator treats them identically. Adding a provider means implementing stream() — nothing upstream changes.
export interface InferenceProvider {
  id: string;
  available(): boolean;
  stream(opts: StreamOpts): Promise<InferenceResult>;
}

For the full request path from the user's side, see How it works. For the endpoint contracts, see the API reference.
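As an illustration of the provider interface described in this section, here is a minimal sketch of a simulated provider and a first-available picker. The shapes of StreamOpts and InferenceResult are assumptions, since this page only shows messages and onToken being passed to stream(). The echo behaviour and the pickProvider helper are hypothetical, not the project's implementation.

```typescript
type Message = { role: string; content: string };  // assumed shape
type StreamOpts = {
  messages: Message[];
  onToken: (t: string) => void;
};                                                 // assumed from the streaming excerpt
type InferenceResult = { tokens: number };         // assumed shape

interface InferenceProvider {
  id: string;
  available(): boolean;
  stream(opts: StreamOpts): Promise<InferenceResult>;
}

// A stand-in provider that echoes the last message one word at a time.
const simulated: InferenceProvider = {
  id: "simulated",
  available: () => true,
  async stream({ messages, onToken }) {
    const last = messages[messages.length - 1]?.content ?? "";
    const words = `echo: ${last}`.split(" ");
    for (const word of words) onToken(word + " ");
    return { tokens: words.length };
  },
};

// Hypothetical picker: the first provider that reports itself available.
function pickProvider(candidates: InferenceProvider[]): InferenceProvider {
  const found = candidates.find((c) => c.available());
  if (!found) throw new Error("no provider available");
  return found;
}
```

Because callers only touch id, available(), and stream(), swapping the simulated provider for a real one would not change the calling code.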