Sign Pipeline

From sign tokens to fluent voice — under a second.

The Signchat pipeline is seven stages, three packages, and zero relay servers. Camera frames stay on the device; tokens and audio go directly from the browser to OpenRouter and ElevenLabs. Target: sign-end to first audible byte at p50 ~0.6 s, p95 ~0.9 s.

Try a call · Read the source

3 packages · 7 stages · 0 relays
The seven stages

How a sign becomes a sentence becomes a voice.

Each stage is a small file you can read in one sitting. The whole turn fits inside a single browser tab — there is no Signchat backend on the per-turn path.

  1. Camera → MediaPipe landmarks

    The signer's webcam stream runs through MediaPipe Tasks Vision (face, gesture, pose) and is reduced to per-frame landmarks. The model loaders live in the small @signchat/sign-pipeline package; the per-frame runner is in @signchat/runtime-browser.

    packages/sign-pipeline/src/mediapipe.ts·packages/runtime-browser/src/sign-pipeline/mediapipe-runner.ts
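A minimal sketch of the per-frame reduction, assuming hypothetical shapes: each landmark group MediaPipe returns (face, hands, pose) gets flattened into one numeric vector per frame, the unit the downstream ring buffer stores. `Landmark` and `flattenFrame` are illustrative names, not the real loader API.

```typescript
interface Landmark { x: number; y: number; z: number }

/** Flatten all landmark groups of one frame into a single [x,y,z,...] vector. */
function flattenFrame(groups: Landmark[][]): Float32Array {
  const total = groups.reduce((n, g) => n + g.length, 0);
  const out = new Float32Array(total * 3);
  let i = 0;
  for (const group of groups) {
    for (const lm of group) {
      out[i++] = lm.x;
      out[i++] = lm.y;
      out[i++] = lm.z;
    }
  }
  return out;
}
```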
  2. Landmarks → ONNX classifier → top-K

    A 48-frame ring of landmarks is fed into the ONNX classifier every ~500 ms. Softmax over 250 classes; two known-noisy labels are zeroed; the top-3 are emitted as ClassifierResults.

    packages/sign-pipeline/src/onnx.ts·packages/runtime-browser/src/sign-pipeline/mediapipe-onnx-classifier.ts
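The per-tick scoring step can be sketched as pure TypeScript: softmax over the raw logits, zero the known-noisy class indices, take the top-K. The noisy-index set and tiny class count below are placeholders, not the production vocabulary.

```typescript
function softmax(logits: number[]): number[] {
  const max = Math.max(...logits); // subtract max for numerical stability
  const exps = logits.map((v) => Math.exp(v - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}

interface ClassifierResult { index: number; prob: number }

function topK(logits: number[], noisy: Set<number>, k = 3): ClassifierResult[] {
  const scored = softmax(logits).map((p, index) => ({
    index,
    prob: noisy.has(index) ? 0 : p, // zero out known-noisy labels
  }));
  return scored.sort((a, b) => b.prob - a.prob).slice(0, k);
}
```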
  3. Token admission (stable / band)

    Recognized labels only enter the SignBuffer if they're either consistently top-1 across STABILITY_TICKS ticks (stable) or top-1 with a credible top-2 contender (band). This is the dropout filter against jittery single-tick predictions.

    packages/sign-pipeline/src/admit.ts·packages/runtime-browser/src/mode-controller/mode-controller.ts
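A sketch of the admission rule on a tick-level view of the classifier stream: a label is admitted if it held top-1 for STABILITY_TICKS consecutive ticks ("stable"), or if it is top-1 now with a credible top-2 contender ("band"). The threshold values and the band criterion below are illustrative assumptions, not the production constants.

```typescript
const STABILITY_TICKS = 3;
const BAND_RATIO = 0.6; // assumed: top-2 prob within 60% of top-1 counts as "credible"

interface Tick { top1: string; p1: number; top2: string; p2: number }

type Admission = "stable" | "band" | null;

function admitToken(history: Tick[]): Admission {
  const last = history[history.length - 1];
  if (!last) return null;
  const recent = history.slice(-STABILITY_TICKS);
  // "stable": same top-1 label across the whole stability window
  if (recent.length === STABILITY_TICKS && recent.every((t) => t.top1 === last.top1)) {
    return "stable";
  }
  // "band": a single tick is enough if the runner-up is close behind
  if (last.p2 >= BAND_RATIO * last.p1) return "band";
  return null;
}
```

A jittery one-tick spike with a weak runner-up returns `null`, which is exactly the dropout behavior the stage describes.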
  4. Sentence reconstruction via OpenRouter

    When the signer pauses, the buffered tokens plus recent hearing captions become a structured prompt. The browser POSTs directly to OpenRouter — no Vercel relay — with a JSON-schema response format constraining the model to { sentence, confidence, … }.

    packages/prompts/src/build-request.ts·packages/runtime-browser/src/openrouter/client.ts
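A sketch of the request body the browser POSTs to OpenRouter, under assumptions: the prompt wording, field names, and schema details are stand-ins for the real build-request output; only the OpenAI-style chat shape with a `json_schema` response format is taken from the text above.

```typescript
interface TurnContext {
  tokens: string[];   // admitted sign tokens, in order
  captions: string[]; // recent hearing-side captions for context
}

function buildRequest(ctx: TurnContext, model: string) {
  return {
    model,
    messages: [
      {
        role: "system",
        content: "Reconstruct one fluent English sentence from ASL gloss tokens.",
      },
      {
        role: "user",
        content: `Tokens: ${ctx.tokens.join(" ")}\nRecent captions: ${ctx.captions.join(" | ")}`,
      },
    ],
    // Constrain the model to a { sentence, confidence } JSON object
    response_format: {
      type: "json_schema",
      json_schema: {
        name: "reconstruction",
        strict: true,
        schema: {
          type: "object",
          properties: {
            sentence: { type: "string" },
            confidence: { type: "number" },
          },
          required: ["sentence", "confidence"],
          additionalProperties: false,
        },
      },
    },
  };
}
```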
  5. JSON parse + schema validation

    The model's response is parsed and validated against a Zod schema. Any malformed payload throws a ReconstructionParseError — there's no "fallback voice"; an error stays loud so the signer knows the turn failed.

    packages/prompts/src/parse-response.ts
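A minimal hand-rolled sketch of this validation step (the real parser uses a Zod schema): any malformed payload throws `ReconstructionParseError` rather than degrading silently.

```typescript
class ReconstructionParseError extends Error {}

interface Reconstruction { sentence: string; confidence: number }

function parseResponse(raw: string): Reconstruction {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    throw new ReconstructionParseError("response is not valid JSON");
  }
  if (typeof data !== "object" || data === null) {
    throw new ReconstructionParseError("response is not an object");
  }
  const obj = data as Record<string, unknown>;
  if (typeof obj.sentence !== "string" || typeof obj.confidence !== "number") {
    throw new ReconstructionParseError("response does not match schema");
  }
  return { sentence: obj.sentence, confidence: obj.confidence };
}
```

Throwing instead of substituting a fallback keeps a failed turn loud, matching the "no fallback voice" rule above.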
  6. Auto / proofread review

    The reconstructed sentence enters the mode controller's preview state. In auto mode it advances to speaking after a configurable silence; in proofread mode the signer must explicitly Approve, Edit, Re-sign, or Discard.

    packages/runtime-browser/src/mode-controller/mode-controller.ts
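The preview transitions above can be sketched as a tiny pure state machine. State and event names here are illustrative; the production FSM in mode-controller.ts is richer.

```typescript
type State = "buffering" | "preview" | "speaking" | "idle";
type Mode = "auto" | "proofread";
type Event =
  | { kind: "reconstructed" }    // sentence arrived from OpenRouter
  | { kind: "silence_elapsed" }  // configurable silence, auto mode only
  | { kind: "approve" }
  | { kind: "discard" }
  | { kind: "resign" };          // back to buffering to re-sign the turn

function next(state: State, mode: Mode, ev: Event): State {
  if (state === "buffering" && ev.kind === "reconstructed") return "preview";
  if (state === "preview") {
    if (mode === "auto" && ev.kind === "silence_elapsed") return "speaking";
    if (ev.kind === "approve") return "speaking";
    if (ev.kind === "discard") return "idle";
    if (ev.kind === "resign") return "buffering";
  }
  return state; // unrecognized events leave the state unchanged
}
```

Note that `silence_elapsed` only advances the turn in auto mode; in proofread mode the sentence waits in preview for an explicit action.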
  7. ElevenLabs streaming TTS over WSS

    The approved sentence is streamed sentence-at-a-time to ElevenLabs Flash v2.5 over a WebSocket. Returned 24 kHz PCM is decoded and mixed with the user's mic in Web Audio and published as a single LiveKit signchat-voice track that the hearing peer subscribes to like any other call audio.

    packages/runtime-browser/src/elevenlabs/streaming.ts·ARCHITECTURE.md §8 — signchat-voice
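One concrete piece of the receive side, sketched under the format stated above: raw 16-bit little-endian PCM must become Float32 samples in [-1, 1] before it can fill a Web Audio buffer at 24 kHz. Framing, buffering, and the LiveKit publish step are elided.

```typescript
/** Decode 16-bit little-endian PCM bytes into Float32 samples in [-1, 1). */
function pcm16ToFloat32(bytes: ArrayBuffer): Float32Array {
  const view = new DataView(bytes);
  const out = new Float32Array(view.byteLength / 2);
  for (let i = 0; i < out.length; i++) {
    // true = little-endian; divide by 32768 to normalize the int16 range
    out[i] = view.getInt16(i * 2, true) / 32768;
  }
  return out;
}
```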
Where each stage lives

Three packages, one product.

The pipeline is split across three workspace packages so the heavy bits (network, audio, FSM) can be reused by the Bridge Electron app without dragging the web UI along.

@signchat/sign-pipeline

Vision + ONNX + admit

Tiny shared package for loading MediaPipe and onnxruntime-web, fetching the vocabulary, and the pure admitToken function used by the production mode controller.

@signchat/prompts

Prompts and parsing

The frozen LEAN_OPTIONS_SYSTEM prompt, the request builder that formats top-K tokens and dialog history, and the Zod-backed response parser.

@signchat/runtime-browser

Network + audio + FSM

OpenRouter HTTP client, ElevenLabs WSS streaming, the mode-controller finite state machine, the sign-classifier orchestration, and the Web Audio graph that publishes the signchat-voice track.

Browser-direct, by design

No relay, no WSS gateway, no server-side TTS.

“LiveKit Cloud, OpenRouter, and ElevenLabs handle every per-turn data path. Vercel mints credentials — it’s not on the per-turn path.”

— ARCHITECTURE.md §1

No Signchat relay · No WSS gateway · No server-side TTS
Measured latency

Reconstruction latency, by model.

Numbers from the latest in-repo sweep (prompt-tester-service / RESULTS.md): 10 models × 399 scenarios = 3,990 OpenRouter calls. The numbers below are for the model call only — not the full sign-end → audible-byte path. Add ~300–500 ms for ElevenLabs TTS + audio mixing.

Figure: latency vs overall-score scatter plot. Each dot is one model across 399 cases; up and to the left is best. gemini-3.1-flash-lite-preview leads on composite quality; gpt-5.4-nano and gpt-5.4 are the fastest of the strong scorers, while llama-4-maverick dominates on raw speed. Two models (claude-haiku-latest, command-a) are excluded from real-time use due to high timeout rates. Full methodology in RESULTS.md.

Watch the pipeline run end-to-end.

Open a call and toggle the Debug pane to see classifier ticks, admitted tokens, OpenRouter latency, and TTS bytes in real time.

Open a call · Or browse the packages →
Signchat

Sign-language ↔ voice video chat that runs in your browser. No install, no backend, ~1-second end-to-end.

BeaverHacks 2026 · v0.1.0
