Signchat
Architecture

How Signchat actually works.

Sign-end to first audible byte: p50 ~0.6 s, p95 ~0.9 s. No fallbacks. No per-turn relay operated by Signchat. The authoritative spec lives in ARCHITECTURE.md on GitHub — this page is a guided index into it.

17 sections · 3 mermaid diagrams · Updated with the code
At a glance

The numbers that matter.

  • p50 ~0.6 s, sign-end → first audible byte
  • 0 Signchat-operated relays
  • 17 documented sections
  • MIT, open source
The data flow

Two browsers, three providers, one Vercel mint route.

Everything below is summarized from §1 Overview and §3 System diagram. If anything here disagrees with the doc, the doc wins.

Signing direction. The Deaf user’s browser captures landmarks with MediaPipe Tasks Vision, classifies them with an onnxruntime-web model, and admits stable tokens into a buffer. On sentence boundaries the browser POSTs tokens plus hearing context directly to OpenRouter, gets a JSON { sentence, … } back, and streams it to ElevenLabs Flash v2.5 over a WebSocket. Returned 24 kHz PCM is mixed with the user’s mic in Web Audio and published as a single LiveKit signchat-voice track.
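The first two steps of the signing direction (admitting stable tokens, then posting them on a sentence boundary) can be sketched roughly as follows. This is a minimal illustration, not the code in the repository: `Prediction`, `StableTokenBuffer`, `buildSentenceRequest`, the streak and confidence thresholds, the prompt text, and the `provider/model-id` placeholder are all assumptions. The only external detail relied on is OpenRouter's chat completions endpoint with a bearer key.

```typescript
// Hypothetical shapes; none of these names come from ARCHITECTURE.md.
interface Prediction {
  token: string;      // classifier label for one frame
  confidence: number; // 0..1
}

class StableTokenBuffer {
  private candidate: string | null = null;
  private streak = 0;
  private admitted: string[] = [];
  private minStreak: number;
  private minConfidence: number;

  // Assumed defaults: a label must persist for 5 frames at >= 0.8 confidence.
  constructor(minStreak = 5, minConfidence = 0.8) {
    this.minStreak = minStreak;
    this.minConfidence = minConfidence;
  }

  // Feed one per-frame prediction; returns the token if it was just admitted.
  push(p: Prediction): string | null {
    if (p.confidence < this.minConfidence) {
      this.candidate = null;
      this.streak = 0;
      return null;
    }
    if (p.token === this.candidate) {
      this.streak += 1;
    } else {
      this.candidate = p.token;
      this.streak = 1;
    }
    // Admit exactly once, at the moment the streak reaches the threshold,
    // so holding a sign longer does not duplicate the token.
    if (this.streak === this.minStreak) {
      this.admitted.push(p.token);
      return p.token;
    }
    return null;
  }

  // On a sentence boundary: hand back the admitted tokens and reset.
  flush(): string[] {
    const out = this.admitted;
    this.admitted = [];
    this.candidate = null;
    this.streak = 0;
    return out;
  }
}

// Builds (but does not send) the browser-direct request.
function buildSentenceRequest(tokens: string[], hearingContext: string[], sessionKey: string) {
  return {
    url: "https://openrouter.ai/api/v1/chat/completions",
    init: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${sessionKey}`, // session-scoped key, never a root key
        "Content-Type": "application/json",
      },
      body: JSON.stringify({
        model: "provider/model-id", // placeholder, not the model Signchat uses
        response_format: { type: "json_object" },
        messages: [
          { role: "system", content: 'Turn the sign tokens into one English sentence. Reply as JSON: {"sentence": string}.' },
          { role: "user", content: JSON.stringify({ tokens, hearingContext }) },
        ],
      }),
    },
  };
}
```

In a browser the result would be passed to `fetch(req.url, req.init)`; keeping the builder pure makes the request shape easy to check without a network call.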

Captions direction. The Deaf user’s browser also subscribes to the hearing user’s audio, streams it to ElevenLabs voice-to-text, and forwards partials and finals on the LiveKit data channel so both tiles render the same captions in real time. Word-by-word partials arrive in under a second; finals lock into a global transcript strip.
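One way to picture the partial/final behaviour is as a small reducer that both tiles run over the same message stream. The message shape (`kind`, `utteranceId`, `text`) and every name below are hypothetical; the real DataChannel contract is the one in §11 of the doc.

```typescript
// Hypothetical wire format for caption messages on the data channel.
type CaptionMessage =
  | { kind: "partial"; utteranceId: string; text: string }
  | { kind: "final"; utteranceId: string; text: string };

interface CaptionState {
  live: string;           // in-progress line, replaced by each partial
  liveId: string | null;  // utterance the live line belongs to
  transcript: string[];   // locked finals, append-only
}

const emptyCaptions: CaptionState = { live: "", liveId: null, transcript: [] };

// JSON text payloads; a sender would wrap the string in a Uint8Array
// before handing it to the data channel.
function serializeCaption(msg: CaptionMessage): string {
  return JSON.stringify(msg);
}

function parseCaption(payload: string): CaptionMessage {
  return JSON.parse(payload) as CaptionMessage;
}

// Pure reducer: the same inputs produce the same captions on both tiles.
function applyCaption(state: CaptionState, msg: CaptionMessage): CaptionState {
  if (msg.kind === "partial") {
    return { ...state, live: msg.text, liveId: msg.utteranceId };
  }
  // A final locks into the transcript and clears the live line it replaces.
  const ownsLive = state.liveId === msg.utteranceId;
  return {
    live: ownsLive ? "" : state.live,
    liveId: ownsLive ? null : state.liveId,
    transcript: [...state.transcript, msg.text],
  };
}
```

Because the reducer is deterministic, sender and receiver stay in step as long as messages arrive in order, which is the property the "same captions on both tiles" claim depends on.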

What Vercel does (and does not) do. Vercel hosts the marketing pages, the room UI, and four short-lived credential-mint endpoints. There is no server-side LiveKit bot, no TTS relay, no WSS gateway. Per §15.3, Vercel is not on the per-turn data path.
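Since the mint routes sit off the per-turn path, the browser presumably fetches a credential once and re-mints it only when it is close to expiring. A minimal sketch of that client-side check follows; the `MintedCredential` shape, the `expiresAtMs` field, and the 30-second refresh margin are assumptions, not values from the doc.

```typescript
// Hypothetical credential kinds, mirroring the three providers on the per-turn path.
type CredentialKind = "livekit" | "openrouter" | "elevenlabs";

interface MintedCredential {
  kind: CredentialKind;
  value: string;       // token, session key, or signed URL
  expiresAtMs: number; // absolute expiry, epoch milliseconds (assumed field)
}

// True when the credential is missing or will expire within the safety margin.
function needsRefresh(cred: MintedCredential | undefined, nowMs: number, skewMs = 30_000): boolean {
  return cred === undefined || cred.expiresAtMs - nowMs <= skewMs;
}

// Lists the credentials the browser should re-mint before the next turn starts.
function staleKinds(
  cache: Partial<Record<CredentialKind, MintedCredential>>,
  nowMs: number,
  skewMs = 30_000,
): CredentialKind[] {
  const kinds: CredentialKind[] = ["livekit", "openrouter", "elevenlabs"];
  return kinds.filter((k) => needsRefresh(cache[k], nowMs, skewMs));
}
```

With a check like this, a mint route is only called between turns, so a slow or cold mint endpoint would not add to sign-end-to-audio latency.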

Section index

Jump straight into the doc.

Each link opens the matching heading in the GitHub rendering of ARCHITECTURE.md (mermaid diagrams render inline there).

  1. §1 Overview: What Signchat is, the latency target, and what each provider does on the per-turn path.
  2. §2 Goals and non-goals: What's in scope for the MVP and what's intentionally deferred (Bridge, group calls, fallbacks).
  3. §3 System diagram: A mermaid flowchart of every actor on the per-turn path. Renders inline on GitHub.
  4. §4 Repository layout: Tour of apps/, packages/, and the supporting Python and prompt-tester services.
  5. §5 Service inventory: Twelve subsections covering web, transport, classifier, captions, mode controller, audio mixing.
  6. §6 Live captions and transcript: Word-by-word partials for the hearing tile and full sentences for the deaf tile, with reliability guarantees.
  7. §7 Reliability and failure modes: What fails loudly, what retries, and why there are no fallback voices.
  8. §8 Audio pipeline — signchat-voice: Web Audio graph, LiveKit publish flags, tab-visibility behaviour for the synthesised mic.
  9. §9 Mode controller and capture: The capture state machine, configurable knobs, buffer-admit logic, inline preview UX.
  10. §10 API contracts — Vercel routes: Four credential-mint endpoints (LiveKit token, OpenRouter session key, ElevenLabs URL, health).
  11. §11 Browser-direct provider contracts: OpenRouter chat completions, ElevenLabs streaming WSS, sign-pipeline and DataChannel contracts.
  12. §12 Environment variables: Why there are no NEXT_PUBLIC_* variables: the browser never sees a root API key.
  13. §13 Performance budgets: Per-stage budgets that add up to the p50 ~0.6 s / p95 ~0.9 s end-to-end target.
  14. §14 Security model: What each party can see, threat models for secret leakage and provider abuse, mitigations.
  15. §15 Deployment topology: What Vercel hosts (and explicitly does not host), per-route runtime, deploy workflow.
  16. §16 Bridge forward-compatibility: How the Electron desktop app reuses the same browser pipeline as a system-level mic.
  17. §17 Acceptance criteria: The hard checks the implementation has to pass: latency, no-relay invariants, security.
Key claims

Four things the architecture is willing to commit to.

These are the load-bearing claims behind the product. Each one points at the section of the doc that defines and defends it.

Sign-end to first audible byte: p50 ~0.6 s

End-to-end latency budget broken down per stage in §13. Adds up to ~600 ms p50, ~900 ms p95.
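To check measured turns against these two targets, a nearest-rank percentile over recorded sign-end-to-first-byte times is enough. The sketch below assumes latencies are collected in milliseconds; the function names are hypothetical, and only the 600 ms and 900 ms thresholds come from this page.

```typescript
// Nearest-rank percentile: the smallest sample such that at least p% of
// samples are less than or equal to it.
function percentile(samplesMs: number[], p: number): number {
  if (samplesMs.length === 0) throw new Error("no samples");
  const sorted = [...samplesMs].sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p / 100) * sorted.length)); // 1-based rank
  return sorted[rank - 1];
}

// Compares measured latencies against the p50 / p95 targets stated above.
function meetsLatencyTarget(samplesMs: number[], p50TargetMs = 600, p95TargetMs = 900) {
  const p50 = percentile(samplesMs, 50);
  const p95 = percentile(samplesMs, 95);
  return { p50, p95, ok: p50 <= p50TargetMs && p95 <= p95TargetMs };
}
```

Note that p95 is sensitive to a single slow turn in small samples, so a check like this is only meaningful over a reasonably large number of turns.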

ARCHITECTURE.md §13 ↗

Browser-direct, by architecture

LiveKit Cloud, OpenRouter, and ElevenLabs handle every per-turn data path. Vercel is not on it. There is no Signchat-operated relay.

ARCHITECTURE.md §1, §15.3 ↗

No fallbacks, by choice

A turn either succeeds through the primary path or surfaces a clear error. Errors stay loud — no fallback voice fakes a failed turn.
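In code, "no fallbacks" can be read as a turn runner with no alternate branch: the first failing stage ends the turn and is named in the result. The sketch below is one hypothetical way to express that; `Stage`, `runTurn`, and the synchronous string-in, string-out stages are simplifications for illustration, not the project's types.

```typescript
// Hypothetical stage abstraction: each stage transforms the previous output.
interface Stage {
  name: string;
  run: (input: string) => string;
}

type TurnResult =
  | { ok: true; output: string }
  | { ok: false; stage: string; message: string };

// Runs stages in order. On the first failure it stops and reports the stage
// by name; there is deliberately no retry-with-alternate or fallback branch.
function runTurn(stages: Stage[], input: string): TurnResult {
  let value = input;
  for (const stage of stages) {
    try {
      value = stage.run(value);
    } catch (err) {
      const message = err instanceof Error ? err.message : String(err);
      return { ok: false, stage: stage.name, message };
    }
  }
  return { ok: true, output: value };
}
```

The caller then has exactly two cases to render: the spoken sentence, or an error that names where the turn broke.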

ARCHITECTURE.md (header) ↗

No NEXT_PUBLIC_* secrets

The browser never sees a root API key. Per-room credentials are minted by short-lived Vercel routes; provider keys are session-scoped and credit-capped.
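Next.js inlines any environment variable whose name starts with `NEXT_PUBLIC_` into the browser bundle, so the claim above can be guarded with a simple name check at build or CI time. The helper names below are hypothetical, as is the idea that the project runs such a check.

```typescript
// Returns the variable names that Next.js would expose to the browser.
function findBrowserExposedVars(envNames: string[]): string[] {
  return envNames.filter((name) => name.startsWith("NEXT_PUBLIC_"));
}

// Throws if any variable would be shipped to the client bundle.
function assertNoPublicVars(envNames: string[]): void {
  const exposed = findBrowserExposedVars(envNames);
  if (exposed.length > 0) {
    throw new Error("Browser-exposed env vars found: " + exposed.join(", "));
  }
}
```

In a Node build step this could be called as `assertNoPublicVars(Object.keys(process.env))`, failing the build before a key can reach the client.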

ARCHITECTURE.md §12, §14 ↗

Read the full architecture doc.

The canonical spec is checked in next to the code and evolves with it. If you spot a mismatch between this page and the doc, file an issue — the doc is the source of truth.

Signchat

Sign-language ↔ voice video chat that runs in your browser. No install, no Signchat-operated relay, under one second end-to-end.

BeaverHacks 2026 · v0.1.0
