ASL Classifier

A custom ASL classifier, in your browser.

A ~1.7-million-parameter Conv1D-Transformer hybrid trained on the Google Kaggle asl-signs (PopSign 250) competition dataset, exported to ONNX and run in WebAssembly. No model server, no vendor inference API.

Try it now · View the model
Trained on Kaggle ISLR · 200 epochs on a single H200 · WASM execution provider
What it is

A small, fast, landmark-only classifier.

The model alternates causal Conv1D blocks and Transformer blocks over MediaPipe Holistic landmarks, and stays under Kaggle's 40 MB TFLite cap, which turned out to be about the right capacity for landmark-only isolated sign language recognition (ISLR).

Model class   Conv1D + Transformer   Kaggle ISLR-derived recipe
Parameters    ~1.7M                  fits Kaggle 40 MB cap
Vocabulary    250 signs              PopSign / Kaggle ISLR
Runtime       ONNX · WASM            onnxruntime-web 1.20.1
How it runs

Loaded once, ticked every half-second.

The classifier lives at /models/asl-signs/asl-signs.onnx in the public folder. The browser fetches it once, caches it, and runs WASM inference on every meeting.

  • ONNX in WebAssembly

We load onnxruntime-web@1.20.1 from a CDN and run the WASM execution provider only — no GPU required, no native install. See onnx-session.ts, and the sketch after this list.

  • Sliding 48-frame window

    The classifier ticks every ~500 ms over the most recent 48 MediaPipe frames and emits the top-3 labels with confidences. Defaults live in DEFAULT_CLASSIFIER_CONFIG in mediapipe-onnx-classifier.ts.

  • Camera frames stay local

Both MediaPipe Holistic and the ONNX classifier execute in your browser tab. Only the recognized sign tokens — short label strings with confidence scores — ever leave the device on the per-turn data path.
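
Concretely, session creation and one inference pass look roughly like this. A minimal sketch, assuming a single named graph input and output; getSession and classify are illustrative names, not the actual exports of onnx-session.ts:

```ts
import * as ort from 'onnxruntime-web';

const MODEL_URL = '/models/asl-signs/asl-signs.onnx';

let sessionPromise: Promise<ort.InferenceSession> | undefined;

function getSession(): Promise<ort.InferenceSession> {
  // Create the session once; later calls reuse the same promise,
  // and the browser's HTTP cache keeps the model fetch cheap.
  sessionPromise ??= ort.InferenceSession.create(MODEL_URL, {
    executionProviders: ['wasm'], // WASM EP only: no GPU, no native install
  });
  return sessionPromise;
}

async function classify(frames: Float32Array, t: number): Promise<Float32Array> {
  const session = await getSession();
  // Shape [T, 543, 3]: up to 48 frames of MediaPipe Holistic landmarks.
  const input = new ort.Tensor('float32', frames, [t, 543, 3]);
  const outputs = await session.run({ [session.inputNames[0]]: input });
  // One output: 250 raw logits, softmaxed downstream.
  return outputs[session.outputNames[0]].data as Float32Array;
}
```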

Inputs and outputs

Landmarks in, ranked sign labels out.

The browser assembles MediaPipe Holistic landmarks into a Float32 tensor, runs the ONNX model, applies softmax, and picks the top-K. Two blocked label indices (giraffe and drop) are zeroed out in production because they over-fire on idle hands.

Input
Float32Tensor[T, 543, 3]
  T = up to 48 frames (sliding window)
  543 = MediaPipe Holistic landmarks
  3   = (x, y, z) per landmark

The classifier ticks every ~500 ms once the ring buffer has at least 8 frames. Older frames are dropped as new ones arrive.
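A minimal sketch of that ring buffer, using the window and warm-up thresholds above (illustrative names; the real defaults live in DEFAULT_CLASSIFIER_CONFIG in mediapipe-onnx-classifier.ts):

```ts
const WINDOW = 48;     // max frames per inference window
const MIN_FRAMES = 8;  // minimum frames before the first tick
const FRAME_SIZE = 543 * 3;

const buffer: Float32Array[] = [];

function pushFrame(frame: Float32Array): void {
  buffer.push(frame);                          // flattened [543, 3] landmarks
  if (buffer.length > WINDOW) buffer.shift();  // evict the oldest frame
}

function snapshot(): { data: Float32Array; t: number } | undefined {
  if (buffer.length < MIN_FRAMES) return undefined; // not enough frames yet
  const t = buffer.length;
  const data = new Float32Array(t * FRAME_SIZE);
  buffer.forEach((f, i) => data.set(f, i * FRAME_SIZE));
  return { data, t }; // fed to the model on the ~500 ms tick
}
```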

Output
Float32[250]                  // raw logits
  -> softmax
  -> blocked labels zeroed
  -> top-3 -> ClassifierResult
     { label: string,
       score: number /* 0..1 */ }
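
The post-processing step is small enough to sketch in full. Illustrative names only: the empty blocked-index set stands in for the production giraffe/drop entries.

```ts
interface ClassifierResult {
  label: string;
  score: number; // 0..1
}

// Indices to suppress; stands in for the production giraffe/drop entries.
const BLOCKED_INDICES = new Set<number>();

function postprocess(logits: Float32Array, labels: string[], k = 3): ClassifierResult[] {
  // Numerically stable softmax over the 250 raw logits.
  const max = Math.max(...logits);
  const exps = Array.from(logits, (v) => Math.exp(v - max));
  const sum = exps.reduce((a, b) => a + b, 0);

  // Zero blocked labels, then rank and keep the top-k.
  return exps
    .map((e, i) => ({ label: labels[i], score: BLOCKED_INDICES.has(i) ? 0 : e / sum }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k);
}
```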

Downstream, the mode controller in @signchat/runtime-browser admits a label as a stable or band token only when it stays consistent across consecutive ticks.
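
What that admission gate could look like, sketched under assumptions (STABLE_TICKS and admit are invented for illustration; the actual mode-controller logic is richer):

```ts
const STABLE_TICKS = 3; // assumed threshold, not the shipped value

let lastLabel: string | undefined;
let streak = 0;

function admit(top1: { label: string }): string | undefined {
  streak = top1.label === lastLabel ? streak + 1 : 1;
  lastLabel = top1.label;
  // Emit a token only after the same label wins several consecutive ticks.
  return streak >= STABLE_TICKS ? top1.label : undefined;
}
```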

Vocabulary

250 signs from the PopSign corpus.

Glosses follow Kaggle's canonical lowercase form. Aliases in src/data/gloss_aliases.expand_aliases map each one to ASL Citizen / WLASL conventions for cross-dataset pretraining.
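Schematically, the alias table maps each canonical gloss to its cross-dataset spellings. Sketched in TypeScript for consistency with the other examples; the entries here are invented, and the real table is Python data expanded by expand_aliases:

```ts
// Invented entries; the real table lives in src/data/gloss_aliases.
const GLOSS_ALIASES: Record<string, string[]> = {
  tv: ['TV', 'television'],
  airplane: ['plane', 'aeroplane'],
};

// Map any dataset-specific gloss back to Kaggle's canonical lowercase form.
function canonicalize(gloss: string): string {
  const lower = gloss.toLowerCase();
  for (const [canonical, alts] of Object.entries(GLOSS_ALIASES)) {
    if (canonical === lower || alts.some((a) => a.toLowerCase() === lower)) {
      return canonical;
    }
  }
  return lower;
}
```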

  • TV
  • after
  • airplane
  • all
  • alligator
  • animal
  • another
  • any
  • apple
  • arm
  • aunt
  • awake
  • backyard
  • bad
  • balloon
  • bath
  • because
  • bed
First 18 of 250 signs. See all 250 →
Trained on

Google Kaggle asl-signs (PopSign 250).

Trained on the Kaggle Isolated Sign Language Recognition competition dataset — PopSign 250 — for 200 epochs on a single H200 with bf16 + XLA, AdamW, a OneCycleLR cosine schedule, and Adversarial Weight Perturbation (AWP) from epoch 15. Splits are signer-disjoint (13 train / 4 val / 4 held-out), so the eval numbers reflect performance on signers the model has never seen.

Open and reproducible

Every recipe, script, and benchmark is in the repo.

Nothing about the classifier is hidden. Read the README, audit the configs, run make eval — or train your own checkpoint and swap it into apps/web.

  • Full training source under asl-classifier-model/ — model, preprocessing, augmentations, eval.
  • Reproducible recipes in configs/base.yaml + configs/pretrain_phase1_kaggle.yaml (200 epochs, AdamW, OneCycleLR cosine, AWP from epoch 15).
  • Signer-disjoint splits in data/splits/kaggle_islr.json (13 train / 4 val / 4 held-out) so eval doesn't leak signers.
  • Makefile targets for smoke (5 epochs), full train, and eval. RunPod scripts under scripts/ provision an H200 end-to-end.
  • Per-run benchmark log in experiments.csv — top-1 / top-5 accuracy, parameter count, inference latency.
  • CPU-only unit tests under tests/ exercised by make test.

See the classifier in a real call.

Open the app, allow your camera, and start signing. The classifier loads in the background while you're in the lobby — by the time you join, it's warm.

Try the classifier · Or read the training README →