A ~1.7-million-parameter Conv1D-Transformer hybrid trained on the Google asl-signs Kaggle competition dataset (PopSign 250), exported to ONNX and run on WebAssembly. No model server, no vendor inference API.
The model alternates causal Conv1D blocks and Transformer blocks over MediaPipe Holistic landmarks, and stays under the competition's 40 MB TFLite cap, which turned out to be about the right capacity for landmark-only isolated sign language recognition (ISLR).
The classifier lives at /models/asl-signs/asl-signs.onnx in the public folder. The browser fetches it once, caches it, and runs WASM inference in every meeting.
We load onnxruntime-web@1.20.1 from a CDN and run the WASM execution provider only — no GPU required, no native install. See onnx-session.ts.
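A minimal sketch of that setup, assuming the standard onnxruntime-web API (the constant name MODEL_URL and the memoization are illustrative; the real wiring lives in onnx-session.ts):

```ts
import * as ort from "onnxruntime-web";

// Same path the app serves from its public folder.
const MODEL_URL = "/models/asl-signs/asl-signs.onnx";

let sessionPromise: Promise<ort.InferenceSession> | null = null;

// Create the session once and reuse it across ticks; WASM is the only
// execution provider, so this runs anywhere the browser does.
export function getSession(): Promise<ort.InferenceSession> {
  if (!sessionPromise) {
    sessionPromise = ort.InferenceSession.create(MODEL_URL, {
      executionProviders: ["wasm"],
    });
  }
  return sessionPromise;
}
```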
The classifier ticks every ~500 ms over the most recent 48 MediaPipe frames and emits the top-3 labels with confidences. Defaults live in DEFAULT_CLASSIFIER_CONFIG in mediapipe-onnx-classifier.ts.
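A hypothetical shape for that config, consistent with the numbers in this document (the field names are assumptions; the real fields are in mediapipe-onnx-classifier.ts):

```ts
// Hypothetical field names; check DEFAULT_CLASSIFIER_CONFIG in
// mediapipe-onnx-classifier.ts for the real ones.
export interface ClassifierConfig {
  tickMs: number;       // how often inference runs (~500 ms)
  windowFrames: number; // sliding-window size (48 frames)
  minFrames: number;    // frames buffered before the first tick (8)
  topK: number;         // labels emitted per tick (3)
}

export const DEFAULT_CLASSIFIER_CONFIG: ClassifierConfig = {
  tickMs: 500,
  windowFrames: 48,
  minFrames: 8,
  topK: 3,
};
```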
Both MediaPipe Holistic and the ONNX classifier execute in your browser tab. Only the recognized sign tokens (short label strings paired with confidence floats) ever leave the device on the per-turn data path.
The browser assembles MediaPipe Holistic landmarks into a Float32 tensor, runs the ONNX model, applies softmax, and picks the top-K. Two blocked label indices (giraffe and drop) are zeroed out in production because they over-fire on idle hands.
```
Float32Tensor[T, 543, 3]
  T   = up to 48 frames (sliding window)
  543 = MediaPipe Holistic landmarks
  3   = (x, y, z) per landmark
```
Ticks start once the ring buffer holds at least 8 frames; older frames are dropped as new ones arrive.
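A sketch of the tensor assembly under those dimensions, assuming frames arrive as arrays of {x, y, z} landmarks (the Frame type and function name are illustrative, not the app's actual code):

```ts
import * as ort from "onnxruntime-web";

const NUM_LANDMARKS = 543; // MediaPipe Holistic
const DIMS = 3;            // (x, y, z)

// One frame = 543 landmarks, each with normalized x/y/z coordinates.
type Frame = { x: number; y: number; z: number }[];

// Flatten the sliding window into a Float32 tensor of shape [T, 543, 3].
export function framesToTensor(frames: Frame[]): ort.Tensor {
  const T = frames.length; // up to 48
  const data = new Float32Array(T * NUM_LANDMARKS * DIMS);
  frames.forEach((frame, t) => {
    frame.forEach((lm, i) => {
      const base = (t * NUM_LANDMARKS + i) * DIMS;
      data[base] = lm.x;
      data[base + 1] = lm.y;
      data[base + 2] = lm.z;
    });
  });
  return new ort.Tensor("float32", data, [T, NUM_LANDMARKS, DIMS]);
}
```

Running the session on that tensor yields the logits described next.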
```
Float32[250]   // raw logits
  -> softmax
  -> blocked labels zeroed
  -> top-3 -> ClassifierResult { label: string, score: number /* 0..1 */ }
```

Downstream, the mode controller in @signchat/runtime-browser admits a label as a stable or band token when it stays consistent across ticks.
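The post-processing chain is small enough to sketch in full. This is a hedged reconstruction rather than the shipped code: postprocess and BLOCKED_LABEL_INDICES are illustrative names, and the real implementation lives in mediapipe-onnx-classifier.ts:

```ts
interface ClassifierResult {
  label: string;
  score: number; // 0..1
}

// Indices for the blocked labels ("giraffe" and "drop"); the actual values
// come from the label map shipped with the model.
const BLOCKED_LABEL_INDICES = new Set<number>([/* ... */]);

export function postprocess(
  logits: Float32Array, // Float32[250]
  labels: string[],     // 250 gloss strings
): ClassifierResult[] {
  // Numerically stable softmax over the raw logits.
  const max = Math.max(...logits);
  const exps = Array.from(logits, (v) => Math.exp(v - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  const probs = exps.map((v) => v / sum);

  // Zero out the labels that over-fire on idle hands.
  for (const idx of BLOCKED_LABEL_INDICES) probs[idx] = 0;

  // Top-3 by probability.
  return probs
    .map((score, i) => ({ label: labels[i], score }))
    .sort((a, b) => b.score - a.score)
    .slice(0, 3);
}
```

Per tick, the browser feeds the assembled tensor to session.run(...) and passes the resulting logits through this chain; the exact input and output tensor names depend on the exported graph.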
Glosses follow Kaggle's canonical lowercase form. Aliases in src/data/gloss_aliases.expand_aliases map each one to ASL Citizen / WLASL conventions for cross-dataset pretraining.
Trained on the Kaggle Isolated Sign Language Recognition competition dataset — PopSign 250 — for 200 epochs on a single H200 with bf16 + XLA, AdamW, OneCycleLR cosine, and Adversarial Weight Perturbation from epoch 15. Splits are signer-disjoint (13 train / 4 val / 4 held-out) so eval numbers reflect performance on signers the model has never seen.
Nothing about the classifier is hidden. Read the README, audit the configs, run make eval — or train your own checkpoint and swap it into apps/web.
asl-classifier-model/ contains the model, preprocessing, augmentations, and eval:

- configs/base.yaml + configs/pretrain_phase1_kaggle.yaml (200 epochs, AdamW, OneCycleLR cosine, AWP from epoch 15).
- data/splits/kaggle_islr.json (13 train / 4 val / 4 held-out), so eval doesn't leak signers.
- Makefile targets for smoke (5 epochs), full train, and eval; RunPod scripts under scripts/ provision an H200 end-to-end.
- experiments.csv: top-1 / top-5 accuracy, parameter count, inference latency.
- tests/ exercised by make test.

Open the app, allow your camera, and start signing. The classifier loads in the background while you're in the lobby; by the time you join, it's warm.
Try the classifier, or read the training README.