/ why this exists

One certified interpreter for every 200,000 deaf Indians.

Nobody agrees on the exact count. The 2011 Census recorded about 5 million deaf Indians; broader estimates of significant hearing loss run to 63 million, and the ratio above uses that broader figure. The other side of the ratio is not in dispute: the official directory lists fewer than 300 certified ISL interpreters for the whole country. A classroom, a hospital ward, a courtroom, a job interview: most of the time, no interpreter is coming.

BhashaSetu is our attempt to shrink that gap with software. A model anyone can run on a phone or a laptop, translating both directions in real time. It won't replace human interpreters, and it shouldn't. It's for the rooms they can never reach.

63M

Indians with significant hearing loss, by the broadest estimate

WHO estimate · Census 2011 counts 5M deaf

<300

certified ISL interpreters in the entire country

ISLRTC interpreter directory

26%

of working-age deaf adults are in formal employment

Census of India · 2011

10k

terms in the official ISL dictionary. It had 3,000 in 2018. The language is still being written down.

ISLRTC · 3rd edition, Feb 2021 · our training target

/ where the gap hurts most

Three places it shows up every day.

Numbers that size stop meaning anything after a while. So here is where the interpreter shortage actually lands: at school, at the doctor, and at work.

Education

19%

of Deaf children aged 6 to 13 are out of school entirely. Many schools that do enrol them teach through lip-reading and speech drills instead of ISL. And a teacher who wants to sign has no formal curriculum to learn it from.

DHH out-of-school study · 360info, 2014

Healthcare

69%

of disabled Indians live in rural areas, which is exactly where those 300-odd interpreters are not. Explaining chest pain to a doctor through gestures and a relative's guesswork is the current reality. Translation in a clinic is not a convenience feature. It is informed consent.

Census of India · 2011

Employment

74%

of working-age Deaf adults are in marginal or informal work. Miss school, struggle with written Hindi or English as a result, then walk into interviews where nobody signs. Each step makes the next one worse, and the cost gets passed down through families.

Census of India · 2011

/ what makes ISL distinct

ISL isn't Hindi or English with hands.

It's a full language with its own grammar. Most sign-translation demos fail in India because they treat signing as word-by-word substitution, which produces nonsense in both directions. Four things a model has to get right:

two-handed

Two-handed alphabet

ISL fingerspells with both hands. ASL and most Western sign languages use one. Models pretrained on Western data transfer poorly because the hand shapes themselves are different.

SOV grammar

Subject-Object-Verb

"I rice eat," not "I eat rice." A word-for-word translation produces gibberish. ISL → text requires a real grammatical reordering step, not a lookup table.

non-manual

The face is grammar

Raised eyebrows mark yes/no questions. A head-shake makes a clause negative. Mouth-shape carries intensity. Ignore non-manuals and you lose half the meaning.
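A toy sketch of why this matters, in JavaScript. The flag names (`eyebrowsRaised`, `headShake`) and the function are hypothetical, made up for illustration; a real recogniser would have to infer these markers from face and head landmarks. The point is only that the same two hand signs can mean three different things.

```javascript
// Hypothetical flag names, for illustration only.
function interpretClause(glosses, nonManual = {}) {
  return {
    glosses,
    // Raised eyebrows turn a statement into a yes/no question.
    type: nonManual.eyebrowsRaised ? "yes/no question" : "statement",
    // A head-shake negates the clause.
    polarity: nonManual.headShake ? "negative" : "affirmative",
  };
}

const hands = ["YOU", "COME"]; // identical manual signs in all three cases

const plain = interpretClause(hands);
const question = interpretClause(hands, { eyebrowsRaised: true });
const negated = interpretClause(hands, { headShake: true });

console.log(plain.type, plain.polarity);
console.log(question.type, question.polarity);
console.log(negated.type, negated.polarity);
```

A model that only sees the hands would return the first reading every time.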

regional

One language, many dialects

The grammar is shared across India but vocabulary shifts by region. A signer from Mumbai and one from Kolkata agree on sentence structure, not always on the sign for a word. Regional fine-tunes are part of the plan, not an afterthought.


/ why open-source, specifically

This problem can't be solved by a closed product.

The commercial sign-language tools that exist today are built for ASL, closed, and priced for Western institutions, not for a government school in Bihar or a district hospital in Telangana. For those vendors, the model is the moat. That business model does not fit a country with one shared sign language, dozens of regional variants, and a Deaf community that has spent decades being talked at instead of asked.

And the data could hardly be more personal. It is video of people's faces and hands, often recorded at home. We think that footage should never have to leave the device, and the only way to make that promise believable is to open everything: the weights, the code, the consented recordings, and the eval suite, under permissive licenses anyone can audit, fork, and retrain.

A note on framing. The disability-rights principle "nothing about us, without us" applies to software too. Deaf signers, ISL educators, and accessibility researchers review what we ship, and that review started before the first line of code.

Two-way

sign ↔ text & speech

10k

ISL terms targeted

2

spoken languages at launch · Hindi & English

100%

on-device · open source

/ what we're building

Real-time, two-way, on every device.

One model, both directions: continuous signing into text and speech, and text back into signing through a 3D avatar, all running on the device itself. To be clear, these are design goals we're building toward, not shipped features.

Continuous signing, not single-glyph.

Most published ISL models classify one isolated sign at a time. That wins benchmarks and loses conversations. We're working on continuous-sentence recognition that keeps the grammar, the fingerspelling, and the non-manual markers (eyebrows, mouth shape, head tilt), since those carry half the meaning in ISL.

arch · pose extraction → ST-GCN → transformer decoder

Text & speech → sign avatar.

Type or speak in Hindi or English and a rigged 3D avatar signs it back in ISL, with smooth transitions between signs. That closes the loop: a hearing person and a Deaf signer holding an actual conversation instead of passing notes.

target: hi · en

On-device. No cloud.

The model targets browser inference through WebGPU and Android through NNAPI. Your camera feed stays on your device. Nothing uploads, nobody makes you sign in, and it keeps working where the network doesn't.

webgpu · onnx · quantized weights
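A minimal sketch of how backend selection could look in the browser. `navigator.gpu` is the standard WebGPU entry point; the `pickBackend` helper and the `"wasm"` fallback name are our assumptions for illustration, not announced parts of the SDK.

```javascript
// Choose an inference backend from a navigator-like object.
// "webgpu" matches the backend name in the proposed API;
// "wasm" is a hypothetical CPU fallback name.
function pickBackend(nav) {
  const hasWebGPU = Boolean(nav && nav.gpu);
  return hasWebGPU ? "webgpu" : "wasm";
}

// In a browser, pass the real navigator. Outside one, fall back to an empty object.
const nav = typeof navigator !== "undefined" ? navigator : {};
console.log(pickBackend(nav));
```

The presence of `navigator.gpu` only says the API is exposed. `navigator.gpu.requestAdapter()` can still resolve to `null` on unsupported hardware, so a real loader would check that too before committing to WebGPU.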

Honest about the data.

Public ISL datasets are tiny. INCLUDE covers 263 signs, CISLR lists 4,765 words with a handful of clips each. That is nowhere near a language. We start there anyway, and grow the corpus with consented recordings from real signers.

baselines: INCLUDE · CISLR · ISLRTC dict

Accessibility first.

Captions everywhere, haptic cues, high-contrast theming, large hit targets, full keyboard navigation. We treat WCAG 2.2 AA as the minimum to ship, and our Deaf collaborators catch what the checklist misses.

wcag 2.2 AA · keyboard-first · reduced-motion safe

Open code, open weights, open data.

Permissive licenses across the board. Fine-tune for your school, your clinic, your state. Commercial use is fine. No CLAs.

MIT (code) · Apache-2.0 (weights) · CC-BY-SA (data)

/ how it works

A five-stage pipeline, end to end.

Camera frames go in. Translated speech and a signing avatar come out. Every stage is swappable and benchmarked.

Pose extraction

Hand, body and face landmarks per frame using MediaPipe Holistic: 21 per hand + 33 body + 468 face points, batched on GPU.

mediapipe · 30 fps
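To give a sense of the data volume per frame, here is a sketch that flattens one Holistic result into a fixed-size vector. The field names (`poseLandmarks`, `faceLandmarks`, `leftHandLandmarks`, `rightHandLandmarks`) follow the legacy MediaPipe Holistic JavaScript solution as we understand it; treat them, the zero-padding for missing parts, and the `flattenFrame` helper as assumptions rather than the final design.

```javascript
const COUNTS = { pose: 33, face: 468, hand: 21 };
// 33 body + 468 face + 21 per hand × 2 hands = 543 points
const POINTS_PER_FRAME = COUNTS.pose + COUNTS.face + 2 * COUNTS.hand;

// Flatten one frame into [x, y, z, x, y, z, ...]; parts that were not
// detected (for example a hand out of view) are left as zeros.
function flattenFrame(results) {
  const out = new Float32Array(POINTS_PER_FRAME * 3);
  const parts = [
    [results.poseLandmarks, COUNTS.pose],
    [results.faceLandmarks, COUNTS.face],
    [results.leftHandLandmarks, COUNTS.hand],
    [results.rightHandLandmarks, COUNTS.hand],
  ];
  let offset = 0;
  for (const [landmarks, count] of parts) {
    if (landmarks) {
      for (let i = 0; i < count; i++) {
        const p = landmarks[i];
        if (!p) continue; // tolerate short arrays
        out[offset + i * 3] = p.x;
        out[offset + i * 3 + 1] = p.y;
        out[offset + i * 3 + 2] = p.z;
      }
    }
    offset += count * 3;
  }
  return out;
}

// Simulated result: body detected, face and hands missing.
const fakePose = Array.from({ length: 33 }, () => ({ x: 1, y: 2, z: 3 }));
const frame = flattenFrame({ poseLandmarks: fakePose });
console.log(POINTS_PER_FRAME, frame.length);
```

Keeping the vector length constant means the encoder never has to special-case a frame where a hand leaves the picture.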

Spatio-temporal encoder

An ST-GCN learns sign morphology over a sliding window, fusing manual + non-manual features (eyebrows, mouth).

st-gcn
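The sliding window itself is simple to sketch. The window size and stride below are placeholder values we picked for the example, not tuned settings, and `slidingWindows` is a hypothetical helper name.

```javascript
// Yield overlapping windows of `size` frames, advancing `stride` frames each time.
function* slidingWindows(frames, size, stride) {
  if (size <= 0 || stride <= 0) {
    throw new RangeError("size and stride must be positive");
  }
  for (let start = 0; start + size <= frames.length; start += stride) {
    yield frames.slice(start, start + size);
  }
}

// Ten stand-in frames, numbered 0..9.
const frames = Array.from({ length: 10 }, (_, i) => i);
const windows = [...slidingWindows(frames, 4, 2)];

// Windows start at frames 0, 2, 4 and 6.
console.log(windows.map((w) => w[0]));
```

Overlap matters here: a sign that straddles a window boundary still appears whole in the next window.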

Sentence decoder

A transformer-CTC head emits gloss sequences; a small LM then reorders the ISL-ordered glosses into natural Hindi, English, or regional-language text.

transformer-ctc
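For readers new to CTC, here is a sketch of greedy CTC decoding: take the best label per frame, merge consecutive repeats, then drop blanks. The vocabulary and scores are invented for the example, and a production decoder would more likely use beam search.

```javascript
const BLANK = 0; // index of the CTC blank label (an assumption of this sketch)

// Index of the largest value in a row of scores.
function argmax(row) {
  let best = 0;
  for (let i = 1; i < row.length; i++) {
    if (row[i] > row[best]) best = i;
  }
  return best;
}

// Greedy CTC: per-frame argmax, collapse repeats, remove blanks.
function ctcGreedyDecode(frameScores, vocab) {
  const out = [];
  let prev = null;
  for (const row of frameScores) {
    const id = argmax(row);
    if (id !== prev && id !== BLANK) out.push(vocab[id]);
    prev = id;
  }
  return out;
}

// Invented vocabulary and per-frame scores.
const vocab = ["<blank>", "I", "RICE", "EAT"];
const scores = [
  [0.1, 0.8, 0.05, 0.05], // I
  [0.1, 0.7, 0.1, 0.1],   // I (repeat, merged)
  [0.9, 0.05, 0.03, 0.02], // blank
  [0.1, 0.1, 0.7, 0.1],   // RICE
  [0.1, 0.1, 0.7, 0.1],   // RICE (repeat, merged)
  [0.8, 0.1, 0.05, 0.05], // blank
  [0.1, 0.1, 0.1, 0.7],   // EAT
];

const decoded = ctcGreedyDecode(scores, vocab);
console.log(decoded.join(" "));
```

The output is still in ISL order, which is why the reordering step comes after it.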

Text → gloss

Back-translation pairs gloss with multilingual text using a seq2seq fine-tuned on parallel ISL↔text data we're curating.

mT5 · small

Avatar retargeting

Gloss sequences drive a rigged 3D signer with smoothed inverse kinematics and non-manual blending.

three.js · webgpu
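The smoothing part can be illustrated with a small sketch that eases between two key poses. This is plain eased interpolation of joint angles, not inverse kinematics, and the joint names and angle values are made up for the example.

```javascript
// Smoothstep easing: 0 at t=0, 1 at t=1, with zero slope at both ends.
function smoothstep(t) {
  const c = Math.min(1, Math.max(0, t));
  return c * c * (3 - 2 * c);
}

// Blend every joint angle from one pose toward another at time t in [0, 1].
function blendPose(from, to, t) {
  const w = smoothstep(t);
  const out = {};
  for (const joint of Object.keys(from)) {
    out[joint] = from[joint] + (to[joint] - from[joint]) * w;
  }
  return out;
}

// Hypothetical joint angles in degrees for the end of one sign and the start of the next.
const endOfSignA = { elbow: 0, wrist: 10 };
const startOfSignB = { elbow: 90, wrist: 30 };

const halfway = blendPose(endOfSignA, startOfSignB, 0.5);
console.log(halfway);
```

Easing in and out avoids the robotic snap between signs that makes avatar signing hard to read.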

/ integrations

Built to drop in anywhere.

None of this is installable yet, and we'd rather say so plainly than fake an npm badge. The plan is a thin JS SDK for the browser, a Python package for notebooks and servers, and open weights on HuggingFace. All three land with v0.1.

JavaScript SDK

@bhashasetu/web

coming soon

Drop-in browser SDK with WebGPU runtime, a web component for the signing avatar, and an event-driven translation stream.

Python package

pip install bhashasetu

coming soon

For notebooks, servers, and batch processing. Same API as the JS SDK, ONNX runtime under the hood, CPU and CUDA backends.

🤗

HuggingFace model

bhashasetu/setu-isl

coming soon

Weights, model card, eval results, and a Spaces demo. Apache-2.0 licensed, fully fine-tunable, no gating.

The shape of the API.

We froze the API surface early so contributors have something stable to build against. This is what calling it will look like.

Install the package

Will be available via npm for the web and pip for Python notebooks and servers once v0.1 ships.

Grant camera access

The SDK requests webcam permission once. Frames are processed entirely on-device.

Subscribe to translations

Listen for onTranslation events to get gloss, text, and confidence per utterance.

Render the avatar (optional)

Drop in the <setu-avatar> web component for two-way conversations.

app.js · proposed API

// install: npm i @bhashasetu/web   (coming soon)
import { Setu } from "@bhashasetu/web";

const setu = await Setu.load({
  model:   "setu-isl-base",
  target:  "hi",        // output language
  backend: "webgpu",
});

await setu.start(document.getElementById("cam"));

setu.onTranslation(({ text, gloss, conf }) => {
  console.log(gloss, "→", text, `(${conf.toFixed(2)})`);
});

// the other direction: text → signing avatar
const avatar = setu.avatar("#stage");
await avatar.say("नमस्ते, आप कैसे हैं?");
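Since each event carries a confidence score, an app will probably want to hide low-confidence guesses. A sketch of that, using simulated events in the `{ text, gloss, conf }` shape from the proposed API; the `withMinConfidence` helper and the 0.6 threshold are our own illustration, not part of the SDK.

```javascript
// Wrap a handler so it only fires for utterances at or above a confidence floor.
function withMinConfidence(minConf, handler) {
  return (event) => {
    if (event.conf >= minConf) handler(event);
  };
}

const shown = [];
const onTranslation = withMinConfidence(0.6, ({ text }) => shown.push(text));

// Simulated events; with the real SDK this would be
// setu.onTranslation(withMinConfidence(0.6, render));
onTranslation({ gloss: "I RICE EAT", text: "I eat rice", conf: 0.91 });
onTranslation({ gloss: "RICE", text: "rice", conf: 0.2 });

console.log(shown);
```

In a clinic, showing nothing is often safer than showing a confident-looking wrong sentence, so the threshold should be a setting the deployer controls.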

/ waitlist

Be there when it works.

There's nothing to download yet, and the waitlist is how this project finds its people. Tell us who you are and we'll email you when something actually happens. A handful of emails a year, each one earned.

First call when v0.1 ships and we need testers
An invite when consented sign recording opens
A heads-up when the repo and Discord go public

/ roadmap

Where things stand.

Data first, model second, polish last. That order is deliberate, and it means the flashy parts come at the end.

v0.0

Research & data audit

Survey existing ISL datasets (INCLUDE, CISLR, ISLRTC dict), define gloss vocabulary, draft model card & consent protocols.

in progress

v0.1

Baseline model + JS SDK alpha

Isolated sign recognition on existing public data. Browser SDK with the API above. Honest about what doesn't work yet.

v0.2

Continuous-sentence decoding

Transformer-CTC head, sliding-window inference, real-time streaming translations.

later

v0.3

Text/speech → sign avatar

The other direction. Rigged 3D signer driven by gloss sequences from a multilingual seq2seq.

later

v1.0

Production-ready, regional dialects

Fine-tunes for state-level ISL variations, mobile-first runtime, classroom & clinic deployment kits.

future

Get involved

We're looking for ML engineers, ISL signers, accessibility researchers, and anyone who wants to help. Repo and community channels go public with v0.1.

$ git clone github.com/bhashasetu/setu   (soon)

$ pip install bhashasetu   (soon)

GitHub (soon) · code, issues, discussions
Discord (soon) · chat & weekly sync
HuggingFace (soon) · weights & spaces
Email us → bhashasetuisl@gmail.com

A bridge between
sign and speech.

Deaf reviewers sign off before anything ships.
