UNDER DEVELOPMENT · OPEN-SOURCE · MIT (CODE) · APACHE 2.0 (WEIGHTS) · COMMERCIAL USE OK

A bridge between
sign and speech.

BhashaSetu is an open-source AI for two-way Indian Sign Language translation — built with and for the Deaf and hard-of-hearing community of India. We're early, working in public, and looking for collaborators.

Two-way · sign ↔ text & speech
On-device · privacy by default
Open · code, weights & data
preview · concept

Input · Live feed

tracking
21 landmarks
30 fps
detected · HOSPITAL
confidence
0.94

Output · Translation

hi · हिन्दी
gloss · WHERE NEAREST HOSPITAL?

Where is the nearest hospital?

नज़दीकी अस्पताल कहाँ है?

0:02
target · browser + Android
runtime · WebGPU / NNAPI
live demo: in development
/ why this exists

One certified interpreter for every 200,000 deaf Indians.

India is home to roughly 63 million people in the Deaf and hard-of-hearing community — yet has fewer than 300 certified ISL interpreters. Education, healthcare, government services, employment: most stay out of reach for want of someone who can sign.

BhashaSetu is a small attempt at a big gap: an open-source model that anyone can run on a phone or laptop, two-way, in real time. Not a replacement for human interpreters — a complement, where there are none.

63M
people in India's Deaf & hard-of-hearing community
en.wikipedia.org / DHH India
<300
certified ISL interpreters in the entire country
ISLRTC · govt. of India
26%
of working-age deaf adults are in formal employment
Census of India · 2011
10k
terms in the official ISL Dictionary (3rd ed., 2021)
ISLRTC · our training target
/ where the gap hurts most

Three places it shows up every day.

It's easy to read "63 million people, <300 interpreters" as an abstraction. Here's what that means for an actual Deaf Indian, week to week.

Education
19%

of Deaf children aged 6–13 are out of school entirely. Most schools that do enrol them use oralist methods — lip-reading and forced speech — not ISL. Even where teachers want to sign, no formal ISL curriculum exists for them.

DHH out-of-school study · 360info, 2014
Healthcare
69%

of disabled Indians live in rural areas, where certified ISL interpreters are effectively absent. Imagine explaining symptoms to a doctor without a shared language. Real-time translation isn't a luxury; it's informed consent.

Census of India · 2011
Employment
74%

of working-age Deaf adults are in marginal or informal work. When school is inaccessible, written communication is shaky, and an interview interpreter is unavailable, the disadvantage compounds across families and generations.

Census of India · 2011
/ what makes ISL distinct

ISL isn't Hindi or English with hands.

It's a complete, grammatically distinct language, and most "sign translators" fail in India because they treat translation as one-to-one word substitution. Four things any serious ISL model has to get right:

two-handed
Two-handed alphabet

ISL fingerspells with both hands. ASL and most Western sign languages use one. Models trained on Western data don't transfer cleanly — the morphology is genuinely different.

SOV grammar
Subject–Object–Verb

"I rice eat," not "I eat rice." A word-for-word translation produces gibberish. ISL → text requires a real grammatical reordering step, not a lookup table.

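To make the reordering step concrete, here is a toy sketch. It assumes each gloss already carries a role tag (subject, object, verb), which is a simplification for illustration. The function `reorderSovToSvo` and the tag names are hypothetical, not part of BhashaSetu, and a real system would learn the reordering from data rather than hard-code it.

```javascript
// Toy illustration only: reorder role-tagged glosses from ISL's
// Subject-Object-Verb order into English Subject-Verb-Object order.
// The role tags and this function are hypothetical, not BhashaSetu APIs.
function reorderSovToSvo(glosses) {
  const rank = { S: 0, V: 1, O: 2 }; // target order for English
  // Array.prototype.sort is stable, so glosses sharing a role keep their order.
  return [...glosses].sort((a, b) => rank[a.role] - rank[b.role]);
}

const isl = [
  { gloss: "I", role: "S" },
  { gloss: "RICE", role: "O" },
  { gloss: "EAT", role: "V" },
];
const english = reorderSovToSvo(isl).map((g) => g.gloss).join(" ");
console.log(english); // I EAT RICE
```

Even this toy version needs structure (the role tags) that a plain word lookup does not have, which is why a lookup table falls short.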
non-manual
The face is grammar

Raised eyebrows mark yes/no questions. A head-shake makes a clause negative. Mouth-shape carries intensity. Ignore non-manuals and you lose half the meaning.

regional
One language, many dialects

ISL is shared pan-India, but vocabulary varies state-to-state — Mumbai-ISL and Kolkata-ISL share grammar, differ in signs. Fine-tunes for regional dialects are not optional.

/ why open-source, specifically

This problem can't be solved by a closed product.

Commercial sign-language tools today are ASL-focused, closed, and priced out of reach for Indian schools, clinics, and panchayat offices. They treat the model as a moat. That doesn't fit the shape of this problem: a country with one shared sign language, dozens of regional dialects, and a Deaf community that has been talked at for decades.

Camera data of a Deaf signer is among the most sensitive there is. It can't sit on a US-based server, behind a paywall, with terms of service no one read. The weights, the code, the data recordings, and the eval suite all have to be open — under permissive licenses, auditable, fork-able, and shaped by the people the system claims to serve.

A note on framing. "Nothing about us, without us," the disability-rights motto, is the operating principle here. Deaf signers, ISL educators, and accessibility researchers shape every release, starting before the first line of code.

Two-way
sign ↔ text & speech
10k
ISL terms targeted
2 languages
Hindi and English
100%
on-device · open source
/ what we're building

Real-time, two-way, on every device.

A single model that handles continuous signing in one direction and drives a 3D avatar in the other — running fully on-device so the camera feed never leaves the phone. Here's the design we're working toward.

Continuous signing, not single-glyph.

Most published ISL models classify one isolated sign at a time. We're working toward continuous-sentence recognition that preserves grammar, finger-spelling, and non-manual markers (eyebrows, mouth-shape, head-tilt) — because that's how ISL actually works.

arch · pose extraction → ST-GCN → transformer decoder
namaste aap kaise hain?

Text & speech → sign avatar.

Type or speak in Hindi or English. A retargeted 3D avatar performs the corresponding ISL with natural transitions, so a hearing speaker and a Deaf signer can hold a real conversation.

target: hi · en

On-device. No cloud.

The model targets browser inference via WebGPU and Android via NNAPI. Your camera feed stays on your device — nothing is uploaded, no account needed.

webgpu · onnx · quantized weights
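As a rough sketch of how a browser client might decide whether WebGPU is usable before loading weights: `navigator.gpu` and `requestAdapter()` are standard WebGPU API, while the helper name `pickBackend` and the `"wasm"` fallback label are assumptions for illustration, not announced BhashaSetu behaviour.

```javascript
// Sketch: choose an inference backend from what the runtime exposes.
// `pickBackend` and the "wasm" fallback label are illustrative assumptions.
async function pickBackend(nav) {
  // navigator.gpu is only defined where the WebGPU API is available.
  if (nav && nav.gpu) {
    // requestAdapter() resolves to null when no suitable GPU adapter exists.
    const adapter = await nav.gpu.requestAdapter();
    if (adapter) return "webgpu";
  }
  return "wasm"; // hypothetical on-device CPU fallback
}

// In a browser, pass the real navigator; elsewhere this falls back safely.
pickBackend(globalThis.navigator).then((backend) => {
  console.log("backend:", backend);
});
```

Either branch keeps inference on the device, so the privacy property does not depend on which backend is picked.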

Honest about the data.

Public ISL datasets are small — INCLUDE has 263 signs, CISLR has 4,765 with very few samples each. We start from these and grow a community-collected corpus, with consent, from real signers.

baselines: INCLUDE · CISLR · ISLRTC dict

Accessibility first.

Designed alongside Deaf and hard-of-hearing collaborators — captions everywhere, haptic cues, high-contrast theming, large hit targets, full keyboard navigation. WCAG 2.2 AA is the floor, not the ceiling.

wcag 2.2 AA · keyboard-first · reduced-motion safe

Open code, open weights, open data.

Permissive licenses across the board. Fine-tune for your school, your clinic, your state. Commercial use is fine. No CLAs.

MIT (code) · Apache-2.0 (weights) · CC-BY-SA (data)
/ how it works

A five-stage pipeline, end to end.

Camera frames go in. Translated speech and a signing avatar come out. Every stage is swappable and benchmarked.

01

Pose extraction

Hand, body and face landmarks per frame using MediaPipe Holistic: 21 per hand + 33 body + 468 face points (543 in total), batched on GPU.

mediapipe · 30 fps
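The landmark counts above add up to a fixed-size vector per frame, counting 21 points for each of the two hands. The sketch below packs one frame that way. The field names (`poseLandmarks`, `faceLandmarks`, `leftHandLandmarks`, `rightHandLandmarks`) follow the legacy MediaPipe Holistic JavaScript results object, and using that shape here is an assumption; `flattenFrame` is a hypothetical helper.

```javascript
// Landmark counts per frame, as listed above (hands are 21 points each).
const COUNTS = { pose: 33, face: 468, leftHand: 21, rightHand: 21 };
const TOTAL = COUNTS.pose + COUNTS.face + COUNTS.leftHand + COUNTS.rightHand; // 543

// Pack one frame into a fixed-length Float32Array of (x, y, z) triples.
// Parts that were not detected (e.g. a hand out of view) stay zero-filled.
// `flattenFrame` is a hypothetical helper, not a BhashaSetu API.
function flattenFrame(results) {
  const out = new Float32Array(TOTAL * 3);
  const parts = [
    [results.poseLandmarks, COUNTS.pose],
    [results.faceLandmarks, COUNTS.face],
    [results.leftHandLandmarks, COUNTS.leftHand],
    [results.rightHandLandmarks, COUNTS.rightHand],
  ];
  let offset = 0;
  for (const [landmarks, count] of parts) {
    if (landmarks) {
      const n = Math.min(count, landmarks.length);
      for (let i = 0; i < n; i++) {
        const { x, y, z } = landmarks[i];
        out.set([x, y, z ?? 0], (offset + i) * 3);
      }
    }
    offset += count; // advance even when a part is missing, to keep layout fixed
  }
  return out;
}
```

A fixed layout means the encoder always sees the same joint at the same index, whether or not a hand was visible in that frame.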
02

Spatio-temporal encoder

An ST-GCN learns sign morphology over a sliding window, fusing manual + non-manual features (eyebrows, mouth).

st-gcn
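One way to feed an encoder "over a sliding window" is a small frame buffer that emits overlapping windows. This is a minimal sketch under assumed parameters: the class name `SlidingWindow` and the window and stride sizes are illustrative, not project settings.

```javascript
// Sketch of a fixed-size sliding window over per-frame feature vectors.
// The class name and the sizes used with it are illustrative assumptions.
class SlidingWindow {
  constructor(size, stride) {
    this.size = size;     // frames per window
    this.stride = stride; // frames to advance between windows
    this.frames = [];
  }

  // Add one frame. Returns a full window (array of frames) when one is
  // ready, otherwise null. After emitting, the oldest `stride` frames drop.
  push(frame) {
    this.frames.push(frame);
    if (this.frames.length < this.size) return null;
    const window = this.frames.slice();
    this.frames.splice(0, this.stride);
    return window;
  }
}
```

At 30 fps, for example, `new SlidingWindow(60, 15)` would cover two seconds of signing per window and emit a new window every half second once the buffer is full.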
03

Sentence decoder

A transformer-CTC head emits gloss sequences; a small LM reorders ISL's SOV grammar into natural Hindi, English, or regional-language text.

transformer-ctc
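For readers new to CTC, the simplest way to turn per-frame scores into a gloss sequence is greedy decoding: take the best class per frame, collapse consecutive repeats, and drop the blank symbol. The sketch below shows that standard procedure. The vocabulary, the blank at index 0, and the function name are assumptions for illustration, and a production decoder might use beam search instead.

```javascript
// Greedy CTC decoding: best class per frame, collapse consecutive repeats,
// then drop blanks. Blank at index 0 is an assumption for this sketch.
function ctcGreedyDecode(frameScores, vocab, blank = 0) {
  const glosses = [];
  let prev = blank;
  for (const scores of frameScores) {
    // argmax over the class scores for this frame
    let best = 0;
    for (let i = 1; i < scores.length; i++) {
      if (scores[i] > scores[best]) best = i;
    }
    if (best !== blank && best !== prev) glosses.push(vocab[best]);
    prev = best;
  }
  return glosses;
}

// Hypothetical 4-class vocabulary and 7 frames of scores.
const vocab = ["<blank>", "WHERE", "NEAREST", "HOSPITAL"];
const oneHot = (k) => vocab.map((_, i) => (i === k ? 0.9 : 0.03));
const frames = [1, 1, 0, 2, 2, 0, 3].map(oneHot);
console.log(ctcGreedyDecode(frames, vocab).join(" ")); // WHERE NEAREST HOSPITAL
```

The blank between two identical predictions is what lets CTC emit the same gloss twice in a row, which matters for repeated signs.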
04

Text → gloss

Back-translation pairs gloss with multilingual text using a seq2seq model fine-tuned on parallel ISL↔text data we're curating.

mT5 · small
05

Avatar retargeting

Gloss sequences drive a rigged 3D signer with smoothed inverse kinematics and non-manual blending.

three.js · webgpu
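As a very rough illustration of blending between signs, the sketch below linearly interpolates joint angles between two key poses. It is a simplification under stated assumptions: poses are plain maps of joint name to an angle in radians, `blendPose` is a hypothetical helper, and a real rig would more likely interpolate rotations (for example with quaternions) on top of an IK solver.

```javascript
// Linear blend between two poses, each a map of joint name -> angle (radians).
// Hypothetical helper for illustration; not the project's retargeting code.
function blendPose(from, to, t) {
  const k = Math.min(1, Math.max(0, t)); // clamp blend factor to [0, 1]
  const out = {};
  for (const joint of Object.keys(to)) {
    const start = from[joint] ?? to[joint]; // joints missing in `from` snap to target
    out[joint] = start + (to[joint] - start) * k;
  }
  return out;
}

// Halfway between a rest pose and a raised-arm pose.
const rest = { shoulder: 0, elbow: 0 };
const raised = { shoulder: 1, elbow: 0.5 };
console.log(blendPose(rest, raised, 0.5)); // { shoulder: 0.5, elbow: 0.25 }
```

Stepping `t` from 0 to 1 over a few frames gives a transition between two glosses instead of a hard cut.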
/ integrations

Built to drop in anywhere.

When BhashaSetu ships, here's how you'll consume it — a thin JS SDK for the browser, a Python package for notebooks and servers, and an open model on HuggingFace. Install paths go live with v0.1.

JS
JavaScript SDK
@bhashasetu/web
coming soon

Drop-in browser SDK with WebGPU runtime, a web component for the signing avatar, and an event-driven translation stream.

Py
Python package
pip install bhashasetu
coming soon

For notebooks, servers, and batch processing. Same API as the JS SDK, ONNX Runtime under the hood, CPU and CUDA backends.

🤗
HuggingFace model
bhashasetu/setu-isl
coming soon

Weights, model card, eval results, and a Spaces demo. Apache-2.0 licensed, fully fine-tunable, no gating.

The shape of the API.

A preview of what the SDK will look like — fixed early so contributors can build against it.

1
Install the package

Available via npm for web, and pip for Python notebooks & servers.

2
Grant camera access

The SDK requests webcam permission once. Frames are processed entirely on-device.

3
Subscribe to translations

Listen for onTranslation events — get gloss, text, and confidence per utterance.

4
Render the avatar (optional)

Drop in the <setu-avatar> web component for two-way conversations.

app.js · proposed API
// install: npm i @bhashasetu/web   (coming soon)
import { Setu } from "@bhashasetu/web";

const setu = await Setu.load({
  model:   "setu-isl-base",
  target:  "hi",        // output language
  backend: "webgpu",
});

// subscribe before starting the camera so no early utterance is missed
setu.onTranslation(({ text, gloss, conf }) => {
  console.log(gloss, "→", text, `(${conf.toFixed(2)})`);
});

await setu.start(document.getElementById("cam"));

// the other direction: text → signing avatar
const avatar = setu.avatar("#stage");
await avatar.say("नमस्ते, आप कैसे हैं?");
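
Since every translation event in the proposed API above carries a confidence value, one pattern contributors may want is to suppress low-confidence output rather than show a wrong translation. The sketch below wraps a handler with a threshold. The helper `withMinConfidence` and its default of 0.8 are assumptions for illustration; only the `{ text, gloss, conf }` event shape comes from the proposed API.

```javascript
// Wrap a translation handler so it only fires at or above a confidence
// threshold. The event shape ({ text, gloss, conf }) follows the proposed
// API; `withMinConfidence` and its default are illustrative assumptions.
function withMinConfidence(handler, minConfidence = 0.8) {
  return (event) => {
    if (event.conf >= minConfidence) handler(event);
  };
}

// Usage with the proposed SDK would look like:
//   setu.onTranslation(withMinConfidence(({ text }) => showCaption(text), 0.9));
// where `showCaption` is whatever the host app uses to display text.
```

In a clinic or classroom, silently dropping an uncertain utterance may be worse than flagging it, so the threshold and the fallback behaviour are worth deciding with Deaf reviewers.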

Built with the Deaf community, not just for it.

Translation models can fail in ways that aren't obvious to hearing developers. We're partnering with Deaf signers, ISL educators, and accessibility orgs to review every release — starting from datasets and continuing through UI, error states, and the model card itself.

Live captions
Haptic cues
High-contrast
Keyboard-first
/ roadmap

Open, in public.

Where we are, and where we're going. Honest about the order — data first, model second, polish last.

v0.0
Research & data audit

Survey existing ISL datasets (INCLUDE, CISLR, ISLRTC dict), define gloss vocabulary, draft model card & consent protocols.

in progress
v0.1
Baseline model + JS SDK alpha

Isolated sign recognition on existing public data. Browser SDK with the API above. Honest about what doesn't work yet.

next
v0.2
Continuous-sentence decoding

Transformer-CTC head, sliding-window inference, real-time streaming translations.

later
v0.3
Text/speech → sign avatar

The other direction. Rigged 3D signer driven by gloss sequences from a multilingual seq2seq.

later
v1.0
Production-ready, regional dialects

Fine-tunes for state-level ISL variations, mobile-first runtime, classroom & clinic deployment kits.

future

Get involved

We're looking for ML engineers, ISL signers, accessibility researchers, and anyone who wants to help. Repo and community channels go public with v0.1.

$ git clone github.com/bhashasetu/setu   # soon
$ pip install bhashasetu   # soon