An open project by Samic Ventures

AI that speaks African languages

Lisan is an open, Amharic-first speech and language stack (speech-to-text, text-to-speech and language understanding) for five East African languages. Together they are spoken by more than 150 million people, yet they remain largely missing from production AI.

Already proven in production through Dewul, our live multilingual voice AI.

150M+
Speakers
5
Languages, one open stack
3
Language families covered
Live
Validated on real calls

The gap

Most of the world's AI doesn't understand a word these speakers say.

Voice assistants, transcription, translation and chatbots have transformed how the world works, but they were rarely built with Amharic, Tigrinya, Afaan Oromo, Somali or Swahili in mind. More than 150 million people, and the businesses and institutions that serve them, are locked out of modern AI simply because of the language they speak. Lisan exists to close that gap with open, reusable building blocks.

What we build

An open foundation for in-language voice and text AI.

Five reusable components, released openly so any builder, researcher or institution can deploy AI that works in these languages.

🎙️

Speech-to-text (ASR)

Open speech recognition fine-tuned for each language and hardened for noisy, real-world telephony audio.

🔊

Text-to-speech (TTS)

Natural, expressive voices in each language, so machines can speak back the way people actually talk.

🧠

Language understanding

Intent and meaning extraction for task-oriented dialogue: booking appointments, answering questions, routing calls and more.

📊

Open benchmark

A reproducible, public leaderboard so the whole community can measure and push progress on these languages.

🧰

Inference toolkit

A simple way for any developer to deploy an in-language voice agent on their own infrastructure.

🌍

Open by default

Datasets, models, benchmarks and tooling published with model cards and datasheets. Public goods, not black boxes.

Languages

Five languages. Three families. Two scripts.

We start with Amharic and extend across families and scripts, so the methods we prove here transfer to many more African languages.

Amharic

Semitic · Ge'ez
~57M

Tigrinya

Semitic · Ge'ez
~9M

Afaan Oromo

Cushitic · Latin
~37M

Somali

Cushitic · Latin
~21M

Swahili

Bantu · Latin
80M+

Approach

Leverage what exists. Validate in the real world. Release it openly.

Build on open foundations

We start from existing open corpora and strong open base models rather than reinventing them, and collect targeted new data only where real gaps exist.

Adapt for the language

Transfer learning with Ge'ez-script normalization and morphology handling for the Semitic languages, tuned for the way each language is actually written and spoken.

Prove it in production

Every model is deployed in Dewul and measured on real customer calls, not just held-out test sets, so quality reflects the real world.

Give it back

Models, datasets, the benchmark and the toolkit are published openly, so the African NLP community can build on top of them.

Not a research demo. It already runs.

Dewul, our multilingual AI receptionist, answers live business calls in Amharic today. Lisan hardens and opens the speech and language stack underneath it, turning a working commercial product into public goods for everyone.

See Dewul live →

Partnership

Let's build African-language AI, together.

We're inviting researchers, native-speaker communities, institutions and funders to co-build Lisan as equal partners, with co-authorship on the open datasets, models and benchmark, and a shared stake in the outcome.