silero-models

Silero Models: pre-trained text-to-speech models made embarrassingly simple

5,552

349

5,552

View on GitHub

Top Related Projects

vosk-api

14,058

Offline speech recognition API for Android, iOS, Raspberry Pi and servers with Python, Java, C# and Node

TTS

43,367

🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production

DeepSpeech

26,652

DeepSpeech is an open source embedded (offline, on-device) speech-to-text engine which can run in real time on devices ranging from a Raspberry Pi 4 to high power GPU servers.

espnet

9,572

End-to-End Speech Processing Toolkit

kaldi

15,211

kaldi-asr/kaldi is the official location of the Kaldi project.

fairseq

31,926

Facebook AI Research Sequence-to-Sequence Toolkit written in Python.

Quick Overview

Silero Models is an open-source project that provides pre-trained speech-to-text, text-to-speech, and voice activity detection models. It aims to make speech recognition and synthesis accessible and easy to use for developers and researchers, offering high-quality models that can be run efficiently on various devices.

Pros

Easy to use and integrate into existing projects
Supports multiple languages and accents
Offers lightweight models suitable for edge devices and mobile applications
Provides regular updates and improvements to model performance

Cons

Limited customization options for specific use cases
May not perform as well as some commercial solutions for certain languages
Requires some technical knowledge to set up and use effectively
Documentation could be more comprehensive for advanced usage scenarios

Code Examples

Speech-to-Text (STT) example:

import torch
import zipfile
import torchaudio
from omegaconf import OmegaConf

# Load the model
model, decoder, utils = torch.hub.load(repo_or_dir='snakers4/silero-models',
                                       model='silero_stt',
                                       language='en')

# Transcribe audio
wav_path = 'path/to/audio/file.wav'
waveform, sample_rate = torchaudio.load(wav_path)
transcription = model(waveform)
print(transcription)

Text-to-Speech (TTS) example:

import torch

# Load the model
model, _ = torch.hub.load(repo_or_dir='snakers4/silero-models',
                          model='silero_tts',
                          language='en',
                          speaker='en_0')

# Generate speech
text = "Hello, this is a test of text-to-speech synthesis."
audio = model.apply_tts(text=text,
                        speaker='en_0',
                        sample_rate=48000)

# Save the audio
torchaudio.save('output.wav', audio.unsqueeze(0), sample_rate=48000)

Voice Activity Detection (VAD) example:

import torch
import torchaudio

# Load the model
model, utils = torch.hub.load(repo_or_dir='snakers4/silero-models',
                              model='silero_vad',
                              force_reload=True)

# Perform VAD on audio
wav_path = 'path/to/audio/file.wav'
waveform, sample_rate = torchaudio.load(wav_path)
vad_segments = utils.get_speech_timestamps(waveform, model, threshold=0.5)
print(vad_segments)

Getting Started

To get started with Silero Models:

Install the required dependencies:
```
pip install torch torchaudio omegaconf
```
Load the desired model using torch.hub.load() as shown in the code examples above.
Process your audio or text data using the loaded model's functions.
For more detailed usage instructions and advanced features, refer to the project's GitHub repository and documentation.

Competitor Comparisons

vosk-api

14,058

Offline speech recognition API for Android, iOS, Raspberry Pi and servers with Python, Java, C# and Node

Pros of vosk-api

Supports a wider range of languages and accents
Offers offline speech recognition capabilities
Provides a more flexible API for integration into various applications

Cons of vosk-api

Generally slower processing speed compared to Silero models
Requires more computational resources for operation
Less focus on lightweight models for edge devices

Code Comparison

vosk-api:

from vosk import Model, KaldiRecognizer
import pyaudio

model = Model("model")
rec = KaldiRecognizer(model, 16000)

p = pyaudio.PyAudio()
stream = p.open(format=pyaudio.paInt16, channels=1, rate=16000, input=True, frames_per_buffer=8000)
stream.start_stream()

while True:
    data = stream.read(4000)
    if rec.AcceptWaveform(data):
        print(rec.Result())

silero-models:

import torch
from silero import models

model, decoder, utils = torch.hub.load(repo_or_dir='snakers4/silero-models', model='silero_stt')

device = torch.device('cpu')
model = model.to(device)

wav_path = 'audio.wav'
transcription = model.transcribe(wav_path)
print(transcription)

Both repositories offer speech recognition capabilities, but they differ in their approach and use cases. vosk-api provides a more comprehensive solution for various languages and offline usage, while silero-models focuses on lightweight, efficient models suitable for edge devices and quick processing.

TTS

43,367

🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production

Pros of TTS

More comprehensive and feature-rich, offering a wider range of TTS models and voice conversion capabilities
Active development with frequent updates and a larger community
Supports multiple languages and provides pre-trained models for various use cases

Cons of TTS

Higher complexity and steeper learning curve compared to Silero Models
Requires more computational resources and may have longer inference times
Installation process can be more involved, especially for certain features

Code Comparison

TTS:

from TTS.api import TTS

tts = TTS(model_name="tts_models/en/ljspeech/tacotron2-DDC")
tts.tts_to_file(text="Hello world!", file_path="output.wav")

Silero Models:

import torch

model, _ = torch.hub.load(repo_or_dir='snakers4/silero-models', model='silero_tts')
audio = model.apply_tts("Hello world!", speaker='en_0', sample_rate=48000)

Both repositories offer powerful text-to-speech capabilities, but TTS provides a more extensive set of features and models at the cost of increased complexity. Silero Models, on the other hand, offers a simpler and more lightweight solution that may be easier to integrate for basic TTS needs.

DeepSpeech

26,652

DeepSpeech is an open source embedded (offline, on-device) speech-to-text engine which can run in real time on devices ranging from a Raspberry Pi 4 to high power GPU servers.

Pros of DeepSpeech

More established project with a larger community and extensive documentation
Supports multiple languages and accents out of the box
Offers pre-trained models for immediate use

Cons of DeepSpeech

Requires more computational resources and has a larger model size
Can be more complex to set up and integrate into projects
May have slower inference times compared to Silero Models

Code Comparison

DeepSpeech:

import deepspeech
model = deepspeech.Model('path/to/model.pbmm')
text = model.stt(audio)

Silero Models:

import torch
model, decoder, utils = torch.hub.load(repo_or_dir='snakers4/silero-models', model='silero_stt')
text = model(audio)

Both repositories provide speech-to-text functionality, but they differ in implementation and usage. DeepSpeech offers a more comprehensive solution with support for multiple languages, while Silero Models focuses on lightweight, efficient models. DeepSpeech may be better suited for large-scale projects with diverse language requirements, whereas Silero Models could be preferable for applications where speed and resource efficiency are crucial. The choice between the two depends on specific project needs, available computational resources, and desired features.

espnet

9,572

End-to-End Speech Processing Toolkit

Pros of ESPnet

Comprehensive toolkit with support for various speech processing tasks (ASR, TTS, speech enhancement, etc.)
Extensive documentation and active community support
Flexible architecture allowing for easy customization and experimentation

Cons of ESPnet

Steeper learning curve due to its extensive feature set
Potentially higher computational requirements for training and inference
More complex setup process compared to simpler alternatives

Code Comparison

ESPnet example (ASR training):

from espnet2.bin.asr_train import main

args = {
    "output_dir": "exp/asr_train",
    "max_epoch": 100,
    "batch_size": 32,
    "accum_grad": 2,
    "use_amp": True
}

main(args)

Silero Models example (ASR inference):

import torch
from silero import models

model, decoder, utils = models.silero_stt(language='en', device='cpu')
audio_path = 'audio.wav'
transcription = model(audio_path)

The ESPnet code showcases its flexibility in training configuration, while Silero Models demonstrates simplicity in inference. ESPnet offers more control over the training process, whereas Silero Models provides a straightforward API for quick deployment.

kaldi

15,211

kaldi-asr/kaldi is the official location of the Kaldi project.

Pros of Kaldi

Comprehensive toolkit with extensive documentation and examples
Highly flexible and customizable for various ASR tasks
Large community support and active development

Cons of Kaldi

Steeper learning curve and more complex setup
Requires more computational resources for training and inference
Less suitable for quick prototyping or small-scale projects

Code Comparison

Silero-models (Python):

import torch
model, decoder, utils = torch.hub.load(repo_or_dir='snakers4/silero-models', model='silero_stt')
wav_path = 'audio.wav'
transcription = model.transcribe(wav_path)

Kaldi (Shell script):

#!/bin/bash
. ./path.sh
compute-mfcc-feats --config=conf/mfcc.conf scp:data/test/wav.scp ark:- | \
  apply-cmvn-sliding --norm-vars=false --center=true --cmn-window=300 ark:- ark:- | \
  gmm-latgen-faster --max-active=7000 --beam=13.0 --lattice-beam=6.0 --acoustic-scale=0.083333 \
  --allow-partial=true --word-symbol-table=exp/tri4b/graph/words.txt \
  exp/tri4b/final.mdl exp/tri4b/graph/HCLG.fst ark:- ark:- | \
  lattice-best-path --word-symbol-table=exp/tri4b/graph/words.txt ark:- ark,t:transcription.txt

fairseq

31,926

Facebook AI Research Sequence-to-Sequence Toolkit written in Python.

Pros of fairseq

Comprehensive toolkit for sequence modeling tasks
Supports a wide range of architectures and tasks
Extensive documentation and community support

Cons of fairseq

Steeper learning curve due to complexity
Requires more computational resources
Less focused on specific speech-to-text tasks

Code Comparison

silero-models:

import torch
model, decoder, utils = torch.hub.load(repo_or_dir='snakers4/silero-models', model='silero_stt')

fairseq:

from fairseq.models.wav2vec import Wav2VecModel
model = Wav2VecModel.from_pretrained('/path/to/model')

Key Differences

silero-models focuses on speech recognition, while fairseq covers a broader range of sequence modeling tasks
silero-models offers simpler integration and usage, while fairseq provides more flexibility and customization options
fairseq has a larger codebase and more dependencies, whereas silero-models is more lightweight and easier to deploy

Use Cases

Choose silero-models for quick implementation of speech recognition tasks
Opt for fairseq when working on complex sequence modeling projects or research requiring extensive customization

Convert designs to code with AI

Introducing Visual Copilot: A new AI model to turn Figma designs to high quality code using your components.

Try Visual Copilot

README

header

Silero Models

Silero Models

Our TTS models satisfy the following criteria:

Fully end-to-end;
Large library of voices;
Natural-sounding speech;
One-line usage, minimal, portable;
Impressively fast on CPU and GPU;
For the Russian language - automated stress and homographs;

Installation and Basics

You can basically use our models in 3 flavours:

Via PyTorch Hub: torch.hub.load();
Via pip: pip install silero and then from silero import silero_tts;
Via caching the required models and utils manually and modifying if necessary;

Models are downloaded on demand both by pip and PyTorch Hub. If you need caching, do it manually or via invoking a necessary model once (it will be downloaded to a cache folder). Please see these docs for more information.

PyTorch Hub and pip package are based on the same code. All of the torch.hub.load examples can be used with the pip package via this basic change:

from silero import silero_tts
model, example_text = silero_tts(language='ru',
                                 speaker='v5_ru')
audio = model.apply_tts(text=example_text)

Text-To-Speech

Models and Speakers

All of the provided models are listed in the models.yml file. Any metadata and newer versions will be added there.

V5

V5 models support SSML. Also see Colab examples for main SSML tag usage.

Russian-only models support automated stress and homographs.

ID	Speakers	Auto-stress / Homographs	Language	SR	Colab
`v5_ru`	`aidar`, `baya`, `kseniya`, `xenia`, `eugene`	yes / yes	`ru` (Russian)	`8000`, `24000`, `48000`

V5 CIS Base Models

All of the below models support 8000, 24000, 48000 sampling rates and contain no auto-stress or homographs;
v5_cis_base models assume that proper stress should be added for each word for all languages, i.e. Ðº+Ð¾ÑÐºÐ°;
v5_cis_base_nostress models assume that proper stress should be added for each word ONLY for slavic languages (i.e. ru, bel, ukr);
All of the below models are published under MIT licence;
V5 UTMOS and throughput metrics;
V5 models support SSML. Also see Colab examples for main SSML tag usage.

ID	Speakers	Language
`v5_cis_base`, `v5_cis_base_nostress`	`aze_gamat`	`aze` (Azerbaijani)
`v5_cis_base`, `v5_cis_base_nostress`	`hye_zara`	`hye` (Armenian)
`v5_cis_base`, `v5_cis_base_nostress`	`bak_aigul`, `bak_alfia`, `bak_alfia2`	`bak` (Bashkir)
`v5_cis_base`, `v5_cis_base_nostress`	`bak_miyau`, `bak_ramilia`	`bak` (Bashkir)
`v5_cis_base`, `v5_cis_base_nostress`	`bel_anatoliy`, `bel_dmitriy`, `bel_larisa`	`bel` (Belarus)
`v5_cis_base`, `v5_cis_base_nostress`	`kat_vika`	`kat` (Georgian)
`v5_cis_base`, `v5_cis_base_nostress`	`kbd_eduard`	`kbd` (Kab.-Cherkes)
`v5_cis_base`, `v5_cis_base_nostress`	`kaz_zhadyra`, `kaz_zhazira`	`kaz` (Kazakh)
`v5_cis_base`, `v5_cis_base_nostress`	`xal_kejilgan`, `xal_kermen`	`xal` (Kalmyk)
`v5_cis_base`, `v5_cis_base_nostress`	`kir_nurgul`	`kir` (Kyrgyz)
`v5_cis_base`, `v5_cis_base_nostress`	`mdf_oksana`	`mdf` (Moksha)
`v5_cis_base`, `v5_cis_base_nostress`	all of these speakers, but with `ru_` prefix	`ru` (Russian)
`v5_cis_base`, `v5_cis_base_nostress`	`tgk_onaoy`, `tgk_safarhuja`	`tgk` (Tajik)
`v5_cis_base`, `v5_cis_base_nostress`	`tat_albina`, `tat_marat`	`tat` (Tatar)
`v5_cis_base`, `v5_cis_base_nostress`	`udm_bogdan`	`udm` (Udmurt)
`v5_cis_base`, `v5_cis_base_nostress`	`uzb_saida`	`uzb` (Uzbek)
`v5_cis_base`, `v5_cis_base_nostress`	`ukr_igor`, `ukr_roman`	`ukr` (Ukrainian)
`v5_cis_base`, `v5_cis_base_nostress`	`kjh_karina`, `kjh_sibday`	`kjh` (Khakas)
`v5_cis_base`, `v5_cis_base_nostress`	`chv_ekaterina`	`chv` (Chuvash)
`v5_cis_base`, `v5_cis_base_nostress`	`erz_alexandr`	`erz` (Erzya)
`v5_cis_base`, `v5_cis_base_nostress`	`sah_zinaida`	`sah` (Yakut)

V5 CIS Ext Models

All of the below models support 8000, 24000, 48000 sampling rates and contain no auto-stress or homographs;
v5_cis_ext models assume that proper stress should be added for each word for all languages, i.e. Ðº+Ð¾ÑÐºÐ°;
v5_cis_ext_nostress are coming soon;
All of the below models are published under CC-NC-BY licence;
V5 models support SSML. Also see Colab examples for main SSML tag usage.

ID	Speakers	Language
`v5_cis_ext`	`kaz_abai`, `kaz_aidana`, `kaz_aisha`, `kaz_bakir`, `kaz_danara`	`kaz` (Kazakh)
`v5_cis_ext`	`xal_delghir`, `xal_erdni`	`xal` (Kalmyk)
`v5_cis_ext`	`tat_adiba`, `tat_alsou`, `tat_amir`, `tat_azat`, `tat_batir`	`tat` (Tatar)
`v5_cis_ext`	`tat_bulat`, `tat_damir`, `tat_guzel`, `tat_ildar`, `tat_ilgiz`	`tat` (Tatar)
`v5_cis_ext`	`tat_karim`, `tat_mansur`, `tat_murat`, `tat_rasima`, `tat_rustem`	`tat` (Tatar)
`v5_cis_ext`	`tat_timur`, `tat_zifa`, `tat_zufar`, `tat_zulfiya`	`tat` (Tatar)
`v5_cis_ext`	`uzb_anora`, `uzb_dilnavoz`	`uzb` (Uzbek)
`v5_cis_ext`	`ukr_kateryna`, `ukr_lada`, `ukr_mykyta`, `ukr_oleksa`, `ukr_tetiana`	`ukr` (Ukrainian)
`v5_cis_ext`	`chv_aihwa`, `chv_alima`	`chv` (Chuvash)

V4

V4 models support SSML. Also see Colab examples for main SSML tag usage.

V4 models: v4_ru, v4_cyrillic, v4_ua, v4_uz, v4_indic

ID	Speakers	Auto-stress	Language	SR
`v4_ru`	`aidar`, `baya`, `kseniya`, `xenia`, `eugene`, `random`	yes	`ru` (Russian)	`8000`, `24000`, `48000`
`v4_cyrillic`	`b_ava`, `marat_tt`, `kalmyk_erdni`...	no	`cyrillic` (Avar, Tatar, Kalmyk, ...)	`8000`, `24000`, `48000`
`v4_ua`	`mykyta`, `random`	no	`ua` (Ukrainian)	`8000`, `24000`, `48000`
`v4_uz`	`dilnavoz`	no	`uz` (Uzbek)	`8000`, `24000`, `48000`
`v4_indic`	`hindi_male`, `hindi_female`, ..., `random`	no	`indic` (Hindi, Telugu, ...)	`8000`, `24000`, `48000`

V3

V3 models support SSML. Also see Colab examples for main SSML tag usage.

V3 models: v3_en, v3_en_indic, v3_de, v3_es, v3_fr, v3_indic

ID	Speakers	Auto-stress	Language	SR
`v3_en`	`en_0`, `en_1`, ..., `en_117`, `random`	no	`en` (English)	`8000`, `24000`, `48000`
`v3_en_indic`	`tamil_female`, ..., `assamese_male`, `random`	no	`en` (English)	`8000`, `24000`, `48000`
`v3_de`	`eva_k`, ..., `karlsson`, `random`	no	`de` (German)	`8000`, `24000`, `48000`
`v3_es`	`es_0`, `es_1`, `es_2`, `random`	no	`es` (Spanish)	`8000`, `24000`, `48000`
`v3_fr`	`fr_0`, ..., `fr_5`, `random`	no	`fr` (French)	`8000`, `24000`, `48000`
`v3_indic`	`hindi_male`, `hindi_female`, ..., `random`	no	`indic` (Hindi, Telugu, ...)	`8000`, `24000`, `48000`

Dependencies

Basic dependencies for Colab examples:

torch, 1.10+ for v3 models/ 2.0+ for v4 and v5 models;
torchaudio, latest version bound to PyTorch should work (required only because models are hosted together with STT, not required for work);
omegaconf, latest (can be removed as well, if you do not load all of the configs);

PyTorch

# V5
import torch

language = 'ru'
model_id = 'v5_ru'
sample_rate = 48000
speaker = 'xenia'
device = torch.device('cpu')

model, example_text = torch.hub.load(repo_or_dir='snakers4/silero-models',
                                     model='silero_tts',
                                     language=language,
                                     speaker=model_id)
model.to(device)  # gpu or cpu

audio = model.apply_tts(text=example_text,
                        speaker=speaker,
                        sample_rate=sample_rate)

Standalone Use

Standalone usage only requires PyTorch 1.12+ and the Python Standard Library;
Please see the detailed examples in Colab;

# V5
import os
import torch

device = torch.device('cpu')
torch.set_num_threads(4)
local_file = 'model.pt'

if not os.path.isfile(local_file):
    torch.hub.download_url_to_file('https://models.silero.ai/models/tts/ru/v5_ru.pt',
                                   local_file)  

model = torch.package.PackageImporter(local_file).load_pickle("tts_models", "model")
model.to(device)

example_text = 'ÐÐµÐ½Ñ Ð·Ð¾Ð²ÑÑ ÐÐµÐ²Ð° ÐÐ¾ÑÐ¾Ð»ÐµÐ². Ð¯ Ð¸Ð· Ð³Ð¾ÑÐ¾Ð². Ð Ñ ÑÐ¶Ðµ Ð³Ð¾ÑÐ¾Ð² Ð¾ÑÐºÑÑÑÑ Ð²ÑÐµ Ð²Ð°ÑÐ¸ Ð·Ð°Ð¼ÐºÐ¸ Ð»ÑÐ±Ð¾Ð¹ ÑÐ»Ð¾Ð¶Ð½Ð¾ÑÑÐ¸!'
sample_rate = 48000
speaker='baya'

audio_paths = model.save_wav(text=example_text,
                             speaker=speaker,
                             sample_rate=sample_rate)

SSML

Check out our TTS Wiki page.

Cyrillic languages v4

To be superseded with v5 model(s) soon.

Speaker_ID	Language	Gender
b_ava	Avar	F
b_bashkir	Bashkir	M
b_bulb	Bulgarian	M
b_bulc	Bulgarian	M
b_che	Chechen	M
b_cv	Chuvash	M
cv_ekaterina	Chuvash	F
b_myv	Erzya	M
b_kalmyk	Kalmyk	M
b_krc	Karachay-Balkar	M
kz_M1	Kazakh	M
kz_M2	Kazakh	M
kz_F3	Kazakh	F
kz_F1	Kazakh	F
kz_F2	Kazakh	F
b_kjh	Khakas	F
b_kpv	Komi-Ziryan	M
b_lez	Lezghian	M
b_mhr	Mari	F
b_mrj	Mari High	M
b_nog	Nogai	F
b_oss	Ossetic	M
b_ru	Russian	M
b_tat	Tatar	M
marat_tt	Tatar	M
b_tyv	Tuvinian	M
b_udm	Udmurt	M
b_uzb	Uzbek	M
b_sah	Yakut	M
kalmyk_erdni	Kalmyk	M
kalmyk_delghir	Kalmyk	F

Indic languages v4

Example

(!!!) All input sentences should be romanized to ISO format using aksharamukha. An example for hindi:

# V3
import torch
from aksharamukha import transliterate

# Loading model
model, example_text = torch.hub.load(repo_or_dir='snakers4/silero-models',
                                     model='silero_tts',
                                     language='indic',
                                     speaker='v4_indic')

orig_text = "à¤ªà¥à¤°à¤¸à¤¿à¤¦à¥à¤¦ à¤à¤¬à¥à¤° à¤à¤§à¥à¤¯à¥à¤¤à¤¾, à¤ªà¥à¤°à¥à¤·à¥à¤¤à¥à¤¤à¤® à¤à¤à¥à¤°à¤µà¤¾à¤² à¤à¤¾ à¤¯à¤¹ à¤¶à¥à¤§ à¤à¤²à¥à¤, à¤à¤¸ à¤°à¤¾à¤®à¤¾à¤¨à¤à¤¦ à¤à¥ à¤à¥à¤ à¤à¤°à¤¤à¤¾ à¤¹à¥"
roman_text = transliterate.process('Devanagari', 'ISO', orig_text)
print(roman_text)

audio = model.apply_tts(roman_text,
                        speaker='hindi_male')

Supported languages

Language	Speakers	Romanization function
hindi	`hindi_female`, `hindi_male`	`transliterate.process('Devanagari', 'ISO', orig_text)`
malayalam	`malayalam_female`, `malayalam_male`	`transliterate.process('Malayalam', 'ISO', orig_text)`
manipuri	`manipuri_female`	`transliterate.process('Bengali', 'ISO', orig_text)`
bengali	`bengali_female`, `bengali_male`	`transliterate.process('Bengali', 'ISO', orig_text)`
rajasthani	`rajasthani_female`, `rajasthani_female`	`transliterate.process('Devanagari', 'ISO', orig_text)`
tamil	`tamil_female`, `tamil_male`	`transliterate.process('Tamil', 'ISO', orig_text, pre_options=['TamilTranscribe'])`
telugu	`telugu_female`, `telugu_male`	`transliterate.process('Telugu', 'ISO', orig_text)`
gujarati	`gujarati_female`, `gujarati_male`	`transliterate.process('Gujarati', 'ISO', orig_text)`
kannada	`kannada_female`, `kannada_male`	`transliterate.process('Kannada', 'ISO', orig_text)`

Contact

Try our models, create an issue, join our chat, email us, and read the latest news.

Licence

All of the models are published under the main repo license (i.e. CC-NC-BY) except for the base cis-tts models, which are under MIT.

Citations

@misc{Silero Models,
  author = {Silero Team},
  title = {Silero Models: pre-trained text-to-speech models made embarrassingly simple},
  year = {2025},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/snakers4/silero-models}},
  commit = {insert_some_commit_here},
  email = {hello@silero.ai}
}

Top Related Projects

Convert designs to code with AI

Introducing Visual Copilot: A new AI model to turn Figma designs to high quality code using your components.

Try Visual Copilot

Top Related Projects

Quick Overview

Pros

Cons

Code Examples

Getting Started

Competitor Comparisons

Pros of vosk-api

Cons of vosk-api

Code Comparison

Pros of TTS

Cons of TTS

Code Comparison

Pros of DeepSpeech

Cons of DeepSpeech

Code Comparison

Pros of ESPnet

Cons of ESPnet

Code Comparison

Pros of Kaldi

Cons of Kaldi

Code Comparison

Pros of fairseq

Cons of fairseq

Code Comparison

Key Differences

Use Cases

Convert designs to code with AI

README

Silero Models

Installation and Basics

Text-To-Speech

Models and Speakers

V5

V5 CIS Base Models

V5 CIS Ext Models

V4

V3

Dependencies

PyTorch

Standalone Use

SSML

Cyrillic languages v4

Indic languages v4

Example

Supported languages

Contact

Licence

Citations

Further reading

English

Chinese

Russian

Top Related Projects

Convert designs to code with AI