An AI speech processing toolkit

State-of-the-art speech recognition, text-to-speech, spoken language understanding, and more.

Robust pre-trained models

Integrate Whisper, wav2vec 2.0, XLS-R, WavLM, HuBERT and more with S3PRL.

git clone https://github.com/espnet/espnet
cd espnet/tools
. ./setup_anaconda.sh anaconda espnet 3.8
make

Seamlessly track and compare experiments with Weights & Biases.

cd espnet/egs2/librispeech/asr1
./run.sh

The recipes handle data downloading and preprocessing for you, so you can spend more time developing and less time cleaning data.

The toolkit also supports NLP tasks, including machine translation and language modelling.