Conformer-2

Accurately transcribes spoken language.

Released on July 20, 2023

speech

recognition

english

Overview

Conformer-2 is an advanced AI model designed for automatic speech recognition. It has been trained on 1.1 million hours of English audio data, resulting in significant improvements over its predecessor, Conformer-1.

This model focuses on improving the recognition of proper nouns and alphanumerics, as well as robustness to noise.

The development of Conformer-2 was driven by the scaling laws proposed in DeepMind's Chinchilla paper, which highlighted the importance of sufficient training data for large language models. Consequently, Conformer-2 was trained on a substantial amount of data: 1.1 million hours of English audio.

One notable feature of Conformer-2 is its use of model ensembling. Instead of relying on predictions from a single teacher model, Conformer-2 is trained on labels generated by multiple strong teachers. This ensembling technique reduces variance and improves the model's performance on data it did not see during training.

Despite the increased model size, Conformer-2 is faster than Conformer-1.
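To make the teacher ensembling described above more concrete, here is a minimal sketch of one way labels from several teachers could be combined. The source does not say how the teachers' outputs are merged, so this assumes the simplest scheme: averaging per-frame class probabilities and taking the most likely class. The function name `ensemble_pseudo_labels` and the toy data are hypothetical.

```python
from typing import List


def ensemble_pseudo_labels(teacher_probs: List[List[List[float]]]) -> List[int]:
    """Average per-frame class probabilities across teachers, then pick the top class.

    teacher_probs[k][t][c] is teacher k's probability of class c at frame t.
    ASSUMPTION: plain averaging; the actual Conformer-2 recipe is not given in the source.
    """
    if not teacher_probs:
        raise ValueError("need at least one teacher")
    num_teachers = len(teacher_probs)
    num_frames = len(teacher_probs[0])
    labels = []
    for t in range(num_frames):
        num_classes = len(teacher_probs[0][t])
        # Mean probability of each class over all teachers at this frame.
        mean = [
            sum(teacher[t][c] for teacher in teacher_probs) / num_teachers
            for c in range(num_classes)
        ]
        # The pseudo-label is the class with the highest mean probability.
        labels.append(max(range(num_classes), key=mean.__getitem__))
    return labels


# Three hypothetical teachers, two frames, three classes.
teachers = [
    [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]],
    [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]],
    [[0.2, 0.7, 0.1], [0.2, 0.3, 0.5]],
]
print(ensemble_pseudo_labels(teachers))  # [1, 2]
```

In the first frame, two of the three teachers individually prefer class 0, but the averaged distribution prefers class 1. Averaging smooths out the idiosyncrasies of any single teacher, which is the variance reduction the description refers to.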

The serving infrastructure has been optimized for faster processing, achieving up to a 55% reduction in relative processing duration across all audio file durations.

In real-world applications, Conformer-2 shows significant gains on several user-oriented metrics. It achieves a 31.7% improvement on alphanumerics, a 6.8% improvement on proper noun error rate, and a 12.0% improvement in noise robustness. These improvements result from both the increased training data and the use of an ensemble of models.

Conformer-2 is well suited to generating accurate speech-to-text transcriptions, making it a valuable component of AI pipelines for generative AI applications that use spoken data.
