DGLR Publication Database - Detail View
Author(s):
M. May, M. Kleinert, H. Helmke
Abstract:
Despite all the advances in automation and digitalization, the majority of communication between air traffic controllers and pilots is still conducted via analogue radio voice transmissions. If support systems are to benefit from the verbal controller-pilot communication as well, manual, time-consuming inputs via mouse and keyboard are required. Automatic speech recognition (ASR) is a solution to minimize these manual inputs. Recently, DLR, Idiap and Austro Control demonstrated that pre-filling of radar label entries supported by ASR already reaches a technology readiness level of six. The ASR engine used is based on Kaldi, which requires substantial ASR expertise for implementation and adaptation. Besides Kaldi, many open-source end-to-end ASR models such as Whisper or wav2vec are available and are already pre-trained on large amounts of regular voice communication data. These open-source end-to-end models are often easier to adapt, even for non-experts in speech recognition. This paper presents the results that DLR achieved with the open-source CoquiSTT toolkit, which provides an English end-to-end model pre-trained on 47,000 hours of regular English speech, achieving a word error rate of 4.5% on the LibriSpeech clean test corpus. Using this model on air traffic control voice communication, however, results in word error rates worse than 50%, even in lab environments. Training new models from scratch on just 10 hours of voice recordings from the target environment already makes word error rates below 10% possible. The best performance, however, is achieved when the CoquiSTT pre-trained model is fine-tuned with air traffic control data from different areas. Word error rates below 5% were achieved, which enable, e.g., callsign recognition rates better than 95%.
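The abstract reports word error rates obtained with CoquiSTT models on air traffic control speech. As a minimal sketch (not taken from the paper), the following Python snippet shows how a Coqui STT model can be run on a recording and how a word error rate can be computed against a reference transcript; the file names "atc_model.tflite", "atc.scorer" and "clearance.wav" as well as the reference phrase are hypothetical placeholders.

```python
# Sketch only: Coqui STT inference plus a simple word error rate (WER) metric.
# Model, scorer and audio file names below are placeholders, not from the paper.
import wave
import numpy as np
from stt import Model  # Coqui STT Python bindings (pip install stt)

def wer(reference: str, hypothesis: str) -> float:
    """Word error rate via word-level Levenshtein distance."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost) # substitution
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)

# Load a (fine-tuned) Coqui STT acoustic model and an optional language-model scorer.
model = Model("atc_model.tflite")
model.enableExternalScorer("atc.scorer")

# Coqui STT expects 16 kHz, 16-bit mono PCM audio as a numpy int16 array.
with wave.open("clearance.wav", "rb") as wav:
    audio = np.frombuffer(wav.readframes(wav.getnframes()), dtype=np.int16)

hypothesis = model.stt(audio)
reference = "lufthansa one two three descend flight level one two zero"  # hypothetical
print(f"hypothesis: {hypothesis}")
print(f"WER: {wer(reference, hypothesis):.2%}")
```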
Event:
Deutscher Luft- und Raumfahrtkongress 2024, Hamburg
Publisher, Place:
Deutsche Gesellschaft für Luft- und Raumfahrt - Lilienthal-Oberth e.V., Bonn, 2024
Media Type:
Conference Paper
Language:
English
Format:
21.0 x 29.7 cm, 8 pages
URN:
urn:nbn:de:101:1-2412131300191.469971864854
DOI:
10.25967/630171
Keywords:
Automatic speech recognition, air traffic controller assistance
Availability:
Download
- Please note the terms of use for this document: Copyright protected
Comment:
Citation:
May, M.; Kleinert, M.; Helmke, H. (2024): Automatic Transcription of Air Traffic Controller to Pilot Communication - Training Speech Recognition Models with the Open Source Toolkit CoquiSTT. Deutsche Gesellschaft für Luft- und Raumfahrt - Lilienthal-Oberth e.V. (Text). https://doi.org/10.25967/630171. urn:nbn:de:101:1-2412131300191.469971864854.
Published on:
13.12.2024