Joint optimization of spectro-temporal features and deep neural nets for robust automatic speech recognition

In speech recognition, feature extraction and acoustical model training are traditionally done in two separate steps. Here, instead, we use a framework that combines spectro-temporal feature extraction and the training of neural network based acoustic models into a single process. We found earlier t...

Bibliographic details
Authors:	Kovács György; Tóth László
Document type:	Article
Published:	2015
Series:	Acta cybernetica 22 No. 1
Keywords:	Computer applications - speech recognition
Subjects:	Natural sciences; Computer and information science
doi:	10.14232/actacyb.22.1.2015.8
Online Access:	http://acta.bibl.u-szeged.hu/36260


LEADER	01910nab a2200241 i 4500
001	acta36260
005	20220620105219.0
008	161017s2015 hu o 0|| eng d
022			|a 0324-721X
024	7		|a 10.14232/actacyb.22.1.2015.8 |2 doi
040			|a SZTE Egyetemi Kiadványok Repozitórium |b hun
041			|a eng
100	1		|a Kovács György
245	1	0	|a Joint optimization of spectro-temporal features and deep neural nets for robust automatic speech recognition |h [electronic document] / |c Kovács György
260			|c 2015
300			|a 117-134
490	0		|a Acta cybernetica |v 22 No. 1
520	3		|a In speech recognition, feature extraction and acoustical model training are traditionally done in two separate steps. Here, instead, we use a framework that combines spectro-temporal feature extraction and the training of neural network based acoustic models into a single process. We found earlier that this approach can be successfully applied for the recognition of speech. In this paper, we propose two further improvements to our method based on recent advances in neural net technology and extend our evaluation to speech contaminated with new types of noise. By repeating our experiments on TIMIT phone recognition tasks using clean and noise contaminated speech, we can compare the recognition performance of the original framework with our new, modified framework. The results indicate that both these modifications significantly improve the recognition performance of our framework. Moreover, we will show that these modifications allow us to achieve a substantially better performance than what we got earlier.
650		4	|a Natural sciences
650		4	|a Computer and information science
695			|a Computer applications - speech recognition
700	0	1	|a Tóth László |e aut
856	4	0	|u http://acta.bibl.u-szeged.hu/36260/1/actacyb_22_1_2015_8.pdf |z Document access
