Using the Bag-of-Audio-Words approach for emotion recognition

The problem of varying length recordings is a well-known issue in paralinguistics. We investigated how to resolve this problem using the bag-of-audio-words feature extraction approach. The steps of this technique involve preprocessing, clustering, quantization and normalization. The bag-of-audio-wor...

Bibliographic details
Authors:	Kiss-Vetráb Mercedes, Gosztolya Gábor
Document type:	Article
Published:	2022
Series:	ACTA UNIVERSITATIS SAPIENTIAE INFORMATICA 14 No. 1
Subjects:	Computer and information science
doi:	10.2478/ausi-2022-0001
mtmt:	33055336
Online Access:	http://publicatio.bibl.u-szeged.hu/25199


LEADER	02068nab a2200229 i 4500
001	publ25199
005	20220927081712.0
008	220927s2022 hu o 0\|\| English d
022			\|a 1844-6086
024	7		\|a 10.2478/ausi-2022-0001 \|2 doi
024	7		\|a 33055336 \|2 mtmt
040			\|a SZTE Publicatio Repozitórium \|b hun
041			\|a English
100	2		\|a Kiss-Vetráb Mercedes
245	1	0	\|a Using the Bag-of-Audio-Words approach for emotion recognition \|h [electronic document] / \|c Kiss-Vetráb Mercedes
260			\|c 2022
300			\|a 1-21
490	0		\|a ACTA UNIVERSITATIS SAPIENTIAE INFORMATICA \|v 14 No. 1
520	3		\|a The problem of varying length recordings is a well-known issue in paralinguistics. We investigated how to resolve this problem using the bag-of-audio-words feature extraction approach. The steps of this technique involve preprocessing, clustering, quantization and normalization. The bag-of-audio-words technique is competitive in the area of speech emotion recognition, but the method has several parameters that need to be precisely tuned for good efficiency. The main aim of our study was to analyse the effectiveness of the bag-of-audio-words method and to find the best parameter values for emotion recognition. We optimized the parameters one by one, with each step building on the results of the previous ones. We performed the feature extraction using openSMILE. Next, we transformed our features into same-sized vectors with openXBOW, and finally trained and evaluated SVM models with 10-fold cross-validation and UAR. In our experiments, we worked with a Hungarian emotion database. According to our results, the emotion classification performance improves with the bag-of-audio-words feature representation. Not all the BoAW parameters have the optimal settings but later we can make clear recommendations on how to set bag-of-audio-words parameters for emotion detection tasks.
650		4	\|a Computer and information science
700	0	1	\|a Gosztolya Gábor \|e aut
856	4	0	\|u http://publicatio.bibl.u-szeged.hu/25199/1/2022-acta-sapientiae.pdf \|z Document access
