Speech de-identification with deep neural networks

Cloud-based speech services are powerful practical tools but the privacy of the speakers raises important legal concerns when exposed to the Internet. We propose a deep neural network solution that removes personal characteristics from human speech by converting it to the voice of a Text-to-Speech (...

Bibliographic details
Authors:	Fodor Ádám; Kopácsi László; Milacski Zoltán Ádám; Lőrincz András
Corporate author:	Conference of PhD Students in Computer Science (12.) (2020) (Szeged)
Document type:	Article
Published:	University of Szeged, Institute of Informatics, Szeged, 2021
Series:	Acta cybernetica 25 No. 2
Keywords:	Speech processing, Data protection, Programming
Subjects:	Natural sciences; Computer and information science
doi:	10.14232/actacyb.288282
Online Access:	http://acta.bibl.u-szeged.hu/75609


LEADER	02009nab a2200277 i 4500
001	acta75609
005	20220512144418.0
008	220512s2021 hu o 0|| eng d
022			|a 0324-721X
024	7		|a 10.14232/actacyb.288282 |2 doi
040			|a SZTE Egyetemi Kiadványok Repozitórium |b hun
041			|a eng
100	1		|a Fodor Ádám
245	1	0	|a Speech de-identification with deep neural networks |h [elektronikus dokumentum] / |c Fodor Ádám
260			|a University of Szeged, Institute of Informatics |b Szeged |c 2021
300			|a 257-269
490	0		|a Acta cybernetica |v 25 No. 2
520	3		|a Cloud-based speech services are powerful practical tools but the privacy of the speakers raises important legal concerns when exposed to the Internet. We propose a deep neural network solution that removes personal characteristics from human speech by converting it to the voice of a Text-to-Speech (TTS) system before sending the utterance to the cloud. The network learns to transcode sequences of vocoder parameters, delta and delta-delta features of human speech to those of the TTS engine. We evaluated several TTS systems, vocoders and audio alignment techniques. We measured the performance of our method by (i) comparing the result of speech recognition on the de-identified utterances with the original texts, (ii) computing the Mel-Cepstral Distortion of the aligned TTS and the transcoded sequences, and (iii) questioning human participants in A-not-B, 2AFC and 6AFC tasks. Our approach achieves the level required by diverse applications.
650		4	|a Természettudományok
650		4	|a Számítás- és információtudomány
695			|a Beszédfeldolgozás, Adatvédelem, Programozás
700	0	1	|a Kopácsi László |e aut
700	0	1	|a Milacski Zoltán Ádám |e aut
700	0	1	|a Lőrincz András |e aut
710			|a Conference of PhD Students in Computer Science (12.) (2020) (Szeged)
856	4	0	|u http://acta.bibl.u-szeged.hu/75609/1/cybernetica_025_numb_002_257-269.pdf |z Dokumentum-elérés
