Speech de-identification with deep neural networks
Cloud-based speech services are powerful practical tools but the privacy of the speakers raises important legal concerns when exposed to the Internet. We propose a deep neural network solution that removes personal characteristics from human speech by converting it to the voice of a Text-to-Speech (...
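Authors: | Fodor Ádám, Kopácsi László, Milacski Zoltán Ádám, Lőrincz András |
---|---|
Corporate author: | Conference of PhD Students in Computer Science (12.) (2020) (Szeged) |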
Document type: | Article |
Published: | University of Szeged, Institute of Informatics, Szeged, 2021 |
Series: | Acta cybernetica 25 No. 2 |
Keywords: | Speech processing, Data protection, Programming |
Subjects: | Natural sciences, Computer and information sciences |
doi: | 10.14232/actacyb.288282 |
Online Access: | http://acta.bibl.u-szeged.hu/75609 |
LEADER | 02009nab a2200277 i 4500 | ||
---|---|---|---|
001 | acta75609 | ||
005 | 20220512144418.0 | ||
008 | 220512s2021 hu o 0|| eng d | ||
022 | |a 0324-721X | ||
024 | 7 | |a 10.14232/actacyb.288282 |2 doi | |
040 | |a SZTE Egyetemi Kiadványok Repozitórium |b hun | ||
041 | |a eng | ||
100 | 1 | |a Fodor Ádám | |
245 | 1 | 0 | |a Speech de-identification with deep neural networks |h [electronic document] / |c Fodor Ádám |
260 | |a University of Szeged, Institute of Informatics |b Szeged |c 2021 | ||
300 | |a 257-269 | ||
490 | 0 | |a Acta cybernetica |v 25 No. 2 | |
520 | 3 | |a Cloud-based speech services are powerful practical tools but the privacy of the speakers raises important legal concerns when exposed to the Internet. We propose a deep neural network solution that removes personal characteristics from human speech by converting it to the voice of a Text-to-Speech (TTS) system before sending the utterance to the cloud. The network learns to transcode sequences of vocoder parameters, delta and delta-delta features of human speech to those of the TTS engine. We evaluated several TTS systems, vocoders and audio alignment techniques. We measured the performance of our method by (i) comparing the result of speech recognition on the de-identified utterances with the original texts, (ii) computing the Mel-Cepstral Distortion of the aligned TTS and the transcoded sequences, and (iii) questioning human participants in A-not-B, 2AFC and 6AFC tasks. Our approach achieves the level required by diverse applications. | |
650 | 4 | |a Natural sciences | |
650 | 4 | |a Computer and information sciences | |
695 | |a Speech processing, Data protection, Programming | ||
700 | 0 | 1 | |a Kopácsi László |e aut |
700 | 0 | 1 | |a Milacski Zoltán Ádám |e aut |
700 | 0 | 1 | |a Lőrincz András |e aut |
710 | |a Conference of PhD Students in Computer Science (12.) (2020) (Szeged) | ||
856 | 4 | 0 | |u http://acta.bibl.u-szeged.hu/75609/1/cybernetica_025_numb_002_257-269.pdf |z Document access |