Neural morphological generators for Hungarian
Authors: | |
---|---|
Corporate author: | |
Document type: | Book part |
Published: | 2023 |
Series: | Magyar Számítógépes Nyelvészeti Konferencia 19 |
Keywords: | Linguistics - computer applications |
Subjects: | |
Online Access: | http://acta.bibl.u-szeged.hu/78423 |
Abstract: | Here we present a set of morphological generators for Hungarian that generate surface forms from emMorph and Universal Dependencies (UD) morphological tags with high accuracy. We experimented with two approaches. First, we trained neural machine translation models with the morphological analysis as the source and the corresponding surface form as the target. Second, we treated the problem as a text generation task, in which the morphological analysis is followed by the correct word form. The corpus we used is a normalised version of Webcorpus 2.0 (Nemeskey, 2020). Marian MT produced the best results, so we evaluated its output manually on NerKor (Simon and Vadász, 2021). Our analysis shows that the generator achieves an accuracy of 96.27% for emMorph and 94.94% for UD. After manual evaluation, we computed a more precise accuracy of 99.43% (emMorph) and 98.69% (UD). This model may be used for several NLP tasks, such as anonymisation and terminology translation. |
Extent/Physical description: | 331-340 |
ISBN: | 978-963-306-912-7 |