Incorporating sparse overcomplete word representations into transition-based dependency parsing


Bibliographic Details
Main Author: Egas López José Vicente
Other Authors: Berend Gábor
Format: Thesis
Published: 2018
Subjects:
Online Access: http://diploma.bibl.u-szeged.hu/73539
Description
Summary: This thesis employs a transition-based dependency parser. Although the parser was originally designed for cross-lingual use, it also supports monolingual dependency parsing, and it can consume several kinds of distributed representations. Only the monolingual setting was used here, in order to compare the parser's performance with different kinds of pre-trained word embeddings: sparse overcomplete embeddings versus dense embeddings. To keep the comparison clear, distributed representations other than word embeddings were omitted; that is, a single type of representation (word embeddings) was fed to the parser in all experiments. Training and testing were carried out on Universal Dependencies version 2: CoNLL-U formatted datasets for training, development and testing were supplied to the parser together with the pre-trained word embeddings. The results show that the performance of sparse embeddings is very close to that of dense embeddings: sparse embeddings achieved 85.76% Unlabeled Attachment Score (UAS) and 83.5% Labeled Attachment Score (LAS), compared with 86.18% UAS and 84.08% LAS for dense embeddings. Using sparse embeddings thus yields performance close to state-of-the-art methods, and such embeddings are also more interpretable to humans: their dimensions are more coherent than those of dense embeddings, as the word-intrusion experiments of Murphy et al. (2012) and Faruqui (2015) corroborate.
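To make the reported metrics concrete, the sketch below shows how UAS and LAS are conventionally computed from CoNLL-U annotations: UAS is the fraction of tokens whose predicted head matches the gold head, and LAS additionally requires the dependency label to match. The two tiny CoNLL-U snippets are invented for illustration and are not taken from the thesis's data.

```python
# Minimal sketch of UAS/LAS evaluation over CoNLL-U data.
# The example sentences are hypothetical, not from the thesis.

def parse_conllu(text):
    """Extract (ID, HEAD, DEPREL) triples from 10-column CoNLL-U lines."""
    rows = []
    for line in text.strip().splitlines():
        if line.startswith("#"):
            continue  # skip comment lines
        cols = line.split("\t")
        if "-" in cols[0] or "." in cols[0]:
            continue  # skip multiword-token and empty-node lines
        rows.append((cols[0], cols[6], cols[7]))  # ID, HEAD, DEPREL
    return rows

def attachment_scores(gold, pred):
    """UAS: head matches; LAS: head and dependency label both match."""
    assert len(gold) == len(pred)
    uas = sum(g[1] == p[1] for g, p in zip(gold, pred)) / len(gold)
    las = sum(g[1:] == p[1:] for g, p in zip(gold, pred)) / len(gold)
    return uas, las

gold_text = """\
1\tThe\tthe\tDET\t_\t_\t2\tdet\t_\t_
2\tcat\tcat\tNOUN\t_\t_\t3\tnsubj\t_\t_
3\tsleeps\tsleep\tVERB\t_\t_\t0\troot\t_\t_
"""
# Predicted parse: all heads correct, one label wrong (obj vs. nsubj).
pred_text = """\
1\tThe\tthe\tDET\t_\t_\t2\tdet\t_\t_
2\tcat\tcat\tNOUN\t_\t_\t3\tobj\t_\t_
3\tsleeps\tsleep\tVERB\t_\t_\t0\troot\t_\t_
"""
uas, las = attachment_scores(parse_conllu(gold_text), parse_conllu(pred_text))
print(uas, las)  # 3/3 heads correct, 2/3 labels correct
```

In practice, evaluation over a whole treebank pools these counts over all sentences, which is what the percentages quoted in the summary denote.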
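Sparse overcomplete embeddings of the kind compared above are typically obtained by re-encoding each dense vector over an overcomplete dictionary with an L1 sparsity penalty (as in Faruqui, 2015). The following is a minimal sketch of that idea, not the thesis's actual pipeline: it uses plain ISTA (iterative soft-thresholding) with a toy 2-dimensional vector, an invented 4-atom dictionary, and an arbitrary penalty weight `lam`.

```python
# Toy sketch: re-encode a dense vector x as a sparse code a over an
# overcomplete dictionary D by minimizing 0.5*||x - D a||^2 + lam*||a||_1
# with ISTA (gradient step followed by soft-thresholding).
# Dictionary, dimensions, and lam are illustrative assumptions.

def matvec(D, a):
    """Multiply dictionary D (list of rows) by code vector a."""
    return [sum(D[i][j] * a[j] for j in range(len(a))) for i in range(len(D))]

def ista(x, D, lam=0.1, step=0.1, iters=500):
    k = len(D[0])
    a = [0.0] * k
    for _ in range(iters):
        r = [xi - yi for xi, yi in zip(x, matvec(D, a))]  # residual x - D a
        grad = [-sum(D[i][j] * r[i] for i in range(len(D))) for j in range(k)]
        z = [a[j] - step * grad[j] for j in range(k)]
        # soft-thresholding: the proximal step for the L1 penalty
        a = [max(abs(zj) - step * lam, 0.0) * (1.0 if zj >= 0 else -1.0)
             for zj in z]
    return a

# Overcomplete toy dictionary: 4 atoms in a 2-dimensional space.
D = [[1.0, 0.0, 0.7, -0.7],
     [0.0, 1.0, 0.7,  0.7]]
x = [0.7, 0.7]   # dense "embedding" to re-encode; equals atom index 2
code = ista(x, D)
print([round(c, 3) for c in code])  # weight concentrates on the matching atom
```

Because x coincides with one dictionary atom, the L1 penalty drives the code to use (almost only) that atom, which is exactly the source of the interpretability advantage noted in the summary: each active dimension corresponds to one dictionary atom.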