The impact of different representations in the presence of language drift

Christodoulou, Ioannis

dc.contributor.advisor	Zavitsanos, Elias
dc.contributor.author	Christodoulou, Ioannis
dc.contributor.committee	Krithara, Anastasia
dc.contributor.committee	Giannakopoulos, George
dc.contributor.department	Department of Informatics and Telecommunications	el
dc.contributor.faculty	School of Economics and Technology	el
dc.contributor.master	Data Science	el
dc.date.accessioned	2024-09-05T10:17:02Z
dc.date.available	2024-09-05T10:17:02Z
dc.date.issued	2022-07
dc.description	Master's thesis 104	el
dc.description.abstract	Natural language inherently encodes an interpretation of the world through its vocabulary and the different meanings of words. Language change can reflect sociocultural evolution; its systematic exploration is therefore a valuable tool for researchers in the social sciences and humanities. In this thesis, we examine the detection of semantic change between two time periods t1 and t2. For the empirical study, we use datasets in four languages (English, German, Latin, and Swedish) provided by SemEval-2020 Task 1. All of our experiments are evaluated on a binary classification task: deciding whether a word's sense has changed or not. For that purpose, we explore a set of approaches, including methods that were not previously submitted to SemEval-2020 Task 1. Furthermore, we create an extensible system that decouples each stage of the diachronic semantic change detection workflow from its concrete implementation. This design enables quick and efficient reproduction of the experiments and aims to facilitate research on semantic change. Based on the results of our empirical study, we answer three questions. The first concerns identifying the most suitable alignment method for the word embeddings Wt1 and Wt2. The methods under investigation are Orthogonal Procrustes, Incremental Training, and Temporal Word Embeddings with a Compass. The second concerns the performance of pre-trained Word2vec embeddings compared with embeddings whose weights were not initialized beforehand. Finally, by applying LDA2vec, we explore whether LDA (Latent Dirichlet Allocation) topics improve the performance of SGNS (Skip-gram with Negative Sampling).	el
dc.format.extent	64 pages	el
dc.identifier.uri	https://amitos.library.uop.gr/xmlui/handle/123456789/8226
dc.identifier.uri	http://dx.doi.org/10.26263/amitos-1728
dc.language.iso	en	el
dc.publisher	University of the Peloponnese	el
dc.subject.keyword	lda2vec	el
dc.subject.keyword	word2vec	el
dc.subject.keyword	semantic change	el
dc.subject.keyword	twec	el
dc.subject.keyword	orthogonal procrustes	el
dc.subject.keyword	local neighborhood	el
dc.subject.keyword	diachronic	el
dc.subject.keyword	SemEval	el
dc.title	The impact of different representations in the presence of language drift	el
dc.type	Master's thesis	el

Files

Original bundle

Name: MScThesis_Christodoulou_Giannis.pdf
Size: 756.53 KB
Format: Adobe Portable Document Format

Collections

Department of Informatics and Telecommunications (Master's theses)