Lexical semantic change detection: A supervised approach

Τσαρούχας, Νικόλαος-Μάριος

Files

Tsarouchas_Nikolaos_Marios_Lexical_semantic_change_detection.pdf (1.01 MB)

Date

2024-12-12

Authors

Τσαρούχας, Νικόλαος-Μάριος

Publisher

University of the Peloponnese

Abstract

This thesis addresses the challenge of detecting lexical semantic change, a task essential for understanding language evolution and its implications in linguistics, history, and artificial intelligence. As part of this work, we propose a novel methodology for creating an annotated dataset specifically designed for lexical semantic change detection. This dataset is the foundation of our supervised approach, which uses word embeddings generated by Skip-Gram with Negative Sampling (SGNS) models to identify and quantify semantic shifts across time periods. By combining preprocessing techniques, including undersampling to address class imbalance, with a range of machine learning classifiers, we demonstrate that supervised learning can be employed effectively for this task. Our experiments show that supervised models, particularly the Support Vector Machine (SVM) classifier with undersampling, outperform traditional unsupervised methods. The best-performing model achieved an F1-score of 0.7568, surpassing the top results of the unsupervised systems in the SemEval-2020 Task 1 competition. This validates the effectiveness of supervised learning in capturing subtle semantic changes and highlights its potential for similar tasks in the future. In addition, we plan to explore the integration of contextual embeddings, such as those generated by BERT, into the supervised framework, which holds promise for further enhancing the model’s ability to detect nuanced semantic shifts. This thesis also outlines several directions for future research, including advanced feature engineering, hyperparameter optimization, and the adoption of semi-supervised learning techniques to improve performance and scalability. By introducing a novel annotated dataset and demonstrating the efficacy of supervised approaches, this work bridges the gap between supervised and unsupervised methods in lexical semantic change detection.
While our results establish a strong foundation, there remains significant room for improvement and further development in this field, paving the way for innovative applications in computational linguistics and beyond.
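
The abstract names two steps that are easy to illustrate in isolation: random undersampling to balance the classes, and the F1-score used to report results. The following is a minimal sketch of both in plain Python, not the thesis's actual pipeline. The function names, the toy features, and the labels (1 = changed, 0 = stable) are assumptions made up for illustration, and the SGNS embedding and SVM training steps are deliberately left out.

```python
import random
from collections import Counter


def undersample(features, labels, seed=0):
    """Randomly drop examples until every class is as small as the rarest one.

    Hypothetical helper for illustration; not taken from the thesis.
    """
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(features, labels):
        by_class.setdefault(y, []).append(x)
    n_min = min(len(xs) for xs in by_class.values())
    balanced = []
    for y, xs in by_class.items():
        # Keep a random subset of size n_min from each class.
        for x in rng.sample(xs, n_min):
            balanced.append((x, y))
    rng.shuffle(balanced)
    return [x for x, _ in balanced], [y for _, y in balanced]


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy data, invented for illustration: 2 "changed" words, 8 "stable" words.
features = [[0.1 * i] for i in range(10)]
labels = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]

X_bal, y_bal = undersample(features, labels)
class_counts = Counter(y_bal)  # both classes now have the same size

# One true positive, one false positive, one false negative.
toy_f1 = f1_score([1, 1, 0, 0], [1, 0, 0, 1])
```

In a full pipeline, undersampling would be applied to the training split only, so that the evaluation data keeps its original class distribution.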