Lexical semantic change detection: A supervised approach
Publisher
University of the Peloponnese (Πανεπιστήμιο Πελοποννήσου)
Abstract
This thesis addresses the challenge of detecting lexical semantic
change, a task essential for understanding language evolution and its implications
in linguistics, history, and artificial intelligence. As part of this work, we
propose a novel methodology for creating an annotated dataset specifically designed
for lexical semantic change detection. This dataset is the foundation
for our supervised approach, which uses word embeddings generated by Skip-Gram
with Negative Sampling (SGNS) models to identify and quantify semantic
shifts across time periods. By combining preprocessing techniques, including
undersampling to address class imbalance, with a range of machine learning classifiers,
we demonstrate that supervised learning can be employed effectively for this
task.
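The undersampling step mentioned above can be illustrated with a minimal sketch. It assumes random undersampling down to the size of the smallest class; the abstract does not state which undersampling variant the thesis uses, and the function name `undersample` and the toy feature vectors are hypothetical.

```python
import random
from collections import defaultdict


def undersample(features, labels, seed=0):
    """Randomly drop examples so every class has as many as the smallest class.

    Assumption: plain random undersampling; the thesis may use another variant.
    """
    rng = random.Random(seed)

    # Group the feature vectors by their class label.
    by_class = defaultdict(list)
    for x, y in zip(features, labels):
        by_class[y].append(x)

    # Size of the smallest class sets the target size for all classes.
    n_min = min(len(xs) for xs in by_class.values())

    # Sample n_min examples per class without replacement, then shuffle.
    balanced = []
    for y, xs in by_class.items():
        for x in rng.sample(xs, n_min):
            balanced.append((x, y))
    rng.shuffle(balanced)

    new_features = [x for x, _ in balanced]
    new_labels = [y for _, y in balanced]
    return new_features, new_labels


# Toy data (hypothetical): four "stable" words (0) and one "changed" word (1).
toy_features = [[0.1], [0.2], [0.3], [0.4], [0.9]]
toy_labels = [0, 0, 0, 0, 1]
balanced_features, balanced_labels = undersample(toy_features, toy_labels)
```

After balancing, each class contributes the same number of training examples, so a classifier such as an SVM is not dominated by the majority class.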
Our experiments show that supervised models, particularly the Support Vector
Machine (SVM) classifier with undersampling, outperform traditional unsupervised
methods. The best-performing model achieved an F1-score of 0.7568, surpassing the
top results from the unsupervised SemEval-2020 Task 1 competition. This supports
the effectiveness of supervised learning in capturing subtle semantic changes and
highlights its potential for addressing similar tasks in the future.
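For readers unfamiliar with the metric, the following sketch shows how a binary F1-score is computed from predictions. It assumes the positive class is the "changed" label encoded as 1; the abstract does not say whether the reported score is binary, macro, or weighted, and the example labels are made up for illustration.

```python
def f1_score(y_true, y_pred, positive=1):
    """Binary F1: harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    # With no true positives, precision and recall are both zero (or undefined).
    if tp == 0:
        return 0.0

    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical gold labels and predictions: 2 true positives,
# 1 false positive, 1 false negative.
gold = [1, 1, 1, 0, 0, 0]
predicted = [1, 1, 0, 1, 0, 0]
score = f1_score(gold, predicted)
```

Here precision and recall are both 2/3, so the F1-score is also 2/3.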
In addition, we plan to explore the integration of contextual embeddings,
such as those generated by BERT, into the supervised framework, which holds
promise for further enhancing the model’s ability to detect nuanced semantic shifts.
This thesis also outlines several promising directions for future research, including
advanced feature engineering, hyperparameter optimization, and the adoption of
semi-supervised learning techniques to improve performance and scalability.
By introducing a novel annotated dataset and demonstrating the efficacy of supervised
approaches, this work bridges the gap between supervised and unsupervised
methods in lexical semantic change detection. While our results establish a strong
foundation, significant room remains for improvement and further development
in this field, which may open the way for new applications in computational linguistics
and beyond.
Description
Postgraduate thesis (Μ.Δ.Ε.) 140
Keywords
Natural language processing (Computer science), Artificial Intelligence, Data sets, Machine Learning--Classifiers, Supervised learning (Machine learning)
Creative Commons license
Except where otherwise noted, this item's license is described as Attribution-NonCommercial-NoDerivs 3.0 Greece

