Video Content Similarity Detection
Publisher
Πανεπιστήμιο Πελοποννήσου
Abstract
This master's thesis explores the application of advanced machine learning techniques for detecting video content similarity, an increasingly important task in the era of large-scale multimedia data. Traditional methods often struggle to handle the complexity of video data, which contains both visual and auditory components. This study leverages embedding models to represent these components as compact, dense vectors, enabling more efficient and accurate similarity detection. The anime series Mushishi serves as the dataset for this research, providing a consistent structure and rich audiovisual content for detailed analysis.
The thesis applies a range of similarity metrics, including cosine similarity and Euclidean distance, to compare embeddings across different episodes and segments of the series. A key innovation of this study is the integration of image and audio embeddings to improve content similarity detection. By combining these modalities, the research demonstrates that a multimodal approach achieves significantly higher accuracy than single-modality models, especially in segments where both visual and auditory features play critical roles in defining similarity.
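
For illustration only, the sketch below shows the two metrics computed over per-segment embedding vectors, together with one possible late-fusion scheme. The function names and the equal default weighting are assumptions made for this example; the abstract does not specify how the thesis actually combines the modalities (concatenating the vectors before a single similarity computation would be an equally plausible reading).

import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # Angle-based similarity in [-1, 1]; invariant to vector magnitude.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def euclidean_distance(a: np.ndarray, b: np.ndarray) -> float:
    # Straight-line distance between embeddings; sensitive to scale.
    return float(np.linalg.norm(a - b))

def multimodal_similarity(img_a, img_b, aud_a, aud_b, w_img=0.5):
    # Hypothetical late fusion: a weighted average of the per-modality
    # cosine similarities. w_img is an assumed, tunable weight, not a
    # value taken from the thesis.
    s_img = cosine_similarity(img_a, img_b)
    s_aud = cosine_similarity(aud_a, aud_b)
    return w_img * s_img + (1.0 - w_img) * s_aud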
The findings of this research offer valuable insights into the performance of embedding models in multimedia content analysis. The study highlights both the strengths and limitations of the models tested, with CLIP excelling in visual feature extraction and Wav2Vec2 capturing auditory nuances. The combined audio-visual approach opens new possibilities for more robust and scalable systems in fields such as content recommendation, copyright protection, and video retrieval. This thesis contributes a framework that can be further expanded to handle a variety of multimedia content.
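
As a minimal sketch of how such embeddings can be obtained, the snippet below uses publicly available Hugging Face checkpoints for CLIP and Wav2Vec2. The specific checkpoints, the frame file name, and the synthetic waveform are assumptions for illustration; the thesis's exact models and preprocessing are not specified in this abstract.

import torch
from PIL import Image
from transformers import (CLIPModel, CLIPProcessor,
                          Wav2Vec2FeatureExtractor, Wav2Vec2Model)

# Visual embedding for one video frame via CLIP.
clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
clip_proc = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
frame = Image.open("frame_0001.png")          # hypothetical extracted frame
inputs = clip_proc(images=frame, return_tensors="pt")
with torch.no_grad():
    image_vec = clip.get_image_features(**inputs)  # shape (1, 512)

# Auditory embedding for one audio segment via Wav2Vec2.
w2v = Wav2Vec2Model.from_pretrained("facebook/wav2vec2-base-960h")
w2v_fe = Wav2Vec2FeatureExtractor.from_pretrained("facebook/wav2vec2-base-960h")
waveform = torch.randn(16000)                 # stand-in for 1 s of 16 kHz audio
inputs = w2v_fe(waveform.numpy(), sampling_rate=16000, return_tensors="pt")
with torch.no_grad():
    hidden = w2v(**inputs).last_hidden_state  # shape (1, frames, 768)
audio_vec = hidden.mean(dim=1)                # mean-pool over time

Mean-pooling the Wav2Vec2 hidden states over time is one common way to reduce a variable-length segment to a single fixed-size vector comparable with the metrics above; it is shown here as an assumed design choice.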
Description
Master's thesis (Μ.Δ.Ε.) 130
