Video Content Similarity Detection
Publisher
Πανεπιστήμιο Πελοποννήσου
Abstract
This master's thesis explores the application of advanced machine learning techniques for detecting video content similarity, an increasingly important task in the era of large-scale multimedia data. Traditional methods often struggle to handle the complexity of video data, which contains both visual and auditory components. This study leverages embedding models to represent these components as compact, dense vectors, enabling more efficient and accurate similarity detection. The anime series Mushishi serves as the dataset for this research, providing a consistent structure and rich audiovisual content for detailed analysis.
The thesis applies a range of similarity metrics, including cosine similarity and Euclidean distance, to compare embeddings across different episodes and segments of the series. A key innovation of this study is the integration of image and audio embeddings to improve content similarity detection. By combining these modalities, the research demonstrates that a multimodal approach achieves significantly higher accuracy than single-modality models, especially in segments where both visual and auditory features play critical roles in defining similarity.
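
For illustration only, the sketch below shows the two metrics computed over per-segment embedding vectors, together with one possible late-fusion scheme. The function names and the equal default weighting are assumptions made for this example; the abstract does not specify how the thesis actually combines the modalities (concatenating the vectors before a single similarity computation would be an equally plausible reading).

import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # Angle-based similarity in [-1, 1]; invariant to vector magnitude.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def euclidean_distance(a: np.ndarray, b: np.ndarray) -> float:
    # Straight-line distance between embeddings; sensitive to scale.
    return float(np.linalg.norm(a - b))

def multimodal_similarity(img_a, img_b, aud_a, aud_b, w_img=0.5):
    # Hypothetical late fusion: a weighted average of the per-modality
    # cosine similarities. w_img is an assumed, tunable weight, not a
    # value taken from the thesis.
    s_img = cosine_similarity(img_a, img_b)
    s_aud = cosine_similarity(aud_a, aud_b)
    return w_img * s_img + (1.0 - w_img) * s_aud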
The findings of this research offer valuable insights into the performance of embedding models in multimedia content analysis. The study highlights both the strengths and limitations of the models tested, with CLIP excelling in visual feature extraction and Wav2Vec2 capturing auditory nuances. The combined audio-visual approach opens new possibilities for more robust and scalable systems in fields such as content recommendation, copyright protection, and video retrieval. This thesis contributes a framework that can be further expanded to handle a variety of multimedia content.
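
As a minimal sketch of how such embeddings can be obtained, the snippet below uses publicly available Hugging Face checkpoints for CLIP and Wav2Vec2. The specific checkpoints, the frame file name, and the synthetic waveform are assumptions for illustration; the thesis's exact models and preprocessing are not specified in this abstract.

import torch
from PIL import Image
from transformers import (CLIPModel, CLIPProcessor,
                          Wav2Vec2FeatureExtractor, Wav2Vec2Model)

# Visual embedding for one video frame via CLIP.
clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
clip_proc = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
frame = Image.open("frame_0001.png")          # hypothetical extracted frame
inputs = clip_proc(images=frame, return_tensors="pt")
with torch.no_grad():
    image_vec = clip.get_image_features(**inputs)  # shape (1, 512)

# Auditory embedding for one audio segment via Wav2Vec2.
w2v = Wav2Vec2Model.from_pretrained("facebook/wav2vec2-base-960h")
w2v_fe = Wav2Vec2FeatureExtractor.from_pretrained("facebook/wav2vec2-base-960h")
waveform = torch.randn(16000)                 # stand-in for 1 s of 16 kHz audio
inputs = w2v_fe(waveform.numpy(), sampling_rate=16000, return_tensors="pt")
with torch.no_grad():
    hidden = w2v(**inputs).last_hidden_state  # shape (1, frames, 768)
audio_vec = hidden.mean(dim=1)                # mean-pool over time

Mean-pooling the Wav2Vec2 hidden states over time is one common way to reduce a variable-length segment to a single fixed-size vector comparable with the metrics above; it is shown here as an assumed design choice.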
Description
Master's thesis (Μ.Δ.Ε.) 130
