Self-Supervised Vision Transformers for Scalable Anomaly Detection over Images
Published in 2024 International Joint Conference on Neural Networks (IJCNN)
In this work, we leverage self-supervised pre-trained Vision Transformers (ViTs) and their self-attention maps to build a novel anomaly detection algorithm. The method, SAMSAD (Self-Attention MapS for Anomaly Detection), exploits the ability of self-supervised pre-training to yield implicit semantic segmentation masks of images. Building on this organization of features, we propose a three-step procedure based on fine-tuning, clustering, and anomaly scoring. The algorithm achieves state-of-the-art performance in anomaly detection and segmentation on industrial product images, and it scales to higher-resolution images more effectively than its competitors.
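To make the clustering and anomaly-scoring steps concrete, here is a minimal, illustrative sketch, not the paper's implementation: it assumes a self-supervised DINO ViT-S/8 checkpoint from torch.hub, uses raw patch-token features as a stand-in for the paper's self-attention maps, and substitutes plain k-means with nearest-centroid distances for SAMSAD's actual clustering and scoring.

```python
# Minimal sketch (not the authors' code): patch-level anomaly scoring with a
# self-supervised ViT. Assumes the DINO ViT-S/8 checkpoint from torch.hub;
# the clustering and scoring details are illustrative, not SAMSAD's exact ones.
import torch
from sklearn.cluster import KMeans

model = torch.hub.load("facebookresearch/dino:main", "dino_vits8")
model.eval()

@torch.no_grad()
def patch_features(images):
    """Return L2-normalised patch-token embeddings, shape (B, N_patches, D)."""
    tokens = model.get_intermediate_layers(images, n=1)[0]  # (B, 1 + N, D)
    patches = tokens[:, 1:, :]                              # drop the [CLS] token
    return torch.nn.functional.normalize(patches, dim=-1)

# 1) Fit on nominal (defect-free) images: cluster their patch embeddings.
nominal = torch.randn(8, 3, 224, 224)        # stand-in for real nominal images
feats = patch_features(nominal).flatten(0, 1).cpu().numpy()
kmeans = KMeans(n_clusters=16, n_init="auto").fit(feats)

# 2) Score a test image: distance of each patch to its nearest nominal cluster.
test = torch.randn(1, 3, 224, 224)           # stand-in for a real test image
test_feats = patch_features(test)[0].cpu().numpy()
dists = kmeans.transform(test_feats).min(axis=1)   # (N_patches,)

# Per-patch distances form a coarse anomaly map; the image-level score is the
# maximum patch distance.
side = int(dists.shape[0] ** 0.5)
anomaly_map = dists.reshape(side, side)
image_score = float(dists.max())
```

In this sketch, the patch-level map can be upsampled to the input resolution for anomaly segmentation, while the maximum patch distance gives an image-level detection score.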
Recommended citation: Samele, S., Matteucci, M. (2024). "Self-Supervised Vision Transformers for Scalable Anomaly Detection over Images." 2024 International Joint Conference on Neural Networks (IJCNN). pp. 1–10.
Paper Link | Download BibTeX