Image Anomaly Detection via Local and Global Knowledge Integration

This study proposes a method for detecting “logical anomalies” with high precision in applications such as industrial inspection. Logical anomalies are defects such as misplaced or missing parts, which can be recognized only from the overall context of an image. Conventional anomaly detection methods work well for “structural anomalies” (such as cracks or contamination), which are local in nature, but they struggle to capture the long-range dependencies needed to detect logical anomalies accurately.

The paper builds on a reverse distillation framework that uses a pre-trained teacher network (e.g., WideResNet50 trained on ImageNet) and trains exclusively on anomaly-free (normal) images. It extends conventional knowledge distillation by simultaneously training two student networks with distinct roles, a local student and a global student, within the “Dual-Student Knowledge Distillation Framework (DSKD).”

Local Student

The local student specializes in reproducing the low-level local features extracted by the teacher network. Because it reconstructs fine textures and boundary details accurately, it is well suited to detecting structural anomalies such as cracks, scratches, or color inconsistencies. In practice, the local student is trained to minimize reconstruction error using a cosine similarity loss computed between its feature maps and those of the teacher at several layers.
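The multi-layer cosine similarity loss described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each layer's feature map has been flattened into a list of per-position channel vectors, that the per-position loss is 1 minus the cosine similarity, and that layer losses are simply summed. The function names are hypothetical.

```python
import math

def cosine_similarity(u, v, eps=1e-8):
    """Cosine similarity between two channel vectors (eps guards zero norms)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / max(norm_u * norm_v, eps)

def layer_cosine_loss(teacher_feats, student_feats):
    """Mean of (1 - cosine similarity) over all spatial positions of one layer.

    Each argument is a list of channel vectors, one per spatial position.
    """
    dists = [1.0 - cosine_similarity(t, s)
             for t, s in zip(teacher_feats, student_feats)]
    return sum(dists) / len(dists)

def local_distillation_loss(teacher_layers, student_layers):
    """Sum of per-layer losses (assumption: layers are weighted equally)."""
    return sum(layer_cosine_loss(t, s)
               for t, s in zip(teacher_layers, student_layers))

# Toy example: one layer with two spatial positions and two channels.
teacher = [[[1.0, 0.0], [0.0, 2.0]]]
matching_student = [[[1.0, 0.0], [0.0, 2.0]]]
orthogonal_student = [[[0.0, 1.0], [3.0, 0.0]]]
loss_match = local_distillation_loss(teacher, matching_student)
loss_orth = local_distillation_loss(teacher, orthogonal_student)
```

In this toy case the loss vanishes when the student reproduces the teacher's features and grows as the feature directions diverge. The same per-position distances could also serve as a local anomaly map at test time.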

Global Student

In contrast, the global student is designed to capture the overall context and long-range dependencies in the image. To condense global information, the method introduces a module called the “Global Context Condensing Block (GCCB),” which compresses the teacher network’s high-level features along the channel dimension to extract the most relevant contextual information. In addition, the method computes the cosine similarity between the features at each pixel and the features of the image as a whole, then converts the resulting similarity vector into a probability distribution using a softmax function with a temperature parameter. These distributions define a contextual affinity loss: the KL divergence between the teacher’s and the global student’s distributions is minimized, which reinforces the transfer of global contextual information.
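A minimal sketch of the contextual affinity loss follows, under stated assumptions. It reads “overall image features” as the set of feature vectors at all spatial positions, so each position gets one affinity distribution over the whole image. The temperature value of 0.5, the direction of the KL divergence (teacher relative to student), and all function names are illustrative choices, not taken from the paper.

```python
import math

def cosine_similarity(u, v, eps=1e-8):
    """Cosine similarity between two feature vectors (eps guards zero norms)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / max(norm_u * norm_v, eps)

def softmax(values, temperature):
    """Temperature-scaled softmax, shifted by the max for numerical stability."""
    scaled = [v / temperature for v in values]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def contextual_affinity(feats, temperature):
    """One probability distribution per position over all positions."""
    return [softmax([cosine_similarity(f, g) for g in feats], temperature)
            for f in feats]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def contextual_affinity_loss(teacher_feats, student_feats, temperature=0.5):
    """Mean KL divergence between teacher and student affinity distributions."""
    t_aff = contextual_affinity(teacher_feats, temperature)
    s_aff = contextual_affinity(student_feats, temperature)
    return sum(kl_divergence(p, q) for p, q in zip(t_aff, s_aff)) / len(t_aff)

# Toy example: three spatial positions with two channels each.
teacher = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
collapsed_student = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]
loss_same = contextual_affinity_loss(teacher, teacher)
loss_diff = contextual_affinity_loss(teacher, collapsed_student)
```

A lower temperature sharpens each distribution around the most similar positions, while a higher one flattens it. The per-position KL terms, before averaging, give one plausible way to build the global student's score map.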

Anomaly Scoring and Fusion

At inference time, both the local and global students generate an anomaly score map covering every region of the image. The local student computes anomaly scores from the reconstruction errors of local features, while the global student derives scores from discrepancies in contextual similarity. The two score maps are then normalized and combined, allowing the system to detect and localize both structural and logical anomalies.
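The text does not specify how the two maps are normalized and combined, so the following is only one plausible sketch. It assumes each map is standardized with a mean and standard deviation estimated beforehand on held-out normal images, that the standardized maps are added, and that the image-level score is the maximum of the fused map. All names and statistics are hypothetical.

```python
def standardize(score_map, mean, std, eps=1e-8):
    """Shift and scale a 2D score map using precomputed statistics."""
    return [[(v - mean) / max(std, eps) for v in row] for row in score_map]

def fuse_score_maps(local_map, global_map, local_stats, global_stats):
    """Add the standardized maps; return the fused map and an image-level score.

    local_stats and global_stats are (mean, std) pairs, assumed to have been
    estimated on anomaly-free validation images so both maps share a scale.
    """
    local_norm = standardize(local_map, *local_stats)
    global_norm = standardize(global_map, *global_stats)
    fused = [[a + b for a, b in zip(row_l, row_g)]
             for row_l, row_g in zip(local_norm, global_norm)]
    image_score = max(v for row in fused for v in row)
    return fused, image_score

# Toy 2x2 maps: the local map fires top-right, the global map bottom-right.
local_map = [[1.0, 5.0], [1.0, 1.0]]
global_map = [[1.0, 1.0], [1.0, 3.0]]
fused, image_score = fuse_score_maps(
    local_map, global_map, local_stats=(1.0, 2.0), global_stats=(1.0, 1.0)
)
```

Standardizing before summation keeps one student from dominating merely because its raw scores have a larger range. With this toy input, the fused map keeps both the structural and the contextual response.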

Experimental Results and Discussion

The proposed method outperforms conventional techniques (such as autoencoders and earlier knowledge distillation-based approaches) in detecting both structural and logical anomalies. In particular, experiments on real-world anomaly detection datasets such as MVTec LOCO AD and the modified MVTec AD show a significant improvement in the detection accuracy of logical anomalies. Ablation studies confirm the value of assigning distinct roles to the local and global students, as well as the contributions of the GCCB and the contextual affinity loss, and show that the two student networks working together improve overall performance.

Conclusion and Future Work

The proposed DSKD method distills knowledge from a teacher network into two specialized student networks by training exclusively on normal images. It combines local features with global contextual information to detect logical anomalies that have been difficult for previous methods. With a simple network architecture and its new loss functions, the method achieves state-of-the-art performance, which suggests its potential for practical use in industrial inspection and quality control. Future work will focus on applying this approach to even smaller anomalies and more complex scenarios, and on further refining the network architecture and training strategies.

By learning both local reconstruction and global contextual cues, this study establishes an approach that achieves high detection performance across diverse types of anomalies.

Publication

Zhang, Jie, Masanori Suganuma, and Takayuki Okatani. “Contextual affinity distillation for image anomaly detection.” Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision. 2024.

@inproceedings{zhang2024contextual,
title={Contextual affinity distillation for image anomaly detection},
author={Zhang, Jie and Suganuma, Masanori and Okatani, Takayuki},
booktitle={Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision},
pages={149--158},
year={2024}
}