Unsupervised Domain Adaptation for Semantic Segmentation

This paper proposes a novel method called “Cross-Region Adaptation (CRA)” to improve the accuracy of unsupervised domain adaptation (UDA) for semantic segmentation. Semantic segmentation assigns a semantic label to every pixel in an image, and training an accurate model requires large amounts of pixel-level annotated data. Annotating real-world images pixel by pixel is labor-intensive, however, and models trained on synthetic images tend to perform worse on real images because of the distribution shift between the two (domain shift). To address this issue, conventional approaches align the overall feature distributions through adversarial training or exploit pseudo-labels through self-training. Nonetheless, these methods still leave residual class-level mismatches that limit their final accuracy.

Related Work

Conventional UDA methods transfer knowledge from the source domain (annotated data) to the target domain (unannotated data) mainly through two approaches: adversarial training and self-training. Adversarial training employs a domain discriminator and trains the segmentation network so that the discriminator cannot tell which domain a feature comes from; however, this global alignment often leaves fine-grained discrepancies at the class level. Self-training instead uses a model trained on the source domain to generate pseudo-labels for target images, which then serve as supervision for further training; however, errors in these pseudo-labels can propagate into the model and degrade its performance.
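As a generic illustration of the self-training step described above (not code from the paper), the sketch below turns per-pixel softmax outputs into pseudo-labels, keeping only predictions whose top probability reaches a confidence threshold, and then computes a cross-entropy loss on the kept pixels. The threshold of 0.9, the ignore id 255, and the function names are assumptions chosen for illustration; real implementations work on tensors in a deep-learning framework.

```python
import math
from typing import List

IGNORE = 255  # assumed "ignore" label id (a common Cityscapes-style convention)


def make_pseudo_labels(probs: List[List[float]], threshold: float = 0.9) -> List[int]:
    """Convert per-pixel class probabilities into pseudo-labels.

    probs[i] is the softmax vector for pixel i. Pixels whose top probability
    is below `threshold` are marked IGNORE so they add no loss in self-training.
    """
    labels = []
    for p in probs:
        best = max(range(len(p)), key=lambda c: p[c])
        labels.append(best if p[best] >= threshold else IGNORE)
    return labels


def self_training_loss(probs: List[List[float]], labels: List[int]) -> float:
    """Mean cross-entropy against pseudo-labels, skipping ignored pixels.

    In practice `probs` would come from the model being trained and `labels`
    from a fixed model (e.g. one trained on the source domain).
    """
    terms = [-math.log(p[y]) for p, y in zip(probs, labels) if y != IGNORE]
    return sum(terms) / len(terms) if terms else 0.0


if __name__ == "__main__":
    probs = [[0.95, 0.05], [0.6, 0.4], [0.02, 0.98]]
    labels = make_pseudo_labels(probs)
    print(labels)  # [0, 255, 1]
    print(self_training_loss(probs, labels))
```

The middle pixel is too uncertain and is dropped; this is exactly where noisy pseudo-labels would otherwise enter training.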

Proposed Method

The CRA method proposed in this study addresses the shortcomings of the previous approaches by focusing on class-level mismatches within the target domain. Specifically, a segmentation model trained with an existing UDA method predicts class probabilities for each pixel of a target image, and the entropy of each pixel's probability distribution is computed. Thresholding this entropy divides the image into high-entropy, uncertain regions (untrusted) and low-entropy, reliable regions (trusted). The trusted regions are treated as correctly labeled and are further refined through self-training with pseudo-labels. The untrusted regions, which are more prone to misclassification, are aligned with the trusted regions via adversarial training in the feature space. The intended effect is to move misclassified pixels to the correct side of the class boundaries, thereby improving overall segmentation accuracy.
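A minimal sketch of the two ingredients described above, in plain Python under stated assumptions. The per-pixel entropy is H(p) = -Σ_c p_c log p_c, normalized here by log C so it lies in [0, 1]. The fixed threshold tau, the GAN-style binary cross-entropy losses, the one-sided update that pushes only untrusted features toward trusted ones, and all function names are illustrative assumptions, not the paper's exact formulation. The discriminator outputs d are assumed to be the probability that a feature comes from a trusted region.

```python
import math
from typing import List, Tuple


def pixel_entropy(p: List[float]) -> float:
    """Shannon entropy of one pixel's class distribution, normalized to [0, 1] by log(C)."""
    h = -sum(pc * math.log(pc) for pc in p if pc > 0.0)
    return h / math.log(len(p))


def split_regions(probs: List[List[float]], tau: float) -> Tuple[List[int], List[int]]:
    """Return pixel indices of trusted (entropy <= tau) and untrusted (entropy > tau) regions."""
    trusted, untrusted = [], []
    for i, p in enumerate(probs):
        (trusted if pixel_entropy(p) <= tau else untrusted).append(i)
    return trusted, untrusted


def bce(d: float, target: float) -> float:
    """Binary cross-entropy for one discriminator output d in (0, 1)."""
    eps = 1e-12  # avoids log(0)
    return -(target * math.log(d + eps) + (1.0 - target) * math.log(1.0 - d + eps))


def discriminator_loss(d_trusted: List[float], d_untrusted: List[float]) -> float:
    """Discriminator learns to output 1 for trusted-region features and 0 for untrusted ones."""
    losses = [bce(d, 1.0) for d in d_trusted] + [bce(d, 0.0) for d in d_untrusted]
    return sum(losses) / len(losses) if losses else 0.0


def alignment_loss(d_untrusted: List[float]) -> float:
    """Loss for the segmentation network: make untrusted features look trusted to the discriminator."""
    if not d_untrusted:
        return 0.0
    return sum(bce(d, 1.0) for d in d_untrusted) / len(d_untrusted)


if __name__ == "__main__":
    probs = [[0.98, 0.01, 0.01], [0.4, 0.35, 0.25]]
    print(split_regions(probs, tau=0.5))  # ([0], [1])
    print(discriminator_loss([0.9], [0.2]), alignment_loss([0.2]))
```

In a full training loop the discriminator and the segmentation network would be updated alternately, as in standard adversarial UDA; the choice of tau controls how many pixels count as trusted, which is the entropy-threshold trade-off the experiments examine.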

Experimental Results

The proposed method was evaluated on adaptation from synthetic datasets such as GTA5, SYNTHIA, and Synscapes to the real-world Cityscapes dataset. CRA consistently improved the performance of the existing UDA methods it was combined with. Its effectiveness was also validated with various networks (e.g., VGG16, ResNet101, and transformer-based models such as DAFormer), and every combination achieved a higher mean IoU (mIoU). Further analyses, including the effect of the entropy threshold, the handling of rare classes, and feature-space visualizations, confirmed that aligning the trusted and untrusted regions contributes substantially to the accuracy gains.

Conclusion

This paper presents CRA, a method that addresses class-level mismatches in unsupervised domain adaptation by dividing target images into trusted and untrusted regions and aligning their feature distributions via adversarial training. Experiments show that combining CRA with existing UDA methods consistently yields significant performance improvements, establishing it as an effective way to enhance semantic segmentation accuracy. The method may be improved further by integrating it with more advanced network architectures and learning strategies, broadening the use of unannotated data in real-world applications.

Publication

Wang, Zhijie, et al. “Unsupervised domain adaptation for semantic segmentation via cross-region alignment.” Computer Vision and Image Understanding 234 (2023): 103743.

@article{wang2023unsupervised,
title={Unsupervised domain adaptation for semantic segmentation via cross-region alignment},
author={Wang, Zhijie and Liu, Xing and Suganuma, Masanori and Okatani, Takayuki},
journal={Computer Vision and Image Understanding},
volume={234},
pages={103743},
year={2023},
publisher={Elsevier}
}