Revisiting the Importance of Amplifying Bias for Debiasing

Jungsoo Lee (KAIST, Kakao Enterprise), Jeonghoon Park (KAIST, Kakao Enterprise), Daeyoung Kim (KAIST), Juyoung Lee (Kakao Enterprise), Edward Choi (KAIST), Jaegul Choo (KAIST)

Association for the Advancement of Artificial Intelligence (AAAI)



In image classification, debiasing aims to train a classifier to be less susceptible to dataset bias, i.e., a strong correlation between peripheral attributes of data samples and a target class. For example, even if the frog class in a dataset consists mainly of frog images with a swamp background (i.e., bias-aligned samples), a debiased classifier should still correctly classify a frog on a beach (i.e., a bias-conflicting sample). Recent debiasing approaches commonly use two components: a biased model f_B and a debiased model f_D. f_B is trained to focus on bias-aligned samples (i.e., to overfit to the bias), while f_D is trained mainly on bias-conflicting samples by concentrating on the samples that f_B fails to learn, which makes f_D less susceptible to the dataset bias. Whereas state-of-the-art debiasing techniques have aimed to train f_D better, we focus on training f_B, a component that has been overlooked until now. Our empirical analysis reveals that removing bias-conflicting samples from the training set of f_B is important for improving the debiasing performance of f_D. This is because bias-conflicting samples act as noisy samples when amplifying the bias of f_B, since they do not contain the bias attribute. To this end, we propose a simple yet effective data sample selection method that removes bias-conflicting samples to construct a bias-amplified dataset for training f_B. Our method can be directly applied to existing reweighting-based debiasing approaches, yielding consistent performance gains and achieving state-of-the-art performance on both synthetic and real-world datasets.
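To make the two-model setup concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the authors' implementation. The abstract does not say how bias-conflicting samples are identified, so the sketch assumes a simple hypothetical rule: a sample is treated as bias-conflicting when an early-stage biased model assigns its true label a probability below a threshold `tau`, and it is dropped from f_B's training set. The weight for f_D uses the relative-difficulty form from Learning from Failure (LfF), one well-known reweighting-based approach: CE_B / (CE_B + CE_D). The names `softmax`, `cross_entropy`, `select_bias_amplified`, `relative_difficulty`, and `tau` are all hypothetical.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one sample's class logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits: Sequence[float], label: int) -> float:
    """Per-sample cross-entropy loss: -log p(label)."""
    return -math.log(softmax(logits)[label])


def select_bias_amplified(probs_true: Sequence[float], tau: float = 0.5) -> List[int]:
    """Return indices of samples to KEEP when training f_B.

    Hypothetical selection rule (assumption, not from the source): samples whose
    true-label probability under an early biased model is below `tau` are treated
    as bias-conflicting and removed, yielding a bias-amplified subset.
    """
    return [i for i, p in enumerate(probs_true) if p >= tau]


def relative_difficulty(loss_b: float, loss_d: float, eps: float = 1e-8) -> float:
    """LfF-style weight for f_D: close to 1 when f_B's loss dominates f_D's."""
    return loss_b / (loss_b + loss_d + eps)


if __name__ == "__main__":
    # Toy batch: logits from a biased model f_B over 3 classes, plus true labels.
    # Sample 2 behaves like a bias-conflicting sample: f_B predicts class 0.
    biased_logits = [[4.0, 0.0, 0.0], [0.0, 3.0, 0.0], [3.0, 0.0, 0.0]]
    labels = [0, 1, 2]
    probs_true = [softmax(z)[y] for z, y in zip(biased_logits, labels)]
    print(select_bias_amplified(probs_true))  # [0, 1]
    # A sample f_B finds hard (loss 3.0) but f_D handles (loss 1.0) gets weight ~0.75.
    print(round(relative_difficulty(3.0, 1.0), 2))  # 0.75
```

The sketch drops suspected bias-conflicting samples outright instead of down-weighting them, so f_B never sees them. This matches the abstract's framing that such samples act as noise when amplifying the bias. f_D still trains on the full dataset and receives the relative-difficulty weights.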