APEACH: Attacking Pejorative Expressions with Analysis on Crowd-Generated Hate Speech Evaluation Datasets

Kichang Yang (Kakao, Kakao Enterprise, Soongsil University), Wonjun Jang (Kakao, Soongsil University), Won Ik Cho (Seoul National University)




Detecting toxic or pejorative expressions in online communities has become one of the main concerns in protecting users from mental harm. This has led to the development of large-scale hate speech detection datasets in various domains, which are mainly built upon web-crawled texts labeled by crowdworkers. However, for languages other than English, researchers might have to rely on only a small-sized corpus due to the lack of data-driven research on hate speech detection. This sometimes misleads the evaluation of widely used pretrained language models (PLMs) such as BERT, given that PLMs often share the domain of their pretraining corpus with the evaluation set, resulting in over-representation of the detection performance. Also, the scope of pejorative expressions might be restricted if the dataset is built on single-domain text.

To alleviate the above problems in Korean hate speech detection, we propose APEACH, a method that allows the collection of hate speech generated by unspecified users. By controlling the crowd-generation of hate speech and adding only minimal post-labeling, we create a corpus that enables generalizable and fair evaluation of hate speech detection with respect to text domain and topic. We compare our outcome with prior work on an annotation-based toxic news comment dataset using publicly available PLMs. We verify that our dataset is less sensitive to the lexical overlap between the evaluation set and the pretraining corpus of PLMs, showing that it helps mitigate unexpected under- or over-representation of model performance. We distribute our dataset publicly online to further facilitate general-domain hate speech detection in Korean.