NLP

APEACH: Attacking Pejorative Expressions with Analysis on Crowd-Generated Hate Speech Evaluation Datasets

양기창(카카오, 카카오엔터프라이즈, 숭실대), 장원준(카카오, 숭실대), 조원익(서울대)

Arxiv

2022-02-25

Abstract

Detecting toxic or pejorative expressions in online communities has become one of the main concerns for preventing the users’ men- tal harm. This led to the development of large- scale hate speech detection datasets of var- ious domains, which are mainly built upon web-crawled texts with labels by crowdwork- ers. However, for languages other than English, researchers might have to rely on only a small- sized corpus due to the lack of data-driven re- search of hate speech detection. This some- times misleads the evaluation of prevalently used pretrained language models (PLMs) such as BERT, given that PLMs often share the do- main of pretraining corpus with the evaluation set, resulting in over-representation of the de- tection performance. Also, the scope of pejo- rative expressions might be restricted if the dataset is built on a single domain text.

To alleviate the above problems in Korean hate speech detection, we propose APEACH, a method that allows the collection of hate speech generated by unspecified users. By con- trolling the crowd-generation of hate speech and adding only a minimum post-labeling, we create a corpus that enables the general- izable and fair evaluation of hate speech de- tection regarding text domain and topic. We compare our outcome with prior work on an annotation-based toxic news comment dataset using publicly available PLMs. We check that our dataset is less sensitive to the lexical over- lap between the evaluation set and pretraining corpus of PLMs, showing that it helps mitigate the unexpected under/over-representation of model performance. We distribute our dataset publicly online to further facilitate the general- domain hate speech detection in Korean.