Can AI be trusted to moderate hate speech?
Sample: 1.3 million generated statements.
Researchers created 1.3 million synthetic hate speech examples using a template: a quantifier ("all" or "some"), a target group (e.g., Christians, immigrants, women), and a derogatory statement. This produced sentences like "All white nationalists are criminals". Seven AI moderation systems then analyzed these statements.
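The templating approach can be sketched roughly like this. The word lists below are illustrative placeholders, not the researchers' actual vocabulary (their full lists produced the 1.3 million combinations):

```python
from itertools import product

# Placeholder word lists for illustration only; the study's real
# lists were far larger, yielding ~1.3 million statements.
quantifiers = ["All", "Some"]
target_groups = ["white nationalists", "immigrants", "women"]
predicates = ["are criminals", "are liars"]

def generate_statements():
    """Combine quantifier x target group x predicate into sentences."""
    return [
        f"{q} {group} {pred}."
        for q, group, pred in product(quantifiers, target_groups, predicates)
    ]

statements = generate_statements()
print(len(statements))  # 2 * 3 * 2 = 12 combinations
print(statements[0])    # "All white nationalists are criminals."
```

Each generated sentence was then sent to all seven moderation systems, so every system saw exactly the same inputs.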
Shown: the same hateful statement could be flagged as hate speech by one AI system and deemed acceptable by another. Disagreements were especially stark for statements targeting groups based on education, social status, or interests (e.g., "woke people," "Christians"). AI moderation lacks a unified standard, leading to decisions that often seem arbitrary, unpredictable, and unfair. The definition of "hate speech" is notoriously vague, so models can't rely on a consistent rulebook during training.
Hit like if you want to become a scientist and generate hate speech!