Can AI be trusted to moderate hate speech?
Sample: 1.3 million generated statements.
Researchers created 1.3 million synthetic hate speech examples using a template: a quantifier ("all" or "some"), a target group (e.g., Christians, immigrants, women), and a derogatory statement. This produced sentences like "All white nationalists are criminals". Seven AI moderation systems then analyzed these statements.
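The templating approach can be sketched roughly like this. The word lists below are illustrative placeholders, not the researchers' actual vocabulary (their full lists produced the 1.3 million combinations):

```python
from itertools import product

# Placeholder word lists for illustration only; the study's real
# lists were far larger, yielding ~1.3 million statements.
quantifiers = ["All", "Some"]
target_groups = ["white nationalists", "immigrants", "women"]
predicates = ["are criminals", "are liars"]

def generate_statements():
    """Combine quantifier x target group x predicate into sentences."""
    return [
        f"{q} {group} {pred}."
        for q, group, pred in product(quantifiers, target_groups, predicates)
    ]

statements = generate_statements()
print(len(statements))  # 2 * 3 * 2 = 12 combinations
print(statements[0])    # "All white nationalists are criminals."
```

Each generated sentence was then sent to all seven moderation systems, so every system saw exactly the same inputs.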
Shown: the same hateful statement could be flagged as hate speech by one AI system and deemed acceptable by another. Disagreements were especially stark for statements targeting groups based on education, social status, or interests (e.g., "woke people," "Christians"). AI moderation lacks a unified standard, leading to decisions that often seem arbitrary, unpredictable, and unfair. The definition of "hate speech" is notoriously vague, so models can't rely on a consistent rulebook during training.
Hit like if you want to become a scientist and generate hate speech!