arxiv:2407.10995

LionGuard: Building a Contextualized Moderation Classifier to Tackle Localized Unsafe Content

Published on Jun 24, 2024

Abstract

LionGuard, a Singapore-specific moderation classifier for large language models, outperforms general moderation APIs on Singlish data, emphasizing the importance of localization in language safety.

As large language models (LLMs) become increasingly prevalent in a wide variety of applications, concerns about the safety of their outputs have become more significant. Most efforts at safety-tuning or moderation today take on a predominantly Western-centric view of safety, especially for toxic, hateful, or violent speech. In this paper, we describe LionGuard, a Singapore-contextualized moderation classifier that can serve as guardrails against unsafe LLM outputs. When assessed on Singlish data, LionGuard outperforms existing widely-used moderation APIs, which are not finetuned for the Singapore context, by 14% (binary) and up to 51% (multi-label). Our work highlights the benefits of localization for moderation classifiers and presents a practical and scalable approach for low-resource languages.
