Hate Speech Detection¶

Detect hate speech across 6 target categories with severity levels and leetspeak evasion handling.

Why hate speech detection?¶

Typical uses include content moderation for social media platforms, forum management, brand safety, and research on online toxicity. Malaysian hate speech is particularly challenging to detect because it combines code-switching, local slurs, religious references, and leetspeak to evade basic filters.

Load module¶

import malaysian_manglish_nlp as mnlp

result = mnlp.detect_hate_speech("Text to check here")
print(result)
# {'is_hate': False, 'severity': 'none', 'categories': [], 'score': 0.12}

Basic usage¶

Non-hate speech¶

mnlp.detect_hate_speech("Sedap gila nasi lemak kat sini!")
# {'is_hate': False, 'severity': 'none', 'categories': [], 'score': 0.05}

mnlp.detect_hate_speech("Aku tak setuju dengan polisi kerajaan")
# {'is_hate': False, 'severity': 'none', 'categories': [], 'score': 0.15}

Criticism is not hate speech

The module distinguishes between legitimate criticism/opinion and actual hate speech. Political disagreement, product complaints, and negative reviews are not flagged.

Detected hate speech¶

result = mnlp.detect_hate_speech("Offensive text here")
# {'is_hate': True, 'severity': 'high', 'categories': ['race'], 'score': 0.92}

6 target categories¶

Category	Description
`race`	Racist content targeting ethnicity
`religion`	Content attacking religious groups
`gender`	Sexist or misogynistic content
`nationality`	Xenophobic content targeting nationality
`disability`	Content mocking disabilities
`sexual_orientation`	Homophobic or transphobic content

Check severity¶

mnlp.get_severity("text here")
# 'none' | 'low' | 'medium' | 'high'

Check target groups¶

mnlp.get_target_groups("text here")
# [] or ['race', 'religion', ...]

Severity levels¶

Level	Description	Action
`none`	Clean text	No action
`low`	Mildly offensive, borderline	Flag for review
`medium`	Clearly offensive	Moderate
`high`	Severe hate speech	Remove immediately

# Quick binary check
mnlp.is_hate("some text")
# True or False
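The severity table can be turned into a simple lookup. The following is a minimal sketch, not part of the library: `SEVERITY_ACTIONS` and `action_for` are hypothetical names, the action strings are illustrative labels for the table's "Action" column, and the input is assumed to be a dict shaped like the `detect_hate_speech` results shown earlier.

```python
# Hypothetical helper, not part of malaysian_manglish_nlp.
# Keys are the documented severity levels; values are illustrative
# labels for the actions listed in the severity table.
SEVERITY_ACTIONS = {
    "none": "no_action",
    "low": "flag_for_review",
    "medium": "moderate",
    "high": "remove",
}


def action_for(result):
    """Map a detection result dict to a moderation action label."""
    severity = result["severity"]
    if severity not in SEVERITY_ACTIONS:
        # Fail closed: an unexpected label is sent to human review.
        return "flag_for_review"
    return SEVERITY_ACTIONS[severity]


print(action_for({"is_hate": True, "severity": "high",
                  "categories": ["race"], "score": 0.92}))
# remove
```

Keeping the mapping in one dict means a policy change (for example, treating `medium` as a removal) is a one-line edit rather than a change to branching logic.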

Leetspeak and evasion detection¶

Users sometimes disguise offensive terms with leetspeak or spelling tricks to evade filters. malaysian-manglish-nlp detects the common patterns:

# Number substitution
# "b0d0h" instead of "bodoh"
# Racial slurs with @, $, 0 substitutions

# Deliberate misspelling
# Split words, extra characters

# Code-switched slurs
# Mixed BM/EN offensive phrases

The module normalizes text internally before detection:

# These evasion attempts are caught:
# - Character substitution (0→o, @→a, $→s, 1→i)
# - Deliberate spacing ("b o d o h")
# - Repeated characters ("booodohh")
# - Abbreviated slurs
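To make the normalization step concrete, here is a rough sketch of how the first three evasion patterns could be undone with the standard library. It is an illustration under stated assumptions, not the library's actual implementation: `LEET_MAP` and `normalize` are hypothetical names, and the substitution table contains only the four mappings listed above.

```python
import re

# Hypothetical substitution table, limited to the mappings listed above.
LEET_MAP = str.maketrans({"0": "o", "@": "a", "$": "s", "1": "i"})

# Three or more single characters separated by single spaces ("b o d o h").
SPACED_WORD = re.compile(r"\b(?:\w ){2,}\w\b")

# Any letter repeated two or more times in a row ("booodohh").
REPEATED_LETTER = re.compile(r"([a-z])\1+")


def normalize(text):
    """Undo simple character substitution, spacing, and letter repetition."""
    text = text.lower().translate(LEET_MAP)
    # Join spaced-out single characters into one word.
    text = SPACED_WORD.sub(lambda m: m.group(0).replace(" ", ""), text)
    # Collapse each run of a repeated letter to a single letter.
    return REPEATED_LETTER.sub(r"\1", text)


print(normalize("b0d0h"))      # bodoh
print(normalize("b o d o h"))  # bodoh
print(normalize("booodohh"))   # bodoh
```

This sketch is deliberately lossy: it rewrites genuine numbers ("10" becomes "io") and collapses legitimate double letters, so the output is suited to matching against a lexicon and should not be shown to users.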

Batch processing¶

texts = [
    "Sedap gila makanan ni!",
    "Normal sentence here",
    "Another clean text",
    "Offensive content here",
]

results = mnlp.hate_detect_batch(texts)
for text, result in zip(texts, results):
    status = "🚫 HATE" if result['is_hate'] else "✅ Clean"
    print(f"{status} | {text[:40]}")
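For dashboards or reports, batch results can be aggregated. The sketch below assumes that `hate_detect_batch` returns a list of dicts with the same keys as `detect_hate_speech` (the loop above already relies on `result['is_hate']`). `summarize` is a hypothetical helper, and the sample data reuses result dicts shown earlier on this page.

```python
from collections import Counter


def summarize(results):
    """Count flagged texts, severities, and categories in a batch."""
    return {
        "total": len(results),
        "flagged": sum(1 for r in results if r["is_hate"]),
        "by_severity": dict(Counter(r["severity"] for r in results)),
        "by_category": dict(
            Counter(c for r in results for c in r["categories"])
        ),
    }


# Stand-in for the output of a batch call, using the documented dict shape.
sample = [
    {"is_hate": False, "severity": "none", "categories": [], "score": 0.05},
    {"is_hate": False, "severity": "none", "categories": [], "score": 0.15},
    {"is_hate": True, "severity": "high", "categories": ["race"], "score": 0.92},
]

print(summarize(sample))
# {'total': 3, 'flagged': 1, 'by_severity': {'none': 2, 'high': 1}, 'by_category': {'race': 1}}
```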

Integration with content moderation¶

def moderate_comment(text):
    """Example moderation pipeline."""
    # Check hate speech
    hate = mnlp.detect_hate_speech(text)

    if hate['severity'] == 'high':
        return {'action': 'block', 'reason': 'hate_speech'}

    if hate['severity'] == 'medium':
        return {'action': 'flag', 'reason': 'review_needed'}

    # Check profanity too
    profanity = mnlp.detect_profanity(text)
    if profanity['severity'] == 'high':
        return {'action': 'censor', 'censored': mnlp.censor(text)}

    return {'action': 'approve'}

# Usage
moderate_comment("Normal friendly comment")
# {'action': 'approve'}

CLI usage¶

# Full analysis (includes hate speech check)
$ mnlp analyze "text to check"

# Same analysis, with JSON output
$ mnlp analyze "text" --json

How it works¶

1. Normalization - leetspeak decoding, repeated character reduction
2. Lexicon matching - curated hate speech dictionaries for the Malaysian context
3. Pattern detection - known hate speech phrases and structures
4. Context analysis - distinguish insults from hate speech
5. Severity classification - score determines none/low/medium/high
6. Category assignment - which target group(s) are attacked
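The severity classification step maps a score to one of the four levels. The library's real cut-offs are not documented here, so the sketch below uses hypothetical thresholds chosen only for illustration; `THRESHOLDS`, `LEVELS`, and `severity_from_score` are all assumed names. The chosen values happen to agree with the example outputs on this page (0.05, 0.12, and 0.15 map to `none`, 0.92 maps to `high`), but that does not make them the actual boundaries.

```python
from bisect import bisect_right

# Hypothetical cut-offs for illustration only; not the library's values.
THRESHOLDS = [0.3, 0.6, 0.8]
LEVELS = ["none", "low", "medium", "high"]


def severity_from_score(score):
    """Bucket a score in [0, 1] into a severity level."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be in [0, 1], got {score}")
    # bisect_right counts how many thresholds are <= score,
    # which is the index of the matching level.
    return LEVELS[bisect_right(THRESHOLDS, score)]


print(severity_from_score(0.12))  # none
print(severity_from_score(0.92))  # high
```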

Performance¶

Metric	Score
Binary F1	86.3%
Severity accuracy	79.8%
Category F1 (macro)	74.5%
Leetspeak detection	82.1%
False positive rate	3.2%
Throughput	18,000 texts/sec

False positives

A 3.2% false positive rate means that roughly 32 of every 1,000 legitimate texts are flagged as hate speech. Always provide an appeal mechanism in production moderation systems.
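To size an appeal queue, the false positive rate can be converted into an expected number of wrongly flagged texts. This is a back-of-the-envelope sketch: `expected_false_flags` is a hypothetical helper, and it assumes the 3.2% figure is measured over legitimate (non-hate) texts, which is the usual definition but is not stated explicitly above.

```python
def expected_false_flags(clean_texts, false_positive_rate=0.032):
    """Estimate how many legitimate texts will be wrongly flagged."""
    return round(clean_texts * false_positive_rate)


# 100,000 legitimate comments at a 3.2% false positive rate
print(expected_false_flags(100_000))  # 3200
```

The real rate on your own traffic may differ from the benchmark figure, so treat the result as a starting estimate and measure it against reviewed appeals.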