Sentiment Analysis¶

Detect positive, negative, neutral, and mixed sentiment in Malaysian text - including Manglish, code-switching, and sarcasm.

Why sentiment analysis?¶

Sentiment analysis supports social media monitoring, brand reputation tracking, customer feedback analysis, and public opinion mining for Malaysian businesses and researchers. Standard NLP tools often fail on Manglish because they don't recognize slang like "gila best" and "teruk gila", or sarcastic patterns like "Bagus la tu, tunggu 3 jam".

malaysian-manglish-nlp handles all of this natively - most text needs no preprocessing.

Load module¶

import malaysian_manglish_nlp as mnlp

# Basic sentiment
result = mnlp.sentiment("Weh best gila makanan kat sini!")
print(result)
# {'label': 'positive', 'score': 0.94}

That's it. One import, one function call.

Basic usage¶

Positive sentiment¶

mnlp.sentiment("Sedap gila nasi lemak kat kedai tu!")
# {'label': 'positive', 'score': 0.96}

mnlp.sentiment("Best movie tu, memang worth it la")
# {'label': 'positive', 'score': 0.91}

mnlp.sentiment("Alhamdulillah finally dapat kerja baru")
# {'label': 'positive', 'score': 0.89}

Negative sentiment¶

mnlp.sentiment("Teruk la service dia, tunggu 1 jam")
# {'label': 'negative', 'score': 0.89}

mnlp.sentiment("Mahal gila, tak berbaloi langsung")
# {'label': 'negative', 'score': 0.93}

mnlp.sentiment("Kecewa betul, order wrong again")
# {'label': 'negative', 'score': 0.87}

Neutral sentiment¶

mnlp.sentiment("Aku nak pergi kedai jap")
# {'label': 'neutral', 'score': 0.72}

mnlp.sentiment("Meeting pukul 3 petang ni")
# {'label': 'neutral', 'score': 0.68}

Detailed output¶

Get scores for all classes instead of just the top prediction:

mnlp.sentiment("Best gila!", detailed=True)
# {'label': 'positive',
#  'scores': {'positive': 0.96, 'neutral': 0.03, 'negative': 0.01}}

mnlp.sentiment("Ok je, nothing special", detailed=True)
# {'label': 'neutral',
#  'scores': {'positive': 0.12, 'neutral': 0.76, 'negative': 0.12}}

When to use detailed mode

Use detailed=True when you need confidence calibration or want to apply custom thresholds. The raw scores help you decide how much to trust the prediction.
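
As a minimal sketch of a custom threshold, the helper below works on a dictionary shaped like the detailed=True output shown above. The name `decide` and both threshold values are assumptions for illustration, not part of the library.

```python
def decide(result, min_score=0.6, min_margin=0.2):
    """Return the top label, or 'uncertain' when the prediction is weak.

    `result` is assumed to look like the detailed=True output:
    {'label': ..., 'scores': {'positive': ..., 'neutral': ..., 'negative': ...}}
    """
    # Rank classes from highest to lowest score.
    ranked = sorted(result["scores"].items(), key=lambda kv: kv[1], reverse=True)
    (top_label, top_score), (_, runner_up) = ranked[0], ranked[1]
    # Reject predictions that are low, or too close to the second-best class.
    if top_score < min_score or top_score - runner_up < min_margin:
        return "uncertain"
    return top_label


confident = {"label": "positive",
             "scores": {"positive": 0.96, "neutral": 0.03, "negative": 0.01}}
# Hand-written borderline example, not real library output.
borderline = {"label": "neutral",
              "scores": {"positive": 0.40, "neutral": 0.45, "negative": 0.15}}

print(decide(confident))   # positive
print(decide(borderline))  # uncertain
```

Tune `min_score` and `min_margin` against your own labelled data; the defaults here are placeholders.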

Aspect-based sentiment¶

Extract sentiment for individual aspects within a sentence. Useful for product reviews, where different features often carry different sentiment.

mnlp.sentiment("Makanan sedap tapi service slow", aspect=True)
# [{'aspect': 'makanan', 'label': 'positive', 'score': 0.92},
#  {'aspect': 'service', 'label': 'negative', 'score': 0.85}]

mnlp.sentiment("Phone cantik gila tapi battery lemah", aspect=True)
# [{'aspect': 'phone', 'label': 'positive', 'score': 0.90},
#  {'aspect': 'battery', 'label': 'negative', 'score': 0.82}]

mnlp.sentiment("Harga murah, rasa sedap, tapi parking susah", aspect=True)
# [{'aspect': 'harga', 'label': 'positive', 'score': 0.88},
#  {'aspect': 'rasa', 'label': 'positive', 'score': 0.91},
#  {'aspect': 'parking', 'label': 'negative', 'score': 0.79}]

Aspect extraction limits

Aspect-based mode works best with clear noun-adjective patterns. Very complex sentences or implicit aspects may not be captured.
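
Aspect results become more useful once they are aggregated across many reviews. The sketch below tallies labels per aspect from lists shaped like the aspect=True output above. The helper name `summarize_aspects` and the sample data are assumptions for illustration.

```python
from collections import Counter, defaultdict


def summarize_aspects(reviews):
    """Count sentiment labels per aspect across many reviews.

    `reviews` is a list where each item is the list of aspect dicts
    returned for one text (shape copied from the examples above).
    """
    summary = defaultdict(Counter)
    for aspects in reviews:
        for item in aspects:
            summary[item["aspect"]][item["label"]] += 1
    return {aspect: dict(counts) for aspect, counts in summary.items()}


# Hand-written sample data in the documented output shape.
reviews = [
    [{"aspect": "makanan", "label": "positive", "score": 0.92},
     {"aspect": "service", "label": "negative", "score": 0.85}],
    [{"aspect": "makanan", "label": "positive", "score": 0.90},
     {"aspect": "service", "label": "positive", "score": 0.80}],
]

print(summarize_aspects(reviews))
# {'makanan': {'positive': 2}, 'service': {'negative': 1, 'positive': 1}}
```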

Sarcasm detection¶

malaysian-manglish-nlp detects sarcastic sentiment automatically - no separate flag needed. Sarcastic text that pairs positive words with a negative context is classified as negative.

# Sarcastic - "bagus" + negative context
mnlp.sentiment("Bagus la tu, tunggu 3 jam baru sampai")
# {'label': 'negative', 'score': 0.78}  # Detected as sarcasm

# Sarcastic - praise ("pandainya") + negative context
mnlp.sentiment("Wah pandainya, exam fail pun boleh celebrate")
# {'label': 'negative', 'score': 0.74}  # Detected as sarcasm

# Genuine positive (for comparison)
mnlp.sentiment("Pandai betul dia solve masalah tu")
# {'label': 'positive', 'score': 0.88}  # Genuine

For standalone sarcasm detection:

mnlp.detect_sarcasm("Bagus la tu, memang efficient gila, tunggu sampai esok")
# {'is_sarcastic': True, 'score': 0.82, 'cues': ['positive_opener', 'negative_context']}
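
Sarcasm is hard to get right, so one option is to route likely-sarcastic texts with a weak sentiment score to manual review. The sketch below combines the two result shapes shown above. The helper `needs_review` and its 0.8 cutoff are assumptions for illustration.

```python
def needs_review(sentiment, sarcasm, min_confidence=0.8):
    """Flag a text for manual review when sarcasm is likely but the
    sentiment prediction is not very confident."""
    return bool(sarcasm["is_sarcastic"]) and sentiment["score"] < min_confidence


# Values copied from the documented example outputs.
sentiment = {"label": "negative", "score": 0.78}
sarcasm = {"is_sarcastic": True, "score": 0.82,
           "cues": ["positive_opener", "negative_context"]}

print(needs_review(sentiment, sarcasm))  # True
```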

Batch processing¶

Pass a list for efficient batch inference:

texts = [
    "Best gila movie tu!",
    "Teruk la service dia",
    "Ok je, nothing much",
    "Sedap nasi lemak Mak Cik",
    "Mahal sangat, tak worth it",
]

results = mnlp.sentiment(texts)
for text, result in zip(texts, results):
    print(f"{text[:30]:30s} → {result['label']} ({result['score']:.2f})")

# Best gila movie tu!            → positive (0.93)
# Teruk la service dia           → negative (0.89)
# Ok je, nothing much            → neutral (0.65)
# Sedap nasi lemak Mak Cik       → positive (0.94)
# Mahal sangat, tak worth it     → negative (0.91)

Performance

Batch mode processes texts sequentially, with no added per-batch overhead; it does not run them in parallel. For true parallel processing, use the Pipeline module or the REST API.
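
For corpora too large to pass as one list, a common pattern is to feed the batch function in fixed-size chunks and stream the results. The sketch below takes the analyzer as a parameter, so it runs without the library; `chunked`, `analyze_in_chunks`, and `fake_sentiment` are hypothetical names, and in real use you would pass `mnlp.sentiment` as `analyze`.

```python
from itertools import islice


def chunked(iterable, size):
    """Yield successive lists of at most `size` items."""
    iterator = iter(iterable)
    while chunk := list(islice(iterator, size)):
        yield chunk


def analyze_in_chunks(texts, analyze, size=1000):
    """Call `analyze` (a callable taking a list of texts) chunk by chunk,
    yielding one result per input text in the original order."""
    for chunk in chunked(texts, size):
        yield from analyze(chunk)


# Stand-in for mnlp.sentiment so the sketch runs without the library.
def fake_sentiment(batch):
    return [{"label": "neutral", "score": 0.5} for _ in batch]


results = list(analyze_in_chunks(["a", "b", "c", "d", "e"], fake_sentiment, size=2))
print(len(results))  # 5
```

Because `analyze_in_chunks` is a generator, `texts` can itself be a lazy iterable such as lines read from a file.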

Mixed sentiment handling¶

Text with contrast markers ("tapi", "but", "cuma") gets analyzed for mixed sentiment:

mnlp.sentiment("Makanan best tapi harga mahal", detailed=True)
# {'label': 'mixed',
#  'scores': {'positive': 0.45, 'neutral': 0.10, 'negative': 0.45},
#  'contrast': True}

Noisy text handling¶

malaysian-manglish-nlp handles noisy text directly. No need to clean first:

# With hashtags
mnlp.sentiment("Best gila!! #nasilemak #sedap #klfood")
# {'label': 'positive', 'score': 0.93}

# With emojis
mnlp.sentiment("Sedap sangat 😍😍😍 confirm repeat")
# {'label': 'positive', 'score': 0.95}

# With elongated words
mnlp.sentiment("Besttttt gilaaaaa!!!")
# {'label': 'positive', 'score': 0.91}

# With mentions
mnlp.sentiment("@restaurant Best food but slow service la")
# {'label': 'mixed', ...}

Preprocessing option

For maximum accuracy on very noisy text, normalize first:

clean = mnlp.normalize("xpe la best gila mkn dia")
result = mnlp.sentiment(clean)
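
If you want a light cleanup step of your own before analysis, the sketch below collapses elongated words and drops @mentions using only the standard library. It is not how `mnlp.normalize` works internally (that is not documented here); `light_clean` is a hypothetical helper.

```python
import re


def light_clean(text):
    """Drop @mentions and collapse runs of 3+ identical characters to one."""
    text = re.sub(r"@\w+", "", text)          # remove @mentions
    text = re.sub(r"(.)\1{2,}", r"\1", text)  # "Besttttt" -> "Best"
    return " ".join(text.split())             # tidy leftover whitespace


print(light_clean("Besttttt gilaaaaa!!!"))      # Best gila!
print(light_clean("@restaurant Best food la"))  # Best food la
```

Collapsing runs to a single character is lossy: an elongated word whose base form has a double letter ("goood") loses it, so treat this as a rough pre-filter rather than a normalizer.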

CLI usage¶

# Basic sentiment
$ mnlp sentiment "Best gila movie tu!"
positive (0.92)

# With JSON output
$ mnlp sentiment "Teruk la service" --json
{"label": "negative", "score": 0.89}

# Pipe from stdin
$ echo "Sedap nasi lemak" | mnlp sentiment
positive (0.94)

# Chain with normalize
$ echo "xpe la best gila" | mnlp normalize | mnlp sentiment
"takpe la best gila" → positive (0.89)

# Full analysis
$ mnlp analyze "Weh best gila kedai tu"

How it works¶

malaysian-manglish-nlp's sentiment module uses a layered approach:

1. Lexicon matching - 2,000+ Malaysian sentiment words including slang ("gila best", "syok", "hampeh")
2. Intensifier handling - "gila", "sangat", "betul", "super" modify scores
3. Negation detection - "tak", "tidak", "bukan", "don't" flip polarity
4. Sarcasm patterns - positive opener + negative context = sarcasm flag
5. Contrast markers - "tapi", "but", "cuma" trigger mixed sentiment analysis

No external models needed. The rule-based engine runs at 23,000+ texts/sec.
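
To make the layered idea concrete, here is a toy scorer that applies a lexicon, intensifiers, negation, and contrast markers in turn. It is an illustration only, not the library's implementation: the four-word lexicon, the 1.5 boost, and the name `toy_sentiment` are all assumptions, and the sarcasm layer is left out.

```python
LEXICON = {"best": 1.0, "sedap": 1.0, "teruk": -1.0, "mahal": -1.0}
INTENSIFIERS = {"gila", "sangat", "betul", "super"}
NEGATORS = {"tak", "tidak", "bukan"}
CONTRAST = {"tapi", "but", "cuma"}
BOOST = 1.5  # assumed multiplier for an intensified word


def toy_sentiment(text):
    """Return 'positive', 'negative', 'neutral', or 'mixed' for `text`."""
    tokens = text.lower().replace(",", " ").replace("!", " ").split()
    scores = []
    for i, token in enumerate(tokens):
        if token not in LEXICON:
            continue
        value = LEXICON[token]
        # Layer 2: an intensifier directly before or after boosts the word.
        neighbours = tokens[max(i - 1, 0):i] + tokens[i + 1:i + 2]
        if any(n in INTENSIFIERS for n in neighbours):
            value *= BOOST
        # Layer 3: a negator directly before the word flips its polarity.
        if i > 0 and tokens[i - 1] in NEGATORS:
            value = -value
        scores.append(value)
    # Layer 5: a contrast marker with both polarities present means mixed.
    has_contrast = any(t in CONTRAST for t in tokens)
    if has_contrast and any(s > 0 for s in scores) and any(s < 0 for s in scores):
        return "mixed"
    total = sum(scores)
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    return "neutral"


print(toy_sentiment("Best gila movie tu!"))            # positive
print(toy_sentiment("Tak sedap langsung"))             # negative
print(toy_sentiment("Makanan best tapi harga mahal"))  # mixed
print(toy_sentiment("Aku nak pergi kedai jap"))        # neutral
```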

Performance¶

Metric	Score
Accuracy (Malay reviews)	89.2%
Accuracy (Manglish tweets)	85.7%
Sarcasm detection F1	76.3%
Aspect extraction F1	81.5%
Throughput	23,000 texts/sec
Latency (single)	< 0.5ms

Benchmarked on 5,000 annotated Malaysian social media texts. See Benchmarks for full details.

Advanced: ML backend¶

For higher accuracy on complex sentences, use the transformer model:

pip install "malaysian-manglish-nlp[ml]"

result = mnlp.sentiment("Service ok la tapi makanan agak tawar sikit", model="ml")
# Higher accuracy on nuanced/long sentences

ML model trade-offs

The ML model is more accurate but 50-100x slower. Use the rule-based engine for high-throughput pipelines and the ML model for offline analysis where accuracy matters most.
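
One way to balance the two backends is a cascade: run the fast engine on everything, then re-run only the low-confidence texts through the slower model. The sketch below takes both analyzers as parameters and uses stubs so it runs on its own. `cascade`, `fast_stub`, `slow_stub`, and the 0.8 cutoff are hypothetical, and passing a list together with model="ml" is an assumption not confirmed above.

```python
def cascade(texts, fast, slow, min_score=0.8):
    """Run `fast` on all texts, then replace low-confidence results
    with the output of `slow`. Returns (results, number escalated)."""
    results = list(fast(texts))
    retry = [i for i, r in enumerate(results) if r["score"] < min_score]
    if retry:
        for i, better in zip(retry, slow([texts[i] for i in retry])):
            results[i] = better
    return results, len(retry)


# Stubs standing in for the rule-based and ML backends.
def fast_stub(batch):
    return [{"label": "neutral", "score": 0.9 if len(t) < 20 else 0.6} for t in batch]


def slow_stub(batch):
    return [{"label": "negative", "score": 0.95} for _ in batch]


texts = ["Best gila!", "Service ok la tapi makanan agak tawar sikit"]
results, escalated = cascade(texts, fast_stub, slow_stub)
print(escalated)            # 1
print(results[1]["label"])  # negative
```

For scale, using the figures on this page: one million texts at 23,000 texts/sec take about 43 seconds with the rule-based engine, so a backend 50-100x slower would need roughly 36 to 72 minutes for the same load.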