Getting Started
Get up and running with malaysian-manglish-nlp in under 5 minutes.
Installation
Verify Installation
```bash
mnlp --version
# malaysian-manglish-nlp 1.0.0
mnlp doctor
# ✓ Python 3.11.4
# ✓ Core modules loaded (51/51)
# ✓ Optional: ml not installed (install with [ml])
# ✓ Optional: spacy not installed (install with [spacy])
```
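If you would rather check from Python (for example in a notebook), the standard library can report the installed version and whether an optional dependency is importable. This is a minimal sketch, not a reimplementation of `mnlp doctor`: it assumes the distribution is named `malaysian-manglish-nlp` (as printed by `mnlp --version`) and uses `spacy` as the example optional dependency.

```python
from importlib import metadata, util
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def has_module(module_name: str) -> bool:
    """True if a top-level module can be found (checked without importing it)."""
    return util.find_spec(module_name) is not None


if __name__ == "__main__":
    # Distribution name assumed from the `mnlp --version` output above.
    print("malaysian-manglish-nlp:", installed_version("malaysian-manglish-nlp") or "not installed")
    # spaCy is assumed here to be what the [spacy] extra provides.
    print("spacy available:", has_module("spacy"))
```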
First Example Walkthrough
Let's analyse a real Manglish sentence step by step.
```python
import malaysian_manglish_nlp as mnlp

text = "Sedap gila nasi lemak kat kedai tu! Confirm repeat lagi."
```
Step 1: Sentiment Analysis
```python
result = mnlp.sentiment(text)
print(result)
# {'label': 'positive', 'score': 0.96, 'aspects': ['nasi lemak']}
```
The model correctly classifies this as positive: "sedap gila" (insanely delicious) and "confirm repeat" (will definitely come back) are strong positive signals in Manglish.
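To build intuition for why such phrases carry the signal, here is a deliberately tiny lexicon-based scorer. It is a sketch only and not how `mnlp.sentiment` works: the phrase list, weights and threshold are made-up assumptions, whereas a real model learns these cues from data. Longer phrases are matched first so that "sedap" is not counted a second time inside "sedap gila".

```python
import re

# Hypothetical phrase weights, for illustration only.
LEXICON = {
    "sedap gila": 2.0,      # "insanely delicious"
    "confirm repeat": 1.5,  # "will definitely come back"
    "sedap": 1.0,
    "mahal": -1.0,
    "teruk": -1.5,
}


def toy_sentiment(text: str, threshold: float = 0.5) -> dict:
    """Sum the weights of lexicon phrases found in the text, longest phrases first."""
    remaining = text.lower()
    total = 0.0
    hits = []
    for phrase in sorted(LEXICON, key=len, reverse=True):
        pattern = r"\b" + re.escape(phrase) + r"\b"
        count = len(re.findall(pattern, remaining))
        if count:
            total += LEXICON[phrase] * count
            hits.append(phrase)
            # Blank out matches so shorter phrases inside them are not re-counted.
            remaining = re.sub(pattern, " ", remaining)
    if total >= threshold:
        label = "positive"
    elif total <= -threshold:
        label = "negative"
    else:
        label = "neutral"
    return {"label": label, "total": total, "hits": hits}


print(toy_sentiment("Sedap gila nasi lemak kat kedai tu! Confirm repeat lagi."))
# {'label': 'positive', 'total': 3.5, 'hits': ['confirm repeat', 'sedap gila']}
```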
Step 2: Normalise Spelling
```python
text2 = "xpe la bro, aku nk g mkn jap lg"
normalized = mnlp.normalize(text2)
print(normalized)
# "takpe la bro, aku nak pergi makan jap lagi"
```
It expands common Manglish abbreviations: xpe → takpe, nk → nak, g → pergi, mkn → makan, lg → lagi.
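At its simplest, abbreviation expansion is a dictionary lookup applied word by word. The sketch below only illustrates that idea: `ABBREVIATIONS` is a hypothetical five-entry table covering the example above, whereas a real normaliser needs a much larger lexicon and context (for instance, `g` does not always mean `pergi`).

```python
import re

# Hypothetical sample mapping; covers only the abbreviations in the example above.
ABBREVIATIONS = {
    "xpe": "takpe",
    "nk": "nak",
    "g": "pergi",
    "mkn": "makan",
    "lg": "lagi",
}


def normalize_words(text: str, table: dict[str, str] = ABBREVIATIONS) -> str:
    """Replace each whole word found in `table`; punctuation and spacing are kept."""
    return re.sub(r"\w+", lambda m: table.get(m.group(0).lower(), m.group(0)), text)


print(normalize_words("xpe la bro, aku nk g mkn jap lg"))
# takpe la bro, aku nak pergi makan jap lagi
```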
Step 3: Detect Language Mix
```python
lang = mnlp.language("Eh jom la we go makan, I lapar gila already")
print(lang)
# {'primary': 'manglish', 'mix': {'ms': 0.45, 'en': 0.55}}
```
Code-switching detection reports an overall label (primary) plus the share of Malay (ms) and English (en) in the text.
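One simple way to picture a mix ratio is to count how many recognised words come from each language. The sketch below uses two tiny, hypothetical word lists and ignores words that appear in neither; it is not the library's detector, so its numbers differ from the output above.

```python
import re

# Hypothetical mini-lexicons; a real detector needs far larger vocabularies.
MALAY = {"jom", "la", "makan", "lapar", "gila", "aku", "kat", "tu"}
ENGLISH = {"we", "go", "i", "already", "let's", "the", "at"}


def mix_ratio(text: str) -> dict[str, float]:
    """Share of recognised tokens that are Malay ('ms') versus English ('en')."""
    tokens = re.findall(r"[a-z']+", text.lower())
    ms = sum(t in MALAY for t in tokens)
    en = sum(t in ENGLISH for t in tokens)
    known = ms + en
    if known == 0:
        return {"ms": 0.0, "en": 0.0}
    return {"ms": round(ms / known, 2), "en": round(en / known, 2)}


print(mix_ratio("Eh jom la we go makan, I lapar gila already"))
# {'ms': 0.56, 'en': 0.44}
```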
Step 4: Named Entity Recognition
```python
entities = mnlp.ner("Siti beli iPhone kat Low Yat Plaza semalam")
print(entities)
# [('Siti', 'PERSON'), ('iPhone', 'PRODUCT'), ('Low Yat Plaza', 'LOCATION')]
```
NER handles Malaysian entities such as local places, brands, and names.
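A gazetteer (a list of known names) with longest-match lookup is the simplest way to tag multi-word local names such as "Low Yat Plaza". This sketch only shows that idea and is not the model behind `mnlp.ner`; the three-entry `GAZETTEER` is hypothetical and the whitespace tokeniser ignores punctuation.

```python
# Hypothetical gazetteer keyed by lowercased token tuples.
GAZETTEER = {
    ("siti",): "PERSON",
    ("iphone",): "PRODUCT",
    ("low", "yat", "plaza"): "LOCATION",
}
MAX_LEN = max(len(key) for key in GAZETTEER)


def gazetteer_ner(text: str) -> list[tuple[str, str]]:
    """Tag known names, preferring the longest match at each position."""
    tokens = text.split()
    entities = []
    i = 0
    while i < len(tokens):
        for n in range(min(MAX_LEN, len(tokens) - i), 0, -1):
            key = tuple(t.lower() for t in tokens[i:i + n])
            if key in GAZETTEER:
                # Keep the original casing of the matched tokens.
                entities.append((" ".join(tokens[i:i + n]), GAZETTEER[key]))
                i += n
                break
        else:
            i += 1  # no entity starts here; move on one token
    return entities


print(gazetteer_ner("Siti beli iPhone kat Low Yat Plaza semalam"))
# [('Siti', 'PERSON'), ('iPhone', 'PRODUCT'), ('Low Yat Plaza', 'LOCATION')]
```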
Step 5: Chain with Pipeline
```python
from malaysian_manglish_nlp import Pipeline

pipe = Pipeline([
    'normalize',  # Clean up informal spelling
    'tokenize',   # Split into tokens
    'sentiment',  # Analyse sentiment
])
results = pipe("xpe la, mmg best gila tempat ni")
print(results)
# {
#   'normalized': 'takpe la, memang best gila tempat ni',
#   'tokens': ['takpe', 'la', ',', 'memang', 'best', 'gila', 'tempat', 'ni'],
#   'sentiment': {'label': 'positive', 'score': 0.89}
# }
```
Pipelines pass output from one module to the next automatically.
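The chaining idea itself fits in a few lines: run stages in order, feed each one the previous stage's output, and keep every intermediate result. `MiniPipeline` and its toy stages are hypothetical stand-ins for illustration; the real `Pipeline` takes module names, and its result keys (such as `'normalized'`) are defined by the library.

```python
import re
from typing import Any, Callable


class MiniPipeline:
    """Run (name, function) stages in order; each stage receives the previous output."""

    def __init__(self, stages: list[tuple[str, Callable[[Any], Any]]]):
        self.stages = stages

    def __call__(self, text: str) -> dict[str, Any]:
        results: dict[str, Any] = {}
        current: Any = text
        for name, stage in self.stages:
            current = stage(current)
            results[name] = current  # keep every intermediate result
        return results


# Toy stages, for illustration only.
FIXES = {"xpe": "takpe", "mmg": "memang"}


def normalize(text: str) -> str:
    return re.sub(r"\w+", lambda m: FIXES.get(m.group(0), m.group(0)), text)


def tokenize(text: str) -> list[str]:
    return re.findall(r"\w+|[^\w\s]", text)  # words, or single punctuation marks


pipe = MiniPipeline([("normalize", normalize), ("tokenize", tokenize), ("count", len)])
print(pipe("xpe la, mmg best gila tempat ni"))
# {'normalize': 'takpe la, memang best gila tempat ni', 'tokenize': ['takpe', 'la', ',', 'memang', 'best', 'gila', 'tempat', 'ni'], 'count': 8}
```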
CLI Usage Guide
The mnlp CLI lets you process text without writing Python.
Basic Commands
```console
# Sentiment analysis
$ mnlp sentiment "Best gila movie tu!"
positive (0.92)

# Normalise text
$ mnlp normalize "aku xfhm ape ko ckp"
"aku tak faham apa kau cakap"

# Named Entity Recognition
$ mnlp ner "Ali kerja kat Grab Malaysia"
Ali (PERSON), Grab Malaysia (ORG)

# Language detection
$ mnlp language "Let's go makan at the mamak"
primary: manglish | ms: 0.40 | en: 0.60

# Tokenize
$ mnlp tokenize "Aku pergi kedai beli roti canai"
Aku | pergi | kedai | beli | roti | canai
```
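When scripting around the CLI, the plain-text output shown above can be parsed back into values. This is a minimal sketch that assumes the exact formats printed in these examples (`positive (0.92)` and `primary: manglish | ms: 0.40 | en: 0.60`); where a command offers machine-readable output, such as the `--json` flag used with `mnlp ner` under Piping, that is the sturdier choice.

```python
import re


def parse_sentiment(line: str) -> tuple[str, float]:
    """Parse 'positive (0.92)' into ('positive', 0.92)."""
    m = re.fullmatch(r"\s*(\w+)\s*\(([\d.]+)\)\s*", line)
    if m is None:
        raise ValueError(f"unrecognised sentiment line: {line!r}")
    return m.group(1), float(m.group(2))


def parse_fields(line: str) -> dict[str, str]:
    """Parse 'primary: manglish | ms: 0.40 | en: 0.60' into a dict of strings."""
    fields = {}
    for part in line.split("|"):
        key, _, value = part.partition(":")
        fields[key.strip()] = value.strip()
    return fields
```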
All Subcommands
| Command | Description |
|---|---|
| `mnlp sentiment` | Sentiment analysis (positive/negative/neutral) |
| `mnlp normalize` | Normalise informal Manglish spelling |
| `mnlp ner` | Named Entity Recognition |
| `mnlp language` | Language and code-switching detection |
| `mnlp tokenize` | Tokenisation |
| `mnlp summarize` | Text summarisation |
| `mnlp translate` | Manglish ↔ standard Malay/English |
| `mnlp stem` | Stemming and lemmatisation |
| `mnlp topics` | Topic extraction |
| `mnlp keywords` | Keyword extraction |
| `mnlp doctor` | Check the installation and optional extras |
| `mnlp batch` | Process every file in a directory |
| `mnlp serve` | Start the REST API server (requires the [api] extra) |
| `mnlp --help` | Full list with descriptions |
File & Batch Processing
```console
# Process a file line-by-line
$ mnlp sentiment --input tweets.txt --output results.json

# Batch process an entire directory
$ mnlp batch sentiment ./data/tweets/ --output ./results/ --format json

# Specify output columns
$ mnlp sentiment --input reviews.csv --output scored.csv --column text --append
```
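For comparison, scoring one CSV column and appending the result in Python takes only the standard `csv` module. This is a sketch under assumptions: it guesses that `--column` selects the input text column and `--append` keeps the original columns, and the output column names `label` and `score` plus the `append_scores` helper are hypothetical.

```python
import csv
import io
from typing import Callable, TextIO


def append_scores(src: TextIO, dst: TextIO, column: str,
                  score_fn: Callable[[str], dict]) -> int:
    """Copy every row from src to dst, adding 'label' and 'score' columns.

    Returns the number of data rows written.
    """
    reader = csv.DictReader(src)
    if reader.fieldnames is None or column not in reader.fieldnames:
        raise KeyError(f"column {column!r} not found")
    writer = csv.DictWriter(dst, fieldnames=[*reader.fieldnames, "label", "score"])
    writer.writeheader()
    count = 0
    for row in reader:
        result = score_fn(row[column])
        writer.writerow({**row, "label": result["label"], "score": result["score"]})
        count += 1
    return count


# Demo with in-memory files and a dummy scorer (both hypothetical).
src = io.StringIO("id,text\n1,best gila\n2,teruk\n")
dst = io.StringIO()
written = append_scores(src, dst, "text", lambda t: {"label": "neutral", "score": 0.5})
print(dst.getvalue())
```

With real files, open both with `newline=""` as the `csv` documentation recommends, and pass `mnlp.sentiment` as `score_fn`, since it returns a dict with `label` and `score` keys.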
Piping
Chain commands via stdin/stdout:
```console
# Normalise then analyse
$ echo "xpe la bro, best gila" | mnlp normalize | mnlp sentiment
"takpe la bro, best gila" → positive (0.89)

# Process file through pipeline
$ cat data/raw.txt | mnlp normalize | mnlp ner --json
```
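Because each command reads stdin and writes stdout, your own Python step can slot into such a chain. A minimal sketch: `transform` is a placeholder (here a hypothetical lowercasing step), blank lines pass through untouched, and each line is processed independently, so the output keeps one line per input line.

```python
import sys
from typing import Callable, Iterable, Iterator


def filter_lines(lines: Iterable[str], transform: Callable[[str], str]) -> Iterator[str]:
    """Apply transform to each non-blank line; emit exactly one output line per input line."""
    for line in lines:
        text = line.rstrip("\n")
        yield transform(text) if text.strip() else text


def main() -> None:
    # Example (hypothetical script name): cat data/raw.txt | python my_step.py | mnlp sentiment
    for out in filter_lines(sys.stdin, str.lower):
        sys.stdout.write(out + "\n")
```

Saved as a script, call `main()` under an `if __name__ == "__main__":` guard.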
Common Recipes
Recipe: Social Media Sentiment Dashboard
```python
import malaysian_manglish_nlp as mnlp

tweets = [
    "Weh sedap gila burger ni, confirm datang lagi!",
    "Aduh mahalnya, baik aku masak sendiri",
    "Ok la, not bad for the price",
]

for tweet in tweets:
    result = mnlp.sentiment(tweet)
    # Right-align the label in a 10-character column
    print(f"{result['label']:>10} ({result['score']:.2f}) - {tweet}")
#   positive (0.95) - Weh sedap gila burger ni, confirm datang lagi!
#   negative (0.78) - Aduh mahalnya, baik aku masak sendiri
#    neutral (0.61) - Ok la, not bad for the price
```
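A dashboard usually needs aggregates rather than per-tweet lines. This sketch tallies labels and averages scores per label; it assumes each result is a dict with `label` and `score` keys, as returned above, and feeds in the three sample results from this recipe.

```python
from collections import Counter, defaultdict


def summarise(results: list[dict]) -> dict[str, dict]:
    """Count results per label and compute the mean score for each label."""
    counts = Counter(r["label"] for r in results)
    totals: dict[str, float] = defaultdict(float)
    for r in results:
        totals[r["label"]] += r["score"]
    return {label: {"count": n, "mean_score": totals[label] / n} for label, n in counts.items()}


# The sample results shown in the recipe above.
sample = [
    {"label": "positive", "score": 0.95},
    {"label": "negative", "score": 0.78},
    {"label": "neutral", "score": 0.61},
]
summary = summarise(sample)
print(summary)
```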
Recipe: Normalise + NER Pipeline
```python
from malaysian_manglish_nlp import Pipeline

pipe = Pipeline(['normalize', 'ner'])
result = pipe("ali keje kat mcd ss15 subang")
print(result)
# {
#   'normalized': 'ali kerja kat mcd ss15 subang',
#   'entities': [('ali', 'PERSON'), ('mcd', 'ORG'), ('ss15', 'LOCATION'), ('subang', 'LOCATION')]
# }
```
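Entity tuples are easy to regroup by type, for example to list every location mentioned across a batch of posts. This sketch assumes the `(text, label)` tuple format shown above and uses that recipe's sample entities as input; `group_entities` is a hypothetical helper.

```python
from collections import defaultdict


def group_entities(entities: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Map each entity label to its entity strings, in first-seen order, without duplicates."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for text, label in entities:
        if text not in grouped[label]:
            grouped[label].append(text)
    return dict(grouped)


entities = [("ali", "PERSON"), ("mcd", "ORG"), ("ss15", "LOCATION"), ("subang", "LOCATION")]
print(group_entities(entities))
# {'PERSON': ['ali'], 'ORG': ['mcd'], 'LOCATION': ['ss15', 'subang']}
```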
Recipe: Batch Process with Progress
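```python
import json
from pathlib import Path

import malaysian_manglish_nlp as mnlp

files = sorted(Path("data/tweets").glob("*.txt"))  # a list, so the total is known
results = []
for i, f in enumerate(files, start=1):
    text = f.read_text(encoding="utf-8").strip()
    result = mnlp.sentiment(text)
    results.append({"file": f.name, "text": text, **result})
    print(f"[{i}/{len(files)}] {f.name}: {result['label']}")  # progress line

# Save results (UTF-8, since ensure_ascii=False keeps non-ASCII characters)
Path("results.json").write_text(
    json.dumps(results, indent=2, ensure_ascii=False), encoding="utf-8"
)
```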
Recipe: REST API Server
```console
# Start the API server (requires the [api] extra)
$ mnlp serve --port 8000

# Query from another terminal
$ curl http://localhost:8000/sentiment -H "Content-Type: application/json" -d '{"text": "best gila la"}'
{"label": "positive", "score": 0.93}
```
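The same request can be sent from Python using only the standard library. This is a sketch that assumes the server accepts a JSON body on `POST /sentiment` at port 8000, as in the `curl` call above, and replies with JSON; `build_request` and `query_sentiment` are hypothetical helper names.

```python
import json
import urllib.request

API_URL = "http://localhost:8000/sentiment"  # host and port from the example above


def build_request(text: str, url: str = API_URL) -> urllib.request.Request:
    """Prepare a JSON POST request for the sentiment endpoint."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )


def query_sentiment(text: str, url: str = API_URL, timeout: float = 10.0) -> dict:
    """Send the request and decode the JSON reply (requires the server to be running)."""
    with urllib.request.urlopen(build_request(text, url), timeout=timeout) as resp:
        return json.load(resp)
```

With `mnlp serve` running, `query_sentiment("best gila la")` returns the decoded JSON object.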
Configuration
malaysian-manglish-nlp works out of the box with sensible defaults. To adjust its behaviour, call mnlp.configure():
```python
import malaysian_manglish_nlp as mnlp

# Choose the model backend (pick one)
mnlp.configure(backend="fast")      # fastest, lower accuracy
mnlp.configure(backend="balanced")  # default
mnlp.configure(backend="accurate")  # highest accuracy, slower

# Other settings
mnlp.configure(cache=True)      # cache repeated queries
mnlp.configure(batch_size=256)  # batch processing size
```
Or via environment variables:
```bash
export MANGLISH_NLP_BACKEND=balanced
export MANGLISH_NLP_CACHE=true
export MANGLISH_NLP_BATCH_SIZE=256
```
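If your own scripts need to honour the same settings, reading and validating them takes a few lines. This sketch is hypothetical and is not the library's configuration loader: it reuses the three variable names above, treats `balanced` as the default backend as documented, and picks the other fallback values (`cache=False`, `batch_size=256`) purely for illustration.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass

BACKENDS = {"fast", "balanced", "accurate"}


@dataclass(frozen=True)
class Settings:
    backend: str = "balanced"  # documented default backend
    cache: bool = False        # fallback chosen for illustration
    batch_size: int = 256      # fallback chosen for illustration


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from MANGLISH_NLP_* variables, validating each value."""
    backend = env.get("MANGLISH_NLP_BACKEND", Settings.backend).lower()
    if backend not in BACKENDS:
        raise ValueError(f"unknown backend {backend!r}; expected one of {sorted(BACKENDS)}")
    cache_raw = env.get("MANGLISH_NLP_CACHE", str(Settings.cache))
    cache = cache_raw.lower() in {"1", "true", "yes", "on"}
    batch_size = int(env.get("MANGLISH_NLP_BATCH_SIZE", Settings.batch_size))
    if batch_size <= 0:
        raise ValueError("MANGLISH_NLP_BATCH_SIZE must be a positive integer")
    return Settings(backend=backend, cache=cache, batch_size=batch_size)
```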
Next Steps
- Browse Modules: explore all 51 modules, grouped by category (text processing, analysis, extraction, generation, and more).
- API Reference: full function signatures, parameters, return types, and examples for every public function.
- Benchmarks: performance numbers on standard hardware, and how malaysian-manglish-nlp compares to alternatives.
- Contribute: found a bug or want a new module? Learn how to contribute to malaysian-manglish-nlp.
Need help?
Open an issue on GitHub or start a discussion.