Getting Started

Get up and running with malaysian-manglish-nlp in under 5 minutes.

Installation

pip (recommended)

pip install malaysian-manglish-nlp

The core install has zero external dependencies, so it stays lightweight and fast and installs anywhere Python 3.9+ is available.

With extras

# ML model backend (transformers)
pip install "malaysian-manglish-nlp[ml]"

# spaCy integration
pip install "malaysian-manglish-nlp[spacy]"

# FastAPI REST server
pip install "malaysian-manglish-nlp[api]"

# Everything
pip install "malaysian-manglish-nlp[all]"

The quotes stop shells such as zsh from treating the square brackets as a glob pattern.

From source

git clone https://github.com/ZafranYusof/malaysia-manglish-nlp.git
cd malaysia-manglish-nlp
pip install -e .

An editable install, useful for development or if you want to modify modules directly.

Docker

docker pull zafranyusof/malaysian-manglish-nlp:latest
docker run -it zafranyusof/malaysian-manglish-nlp:latest mnlp --help

Pre-built image with all extras included.

Python Version

malaysian-manglish-nlp requires Python 3.9+. Verify with:

python --version
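To fail fast inside your own scripts, you can run the same check from Python. This is a minimal sketch; MIN_PYTHON and python_is_supported are illustrative names, not part of malaysian-manglish-nlp.

```python
import sys

# Minimum version stated in this guide.
MIN_PYTHON = (3, 9)


def python_is_supported(version_info=sys.version_info) -> bool:
    """Return True if (major, minor) of version_info is at least MIN_PYTHON."""
    return tuple(version_info[:2]) >= MIN_PYTHON


# Exit with a readable message when run as a script on an older interpreter.
if __name__ == "__main__" and not python_is_supported():
    sys.exit(
        f"malaysian-manglish-nlp needs Python {MIN_PYTHON[0]}.{MIN_PYTHON[1]}+, "
        f"found {sys.version.split()[0]}"
    )
```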

Verify Installation

mnlp --version
# malaysian-manglish-nlp 1.0.0

mnlp doctor
# ✓ Python 3.11.4
# ✓ Core modules loaded (51/51)
# ✓ Optional: ml not installed (install with [ml])
# ✓ Optional: spacy not installed (install with [spacy])
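mnlp doctor is the supported way to check extras. If you need the same information from Python (for example, to skip tests when spaCy is missing), one option is to probe for the package behind each extra with importlib. This is a sketch, not how mnlp doctor works internally: the mapping of [ml] to transformers and [api] to fastapi is inferred from the install comments above, and EXTRA_PACKAGES and available_extras are hypothetical names.

```python
from importlib.util import find_spec

# Assumed mapping from each extra to one package it installs,
# inferred from the comments in the Installation section.
EXTRA_PACKAGES = {"ml": "transformers", "spacy": "spacy", "api": "fastapi"}


def available_extras() -> dict:
    """Report, per extra, whether its package can be found without importing it."""
    return {extra: find_spec(pkg) is not None for extra, pkg in EXTRA_PACKAGES.items()}


if __name__ == "__main__":
    for extra, ok in available_extras().items():
        hint = f'pip install "malaysian-manglish-nlp[{extra}]"'
        print(f"{extra}: {'installed' if ok else 'missing (' + hint + ')'}")
```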

First Example Walkthrough

Let's analyse a real Manglish sentence step by step.

import malaysian_manglish_nlp as mnlp

text = "Sedap gila nasi lemak kat kedai tu! Confirm repeat lagi."

Step 1: Sentiment Analysis

result = mnlp.sentiment(text)
print(result)
# {'label': 'positive', 'score': 0.96, 'aspects': ['nasi lemak']}

The model correctly identifies this as positive: "sedap gila" (insanely delicious) and "confirm repeat" are strong positive signals in Manglish.
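To see why multi-word phrases matter, here is a deliberately simple phrase-lexicon scorer. It is not how the library's model works; the phrases and weights in SIGNALS are made up for illustration, and toy_sentiment is a hypothetical name. It only shows how a phrase like "sedap gila" can carry a strong signal even though "gila" on its own means "crazy".

```python
import re

# Hypothetical phrase weights, invented for this illustration only.
SIGNALS = {
    "sedap gila": 2.0,      # insanely delicious
    "confirm repeat": 2.0,
    "best gila": 2.0,
    "tak sedap": -2.0,      # not tasty
    "mahal gila": -1.0,     # crazy expensive
}


def toy_sentiment(text: str) -> str:
    """Sum the weights of signal phrases found in text and map the total to a label."""
    # Lowercase, turn punctuation into spaces and collapse whitespace,
    # so phrases only match on whole-word boundaries.
    words = re.sub(r"[^\w\s]", " ", text.lower()).split()
    padded = " " + " ".join(words) + " "
    total = sum(weight for phrase, weight in SIGNALS.items() if f" {phrase} " in padded)
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    return "neutral"
```

A fixed lexicon like this misses anything it has not listed, which is the gap a learned model is meant to close.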

Step 2: Normalise Spelling

text2 = "xpe la bro, aku nk g mkn jap lg"
normalized = mnlp.normalize(text2)
print(normalized)
# "takpe la bro, aku nak pergi makan jap lagi"

Handles common Manglish abbreviations: xpe → takpe, nk → nak, g → pergi, mkn → makan, lg → lagi.
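The simplest form of this idea is a whole-word lookup table. The sketch below is a hypothetical illustration, not the library's normaliser: ABBREVIATIONS holds only the five mappings listed above, and toy_normalize is an illustrative name.

```python
import re

# Hypothetical abbreviation table built from the examples above.
ABBREVIATIONS = {
    "xpe": "takpe",
    "nk": "nak",
    "g": "pergi",
    "mkn": "makan",
    "lg": "lagi",
}


def toy_normalize(text: str) -> str:
    """Expand whole-word abbreviations, leaving punctuation and spacing intact."""
    def expand(match):
        word = match.group(0)
        # Look up case-insensitively; unknown words pass through unchanged.
        return ABBREVIATIONS.get(word.lower(), word)

    return re.sub(r"\b\w+\b", expand, text)


print(toy_normalize("xpe la bro, aku nk g mkn jap lg"))
```

A plain table ignores context, so an abbreviation that could expand in more than one way always gets the same expansion.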

Step 3: Detect Language Mix

lang = mnlp.language("Eh jom la we go makan, I lapar gila already")
print(lang)
# {'primary': 'manglish', 'mix': {'ms': 0.45, 'en': 0.55}}

Code-switching detection reports the share of Malay (ms) and English (en) in the text.
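One way to think about a mix ratio is token-level tagging followed by counting. The sketch below assumes two tiny hand-written word lists (MALAY and ENGLISH are hypothetical, as is toy_language_mix); a real detector would need far larger resources or a trained classifier.

```python
import re

# Hypothetical mini-lexicons for illustration only.
MALAY = {"eh", "jom", "la", "makan", "lapar", "gila", "takpe", "kat", "nak"}
ENGLISH = {"we", "go", "i", "already", "the", "at"}


def toy_language_mix(text: str) -> dict:
    """Estimate the ms/en ratio over tokens found in either lexicon."""
    tokens = re.findall(r"[a-z]+", text.lower())
    ms = sum(t in MALAY for t in tokens)
    en = sum(t in ENGLISH for t in tokens)
    known = ms + en
    if known == 0:
        return {"primary": "unknown", "mix": {}}
    mix = {"ms": round(ms / known, 2), "en": round(en / known, 2)}
    # Call it Manglish whenever both languages appear at all.
    primary = "manglish" if ms and en else ("ms" if ms else "en")
    return {"primary": primary, "mix": mix}
```

On the example sentence this gives ms 0.6 / en 0.4, not the library's 0.45 / 0.55: the split depends entirely on how each token is tagged (here "eh" and "la" count as Malay).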

Step 4: Named Entity Recognition

entities = mnlp.ner("Siti beli iPhone kat Low Yat Plaza semalam")
print(entities)
# [('Siti', 'PERSON'), ('iPhone', 'PRODUCT'), ('Low Yat Plaza', 'LOCATION')]

NER recognises Malaysian entities: local places, brands, and names.
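Multi-word names such as "Low Yat Plaza" are why entity matching usually tries the longest span first. The sketch below is a gazetteer lookup, a much simpler technique than statistical NER and not necessarily what the library does; GAZETTEER and toy_ner are hypothetical, and the entries come from this page's examples.

```python
# Hypothetical gazetteer: surface form -> entity label.
GAZETTEER = {
    "Siti": "PERSON",
    "Ali": "PERSON",
    "iPhone": "PRODUCT",
    "Low Yat Plaza": "LOCATION",
    "Grab Malaysia": "ORG",
}
MAX_SPAN = max(len(name.split()) for name in GAZETTEER)


def toy_ner(text: str) -> list:
    """Greedy longest-match lookup of gazetteer entries over whitespace tokens."""
    tokens = text.split()
    entities = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate span first, shrinking to a single token.
        for size in range(min(MAX_SPAN, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + size])
            if candidate in GAZETTEER:
                entities.append((candidate, GAZETTEER[candidate]))
                i += size
                break
        else:
            i += 1  # no entity starts here; move on
    return entities
```

Splitting on whitespace means trailing punctuation ("Plaza!") would block a match; a real system tokenises first.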

Step 5: Chain with Pipeline

from malaysian_manglish_nlp import Pipeline

pipe = Pipeline([
    'normalize',    # Clean up informal spelling
    'tokenize',     # Split into tokens
    'sentiment'     # Analyse sentiment
])

results = pipe("xpe la, mmg best gila tempat ni")
print(results)
# {
#   'normalized': 'takpe la, memang best gila tempat ni',
#   'tokens': ['takpe', 'la', ',', 'memang', 'best', 'gila', 'tempat', 'ni'],
#   'sentiment': {'label': 'positive', 'score': 0.89}
# }

Pipelines pass output from one module to the next automatically.
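A common way to build this kind of chaining is to thread one state dict through a list of stages, each adding its own key. The sketch below assumes that design; it is not the library's Pipeline class. ToyPipeline and the three stage functions are hypothetical, take functions rather than string names, and use tiny made-up word lists.

```python
import re

# Hypothetical resources for the toy stages.
ABBREVIATIONS = {"xpe": "takpe", "mmg": "memang"}
POSITIVE = {"best", "sedap"}
NEGATIVE = {"mahal", "teruk"}


def normalize(state):
    expand = lambda m: ABBREVIATIONS.get(m.group(0).lower(), m.group(0))
    state["normalized"] = re.sub(r"\b\w+\b", expand, state["text"])
    return state


def tokenize(state):
    # Prefer the previous stage's output if it exists.
    source = state.get("normalized", state["text"])
    state["tokens"] = re.findall(r"\w+|[^\w\s]", source)
    return state


def sentiment(state):
    words = [t.lower() for t in state.get("tokens", state["text"].split())]
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    label = "positive" if score > 0 else "negative" if score < 0 else "neutral"
    state["sentiment"] = {"label": label}
    return state


class ToyPipeline:
    """Run stages in order, passing one shared state dict through all of them."""

    def __init__(self, stages):
        self.stages = stages

    def __call__(self, text):
        state = {"text": text}
        for stage in self.stages:
            state = stage(state)
        state.pop("text")  # return only what the stages produced
        return state


pipe = ToyPipeline([normalize, tokenize, sentiment])
print(pipe("xpe la, mmg best gila tempat ni"))
```

Because each stage falls back to the raw text when an earlier key is missing, stages can be reordered or dropped without crashing.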

CLI Usage Guide

The mnlp CLI lets you process text without writing Python.

Basic Commands

# Sentiment analysis
$ mnlp sentiment "Best gila movie tu!"
positive (0.92)

# Normalise text
$ mnlp normalize "aku xfhm ape ko ckp"
"aku tak faham apa kau cakap"

# Named Entity Recognition
$ mnlp ner "Ali kerja kat Grab Malaysia"
Ali (PERSON), Grab Malaysia (ORG)

# Language detection
$ mnlp language "Let's go makan at the mamak"
primary: manglish | ms: 0.40 | en: 0.60

# Tokenize
$ mnlp tokenize "Aku pergi kedai beli roti canai"
Aku | pergi | kedai | beli | roti | canai

All Subcommands

Command	Description
`mnlp sentiment`	Sentiment analysis (positive/negative/neutral)
`mnlp normalize`	Normalise informal Manglish spelling
`mnlp ner`	Named Entity Recognition
`mnlp language`	Language and code-switching detection
`mnlp tokenize`	Tokenisation
`mnlp summarize`	Text summarisation
`mnlp translate`	Manglish ↔ standard Malay/English
`mnlp stem`	Stemming and lemmatisation
`mnlp topics`	Topic extraction
`mnlp keywords`	Keyword extraction
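`mnlp batch`	Batch-process a directory with a given subcommand
`mnlp serve`	Start the REST API server (requires [api])
`mnlp doctor`	Check the installation and optional extras
`mnlp --help`	Full list with descriptions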

File & Batch Processing

# Process a file line-by-line
$ mnlp sentiment --input tweets.txt --output results.json

# Batch process an entire directory
$ mnlp batch sentiment ./data/tweets/ --output ./results/ --format json

# Score one CSV column (--column) and append the results (--append)
$ mnlp sentiment --input reviews.csv --output scored.csv --column text --append
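If you would rather do the CSV step from Python, the sketch below assumes --column names the column holding the text and --append keeps the original columns and adds the result fields after them. score_text is a hypothetical placeholder for mnlp.sentiment, and append_scores is an illustrative name.

```python
import csv
import io


def score_text(text):
    # Hypothetical placeholder standing in for mnlp.sentiment(text).
    return {"label": "positive" if "best" in text.lower() else "neutral"}


def append_scores(src, dst, column="text", scorer=score_text):
    """Copy every CSV row from src to dst and append the scorer's fields as new columns."""
    reader = csv.DictReader(src)
    rows = list(reader)
    scores = [scorer(row[column]) for row in rows]
    extra = list(scores[0]) if scores else []
    writer = csv.DictWriter(dst, fieldnames=list(reader.fieldnames or []) + extra)
    writer.writeheader()
    for row, score in zip(rows, scores):
        writer.writerow({**row, **score})
```

With real files, open both with newline="" and encoding="utf-8" (for example, open("reviews.csv", newline="", encoding="utf-8")) so the csv module handles line endings and Malay text correctly.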

Piping

Chain commands via stdin/stdout:

# Normalise then analyse
$ echo "xpe la bro, best gila" | mnlp normalize | mnlp sentiment
"takpe la bro, best gila" → positive (0.89)

# Process file through pipeline
$ cat data/raw.txt | mnlp normalize | mnlp ner --json
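Any program that reads stdin and writes stdout can sit in such a chain. As a sketch, assuming mnlp commands treat each input line as one text, here is a small filter that drops blank lines before normalisation; drop_blank_lines and main are hypothetical names.

```python
import sys


def drop_blank_lines(lines):
    """Yield each non-empty line with surrounding whitespace removed."""
    for line in lines:
        stripped = line.strip()
        if stripped:
            yield stripped


def main(stdin=None, stdout=None):
    """Copy cleaned lines from stdin to stdout, one text per line."""
    stdin = stdin if stdin is not None else sys.stdin
    stdout = stdout if stdout is not None else sys.stdout
    for line in drop_blank_lines(stdin):
        stdout.write(line + "\n")
```

Save it as drop_blank.py with a call to main() at the end, then use it as a stage: cat data/raw.txt | python drop_blank.py | mnlp normalize | mnlp ner --json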

Common Recipes

Recipe: Score a List of Tweets

import malaysian_manglish_nlp as mnlp

tweets = [
    "Weh sedap gila burger ni, confirm datang lagi!",
    "Aduh mahalnya, baik aku masak sendiri",
    "Ok la, not bad for the price",
]

for tweet in tweets:
    result = mnlp.sentiment(tweet)
    print(f"{result['label']:>10} ({result['score']:.2f})  -  {tweet}")
#   positive (0.95)  -  Weh sedap gila burger ni, confirm datang lagi!
#   negative (0.78)  -  Aduh mahalnya, baik aku masak sendiri
#    neutral (0.61)  -  Ok la, not bad for the price

Recipe: Normalise + NER Pipeline

from malaysian_manglish_nlp import Pipeline

pipe = Pipeline(['normalize', 'ner'])
result = pipe("ali keje kat mcd ss15 subang")
print(result)
# {
#   'normalized': 'ali kerja kat mcd ss15 subang',
#   'entities': [('ali', 'PERSON'), ('mcd', 'ORG'), ('ss15', 'LOCATION'), ('subang', 'LOCATION')]
# }

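Recipe: Batch Process with Progress

import json
from pathlib import Path

import malaysian_manglish_nlp as mnlp

# Sort for a stable order; a list also lets us report progress as i/total
files = sorted(Path("data/tweets").glob("*.txt"))
results = []

for i, f in enumerate(files, start=1):
    text = f.read_text(encoding="utf-8").strip()
    score = mnlp.sentiment(text)
    results.append({"file": f.name, "text": text, **score})
    print(f"[{i}/{len(files)}] {f.name}: {score['label']}")

# Save results as UTF-8 so non-ASCII characters survive on every platform
Path("results.json").write_text(
    json.dumps(results, indent=2, ensure_ascii=False), encoding="utf-8"
)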

Recipe: REST API Server

# Start the API server (requires [api] extra)
$ mnlp serve --port 8000

# Query from another terminal
$ curl http://localhost:8000/sentiment -H "Content-Type: application/json" -d '{"text": "best gila la"}'
{"label": "positive", "score": 0.93}
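The same call from Python needs only the standard library. This sketch assumes, as the curl example shows, that /sentiment accepts a JSON body {"text": ...} and returns a JSON object; build_sentiment_request and query_sentiment are hypothetical helper names.

```python
import json
import urllib.request

# Host, port and endpoint from the curl example above; change to suit.
BASE_URL = "http://localhost:8000"


def build_sentiment_request(text, base_url=BASE_URL):
    """Build a POST request whose body is the JSON object {"text": ...}."""
    body = json.dumps({"text": text}, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/sentiment",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def query_sentiment(text, base_url=BASE_URL, timeout=10.0):
    """Send the request and decode the JSON response into a dict."""
    request = build_sentiment_request(text, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, print(query_sentiment("best gila la")) should print the same object curl returns. The explicit Content-Type header matters: without it, a JSON endpoint may reject the body as form data.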

Configuration

malaysian-manglish-nlp works out of the box with sensible defaults. To change them, call mnlp.configure():

import malaysian_manglish_nlp as mnlp

# Set the backend (model variant): choose one of the following
mnlp.configure(backend="fast")      # fastest, lower accuracy
mnlp.configure(backend="balanced")  # default
mnlp.configure(backend="accurate")  # highest accuracy, slower

# Caching and batch size
mnlp.configure(cache=True)          # cache repeated queries
mnlp.configure(batch_size=256)      # batch processing size

Or via environment variables:

export MANGLISH_NLP_BACKEND=balanced
export MANGLISH_NLP_CACHE=true
export MANGLISH_NLP_BATCH_SIZE=256
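If your deployment sets these variables, you may want to validate them before starting work. The sketch below is a hypothetical helper for your own tooling, not the library's parser: the accepted backend names come from this page, but the truthy spellings for MANGLISH_NLP_CACHE are an assumption, and read_manglish_env is an illustrative name.

```python
import os

# Backend names documented on this page.
BACKENDS = {"fast", "balanced", "accurate"}


def read_manglish_env(environ=os.environ):
    """Collect MANGLISH_NLP_* settings into a dict of typed, validated values."""
    settings = {}
    backend = environ.get("MANGLISH_NLP_BACKEND")
    if backend is not None:
        if backend not in BACKENDS:
            raise ValueError(f"unknown backend {backend!r}; expected one of {sorted(BACKENDS)}")
        settings["backend"] = backend
    cache = environ.get("MANGLISH_NLP_CACHE")
    if cache is not None:
        # Assumed truthy spellings; anything else counts as False.
        settings["cache"] = cache.strip().lower() in {"1", "true", "yes", "on"}
    batch = environ.get("MANGLISH_NLP_BATCH_SIZE")
    if batch is not None:
        settings["batch_size"] = int(batch)  # raises ValueError on non-integers
    return settings
```

The keys match the mnlp.configure() keyword arguments shown above, so each entry can be passed on one at a time, for example mnlp.configure(backend=settings["backend"]).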

Next Steps

Browse Modules

Explore all 51 modules, grouped by category: text processing, analysis, extraction, generation, and more. See the Module Overview.

API Reference

Full function signatures, parameters, return types, and examples for every public function. See the API Reference.

Benchmarks

Performance numbers on standard hardware, and how malaysian-manglish-nlp compares to alternatives. See Benchmarks.

Contribute

Found a bug? Want a new module? Read the Contributing Guide to learn how to contribute to malaysian-manglish-nlp.

Need help?

Open an issue on GitHub or start a discussion.