Translation
Translate between Bahasa Melayu (BM), English, and Manglish, with entity preservation and formal-register output.
Why translation?
Use cases include cross-language content processing, bilingual chatbot support, document translation for Malaysian businesses, and normalizing Manglish for formal contexts. General-purpose systems such as Google Translate struggle with Manglish because it is not a standardized language: a phrase like "xpe la bro" (roughly "never mind, bro") is unlikely to appear in the standard corpora such systems are trained on.
malaysian-manglish-nlp uses a rule-based approach built around the patterns Malaysians actually write: shortforms, discourse particles, and code-switching.
Load module
```python
import malaysian_manglish_nlp as mnlp

# BM to English
result = mnlp.to_english("Saya nak pergi makan nasi lemak")
print(result)
# "I want to go eat nasi lemak"

# English to BM
result = mnlp.to_malay("I want to eat fried rice")
print(result)
# "Saya mahu makan nasi goreng"
```
Basic usage
BM → English
```python
mnlp.to_english("Aku nak pergi kedai jap")
# "I want to go to the shop for a moment"

mnlp.to_english("Diaorang semua dah sampai rumah")
# "They all have arrived at the house"

mnlp.to_english("Jangan lupa bawa payung, hujan nanti")
# "Don't forget to bring an umbrella, it will rain later"
```
English → BM
```python
mnlp.to_malay("I am going to the market")
# "Saya sedang pergi ke pasar"

mnlp.to_malay("She already ate lunch")
# "Dia sudah makan tengah hari"
```
General translate function
```python
# Auto-detect the input language and translate to the other one
mnlp.translate("Saya suka makan nasi lemak")
# Auto-detects BM → translates to English

mnlp.translate("I like eating coconut rice")
# Auto-detects English → translates to BM
```
Formal translation
Convert informal Manglish to proper formal Bahasa Melayu:
```python
mnlp.to_formal("Aku nk g mkn jap")
# "Saya hendak pergi makan sebentar"

mnlp.to_formal("xpe la bro, aku tunggu je kat sini")
# "Tidak mengapalah, saya tunggu sahaja di sini"

mnlp.to_formal("ko dah makan ke belum?")
# "Awak sudah makan atau belum?"
```
formalize() vs to_formal()
mnlp.formalize() normalizes spelling and grammar; mnlp.to_formal() also shifts the register to formal BM (for example aku → saya, ko → awak).
Use formalize() when you only want clean text; use to_formal() when you need the formal register.
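To make the distinction concrete, here is a toy sketch that treats formalize() as a spelling pass and to_formal() as the same pass followed by a register swap. It assumes that split, which the examples above suggest but this page does not spell out; the tables and the function names formalize_sketch and to_formal_sketch are hypothetical and far smaller than anything the library ships.

```python
# Hypothetical tables for illustration only.
SHORTFORMS = {"nk": "nak", "g": "pergi", "mkn": "makan", "sy": "saya", "xpe": "takpe"}
FORMAL_REGISTER = {"aku": "saya", "ko": "awak", "nak": "hendak", "jap": "sebentar",
                   "je": "sahaja", "kat": "di", "takpe": "tidak mengapa"}


def formalize_sketch(text: str) -> str:
    """Spelling pass only: expand shortforms but keep the speaker's register."""
    return " ".join(SHORTFORMS.get(w, w) for w in text.lower().split())


def to_formal_sketch(text: str) -> str:
    """Spelling pass first, then swap informal words for formal-register ones."""
    normalized = formalize_sketch(text)
    return " ".join(FORMAL_REGISTER.get(w, w) for w in normalized.split())
```

On "Aku nk g mkn jap", formalize_sketch keeps the informal "aku" and "jap" ("aku nak pergi makan jap"), while to_formal_sketch gives "saya hendak pergi makan sebentar" (lowercase, because the sketch does not restore capitalization).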
Word-level translation
Translate individual words:
```python
mnlp.word_translate("makan")
# "eat"

mnlp.word_translate("pergi")
# "go"

mnlp.word_translate("beautiful")
# "cantik"
```
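Word lookup in both directions can be served from a single bilingual table plus a derived reverse index. A minimal sketch, assuming a BM→EN dictionary as the source of truth; BM_TO_EN, word_translate_sketch, and the first-entry-wins rule for English words with several BM equivalents (cantik and lawa both mean "beautiful") are illustrative choices, not documented library behavior.

```python
# Illustrative entries only; the real dictionary is much larger.
BM_TO_EN = {"makan": "eat", "pergi": "go", "cantik": "beautiful", "lawa": "beautiful"}


def build_reverse(table):
    """Invert a BM->EN table. If several BM words share one English gloss,
    the first one in insertion order wins (a choice made for this sketch)."""
    reverse = {}
    for bm, en in table.items():
        reverse.setdefault(en, bm)
    return reverse


EN_TO_BM = build_reverse(BM_TO_EN)


def word_translate_sketch(word):
    """Look the word up as BM first, then as English; None if it is in neither."""
    w = word.lower()
    if w in BM_TO_EN:
        return BM_TO_EN[w]
    return EN_TO_BM.get(w)
```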
Real Manglish examples
The rule-based approach handles common Manglish patterns:
```python
# Shortforms expanded during translation
mnlp.to_english("sy nk tny brp harga")
# "I want to ask how much is the price"

# Discourse particles handled ("lah" has no English equivalent and is dropped)
mnlp.to_english("Sedap lah makanan ni")
# "This food is delicious"

# Code-switched input
mnlp.to_english("I nak order satu teh tarik")
# "I want to order one teh tarik"

# Informal spellings (Diaorg = diorang "they", pg = pergi "go")
mnlp.to_english("Diaorg pg kedai mamak")
# "They went to the mamak shop"
```
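The particle handling visible above ("Sedap lah makanan ni" loses its lah in English) corresponds to the particle-aware tokenization step listed under How it works. A minimal sketch of one way to do it, assuming a small hypothetical particle list: standalone particles are split off, and an attached -lah is split only when the remaining stem is a known word, so a word such as sekolah ("school"), which merely ends in those letters, stays intact. The function name and the known_words guard are illustrative.

```python
import re

# Hypothetical particle list for illustration; real Manglish has more.
PARTICLES = {"lah", "la", "leh", "lor", "meh", "mah"}


def tokenize_with_particles(text, known_words=frozenset()):
    """Split a Manglish string into (content_tokens, particles)."""
    content, particles = [], []
    for tok in re.findall(r"[a-z0-9]+", text.lower()):
        if tok in PARTICLES:
            # Standalone particle: "Sedap lah makanan ni"
            particles.append(tok)
        elif tok.endswith("lah") and tok[:-3] in known_words:
            # Attached -lah on a known stem: "xpelah" -> "xpe" + "lah"
            content.append(tok[:-3])
            particles.append("lah")
        else:
            # Ordinary word, including ones like "sekolah" that just end in "lah"
            content.append(tok)
    return content, particles
```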
Entity preservation
Proper nouns and other entities (names, places, brands, currency amounts) pass through translation unchanged:
```python
mnlp.to_english("Ahmad pergi Petronas Tower KL")
# "Ahmad went to Petronas Tower KL"

mnlp.to_english("Harga Myvi baru RM50,000")
# "The price of new Myvi is RM50,000"
```
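A common way to keep entities intact through a rule-based pass is placeholder masking: swap each entity for a token the translator will not touch, translate, then restore. Whether malaysian-manglish-nlp works this way is an assumption; the gazetteer, the currency regex, and the __ENTn__ placeholder format below are illustrative.

```python
import re

# Illustrative gazetteer; a real system would load a much larger list.
GAZETTEER = ["Petronas Tower", "Ahmad", "Myvi", "KL"]
# Ringgit amounts such as "RM50,000" or "RM12.50"
CURRENCY = r"RM\d[\d,]*(?:\.\d+)?"


def _entity_pattern(names):
    # Longest names first, so a multi-word entry wins over a shorter overlapping one.
    alternatives = "|".join(re.escape(n) for n in sorted(names, key=len, reverse=True))
    return re.compile(r"\b(?:" + alternatives + r")\b|" + CURRENCY)


ENTITY_RE = _entity_pattern(GAZETTEER)


def mask_entities(text):
    """Replace each entity with __ENTn__ and return (masked_text, entities)."""
    entities = []

    def _placeholder(match):
        entities.append(match.group(0))
        return f"__ENT{len(entities) - 1}__"

    return ENTITY_RE.sub(_placeholder, text), entities


def unmask_entities(text, entities):
    """Put the original entity strings back in place of their placeholders."""
    for i, entity in enumerate(entities):
        text = text.replace(f"__ENT{i}__", entity)
    return text
```

Longest-first ordering matters: with a gazetteer containing both "Petronas" and "Petronas Tower", listing the shorter name first would mask only "Petronas" and leave "Tower" to be translated.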
Translation limitations
malaysian-manglish-nlp translation is rule-based, not neural. It handles common patterns well but may struggle with:
- Very long or complex sentences
- Domain-specific jargon
- Idiomatic expressions not in the dictionary
- Creative slang and wordplay
For production-grade translation quality, pair it with a neural MT system and use malaysian-manglish-nlp for preprocessing.
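The recommended production setup, rule-based preprocessing followed by neural MT, is plain function composition. A minimal sketch, assuming you inject both steps as callables: normalize would wrap a malaysian-manglish-nlp normalization call and neural_mt a client for whichever MT service you use. Neither binding is defined here, and preprocess_then_translate is a hypothetical name.

```python
from typing import Callable


def preprocess_then_translate(
    text: str,
    normalize: Callable[[str], str],
    neural_mt: Callable[[str], str],
) -> str:
    """Normalize Manglish with a rule-based step, then pass the result to neural MT."""
    cleaned = normalize(text)
    if not cleaned.strip():
        # Nothing left to translate, so skip the MT call entirely.
        return ""
    return neural_mt(cleaned)
```

Keeping both steps injectable lets you swap MT providers, or stub them in tests, without touching the preprocessing code.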
Detect and translate
Automatically detect the input language and translate to the other:
```python
mnlp.detect_and_translate("Saya lapar gila")
# "I am very hungry" (detected BM → EN)

mnlp.detect_and_translate("I am very hungry")
# "Saya sangat lapar" (detected EN → BM)
```
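Detect-and-translate boils down to classifying the input and choosing the opposite language as the target. The sketch below uses a simple marker-word count, which is an assumption for illustration rather than the library's detector; the marker sets and the function names detect_language and translation_target are hypothetical.

```python
import re

# Small illustrative marker-word sets; a real detector needs far more evidence.
BM_MARKERS = {"saya", "aku", "awak", "dia", "kami", "nak", "tak", "dah", "sudah",
              "ni", "tu", "yang", "dan", "ke"}
EN_MARKERS = {"i", "you", "he", "she", "we", "they", "am", "is", "are",
              "the", "a", "to", "very", "and"}


def detect_language(text):
    """Return "bm", "en", or "unknown" by counting marker words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    bm = sum(tok in BM_MARKERS for tok in tokens)
    en = sum(tok in EN_MARKERS for tok in tokens)
    if bm == en:
        return "unknown"
    return "bm" if bm > en else "en"


def translation_target(text):
    """The opposite language of the detected one, or None if detection is unsure."""
    return {"bm": "en", "en": "bm"}.get(detect_language(text))
```

A code-switched sentence such as "I nak order satu teh tarik" scores one marker each way and comes back as "unknown", which shows why mixed Manglish input needs more than a word count.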
CLI usage
```console
# BM to English
$ mnlp translate "Saya nak makan nasi lemak" --to en
I want to eat nasi lemak

# English to BM
$ mnlp translate "I want to eat" --to bm
Saya mahu makan

# To formal BM
$ mnlp translate "aku nk g mkn" --to formal
Saya hendak pergi makan

# Pipe input
$ echo "Best gila makanan tu" | mnlp translate --to en
That food is incredibly delicious

# Auto-detect
$ mnlp translate "Saya suka makan"
I like to eat
```
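The CLI behavior above (a positional text argument, --to en|bm|formal, stdin when no text is given, auto-detection when --to is omitted) can be wired with argparse. This is a hypothetical sketch of that wiring, not the mnlp CLI's source; run, build_parser, and the injected translators mapping, including its "auto" key, are illustrative names.

```python
import argparse
import sys


def build_parser():
    parser = argparse.ArgumentParser(prog="translate-sketch")
    parser.add_argument("text", nargs="?",
                        help="text to translate; read from stdin when omitted")
    parser.add_argument("--to", choices=["en", "bm", "formal"],
                        help="target language; auto-detect the direction when omitted")
    return parser


def run(argv, translators, read_stdin=None):
    """Parse argv, pick the input source, and dispatch to translators[target].

    `translators` maps "en", "bm", "formal", and "auto" to callables; they
    are injected so this sketch does not depend on any real translation API.
    """
    args = build_parser().parse_args(argv)
    if args.text is not None:
        text = args.text
    else:
        # Piped input: echo "..." | translate-sketch --to en
        text = (read_stdin or sys.stdin.read)().strip()
    return translators[args.to or "auto"](text)
```

In a real entry point you would call print(run(sys.argv[1:], translators)), with translators bound to whatever translation functions you use.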
How it works
malaysian-manglish-nlp uses a rule-based translation pipeline:
- Normalization - expand shortforms (nk→nak, sy→saya)
- Tokenization - split into tokens with particle awareness
- Dictionary lookup - 15,000+ BM↔EN word pairs
- Phrase matching - multi-word expressions ("nasi lemak" stays intact)
- Grammar rules - word order transformation (both languages are SVO, so the main change is modifier placement, e.g. BM "rumah besar" → EN "big house")
- Entity preservation - proper nouns pass through unchanged
This approach is fast (no model to load), deterministic (the same input always gives the same output), and handles Manglish patterns that neural models often miss.
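The pipeline above can be compressed into a toy translator to show how the steps fit together. This is a minimal sketch under stated assumptions, not the library's code: the tables hold a handful of illustrative entries, tokenization is a plain whitespace split, phrase matching runs before word lookup and only looks two tokens ahead, and the single grammar rule moves a BM post-noun adjective in front of its noun. All names are hypothetical.

```python
# Illustrative tables only; the documented dictionary has 15,000+ word pairs.
SHORTFORMS = {"sy": "saya", "nk": "nak", "mkn": "makan", "g": "pergi"}
PHRASES = {("nasi", "lemak"): "nasi lemak", ("teh", "tarik"): "teh tarik"}
LEXICON = {"saya": "I", "nak": "want to", "makan": "eat", "pergi": "go",
           "ke": "to", "rumah": "house", "besar": "big"}
ADJECTIVES = {"besar"}
ENTITIES = {"Ahmad", "KL"}


def translate_sketch(text):
    # Tokenization (plain whitespace split) and normalization: entities keep their
    # spelling, everything else is lowercased and shortforms are expanded.
    tokens = [t if t in ENTITIES else SHORTFORMS.get(t.lower(), t.lower())
              for t in text.split()]

    # Phrase matching (two-token window) runs before word lookup so that
    # "nasi lemak" is claimed as one unit. Each unit is (bm_tokens, gloss).
    units, i = [], 0
    while i < len(tokens):
        pair = tuple(tokens[i:i + 2])
        if pair in PHRASES:
            units.append((pair, PHRASES[pair]))
            i += 2
            continue
        tok = tokens[i]
        # Dictionary lookup; entities and unknown words pass through unchanged.
        units.append(((tok,), tok if tok in ENTITIES else LEXICON.get(tok, tok)))
        i += 1

    # Grammar rule: BM noun + adjective becomes EN adjective + noun.
    for j in range(len(units) - 1):
        if units[j + 1][0][0] in ADJECTIVES and units[j][0][0] not in ADJECTIVES:
            units[j], units[j + 1] = units[j + 1], units[j]

    return " ".join(gloss for _, gloss in units)
```

For example, "sy nk mkn nasi lemak" becomes "I want to eat nasi lemak" and "rumah besar" becomes "big house". Everything is table lookups and a linear scan, which is why a pipeline of this shape needs no model loading and always returns the same output for the same input.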
Performance
| Metric | Value |
|---|---|
| BLEU (BM→EN, news) | 42.3 |
| BLEU (BM→EN, social) | 35.8 |
| BLEU (EN→BM, news) | 38.7 |
| Formal conversion accuracy | 87.1% |
| Throughput | 15,000 texts/sec |
| Latency (single text) | < 1 ms |
BLEU scores in context
These rule-based BLEU scores are lower than what neural MT systems typically reach on standard benchmarks. On Manglish text, which neural MT handles poorly, malaysian-manglish-nlp outperforms Google Translate by a wide margin.
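For reading the BLEU rows: BLEU, as commonly defined, combines modified n-gram precisions $p_n$ (usually $n = 1$ to $4$, with uniform weights $w_n = 1/N$) with a brevity penalty $\mathrm{BP}$ that lowers the score when the candidate length $c$ is shorter than the reference length $r$. Scores reported on a 0 to 100 scale are this value multiplied by 100.

$$
\mathrm{BLEU} = \mathrm{BP}\cdot\exp\left(\sum_{n=1}^{N} w_n \log p_n\right),
\qquad
\mathrm{BP} =
\begin{cases}
1 & \text{if } c > r,\\
e^{\,1 - r/c} & \text{if } c \le r.
\end{cases}
$$

Because BLEU depends on tokenization and the number of references, scores are only directly comparable when computed with the same setup.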
See also
- Normalization - preprocess text before translation
- Language Detection - detect input language first
- Pipeline - chain translation with other modules
- REST API - serve translation over HTTP
- API Reference - full function signatures