Pipeline Usage
Chain multiple NLP modules into a single call - custom workflows, batch processing, and serializable pipelines.
Why pipelines?
Running 5 separate modules on the same text means 5 function calls, 5 result dicts to merge, and repeated boilerplate. Pipelines let you define a sequence of modules once and run them in a single call. Results are automatically merged into one structured output.
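Conceptually, a pipeline is a loop over step functions whose outputs are merged into one dict. The sketch below illustrates that idea with two toy stand-in modules. `run_pipeline`, `toy_normalize`, `toy_sentiment`, and `REGISTRY` are hypothetical names invented for this illustration; this is not the library's implementation, and each toy step is simply registered under the key its output is stored at.

```python
from typing import Callable, Dict, List

# Toy stand-ins for real NLP modules (hypothetical, for illustration only).
def toy_normalize(text: str) -> str:
    shortforms = {"mkn": "makan"}
    return " ".join(shortforms.get(word, word) for word in text.split())

def toy_sentiment(text: str) -> dict:
    label = "positive" if "best" in text.split() else "neutral"
    return {"label": label}

def run_pipeline(text: str, steps: List[str], registry: Dict[str, Callable]) -> dict:
    """Run each named step on the text and merge the outputs into one dict."""
    result = {"original": text}
    for name in steps:
        result[name] = registry[name](text)
    return result

REGISTRY = {"normalized": toy_normalize, "sentiment": toy_sentiment}
merged = run_pipeline("gila best mkn dia", ["normalized", "sentiment"], REGISTRY)
print(merged)
# {'original': 'gila best mkn dia', 'normalized': 'gila best makan dia', 'sentiment': {'label': 'positive'}}
```

The point of the design is that the caller makes one call and reads one dict, rather than calling each module and merging the results by hand.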
Load module
import malaysian_manglish_nlp as mnlp
# Default pipeline (normalize + sentiment + language + emotion)
result = mnlp.pipeline("gila best mkn dia")
print(result)
# {
#     'original': 'gila best mkn dia',
#     'normalized': 'gila best makan dia',
#     'sentiment': {'label': 'positive', 'score': 0.92},
#     'language': {'primary': 'manglish', ...},
#     'emotion': {'primary': 'joy', 'score': 0.85},
# }
Basic usage
Default pipeline
The default pipeline runs 4 modules: normalize, sentiment, language, emotion.
result = mnlp.pipeline("Weh best gila kedai tu!")
print(result['original'])
# Weh best gila kedai tu!
print(result['normalized'])
# Weh best gila kedai tu!
print(result['sentiment']['label'])
# positive
print(result['emotion']['primary'])
# joy
Custom steps
Choose which modules to run:
result = mnlp.pipeline(
    "Ahmad kerja kat Petronas KL",
    steps=['normalize', 'ner', 'pos']
)
print(result['ner'])
# [('Ahmad', 'PERSON'), ('Petronas', 'ORG'), ('KL', 'LOCATION')]
print(result['pos'])
# [('Ahmad', 'PROPN'), ('kerja', 'VERB'), ...]
Available steps
| Step | Module | What it does |
|---|---|---|
| normalize | mnlp.normalize() | Expand shortforms |
| sentiment | mnlp.sentiment() | Sentiment analysis |
| language | mnlp.detect_language() | Language detection |
| emotion | mnlp.detect_emotion() | Emotion detection |
| profanity | mnlp.detect_profanity() | Profanity filter |
| dialect | mnlp.detect_dialect() | Dialect detection |
| sarcasm | mnlp.detect_sarcasm() | Sarcasm detection |
| tokenize | mnlp.tokenize() | Tokenization |
| stem | mnlp.stem() | Malay stemming |
| pos | mnlp.pos_tag() | POS tagging |
| ner | mnlp.ner_tag() | Named entities |
| segment | mnlp.segment() | Text segmentation |
| formalize | mnlp.formalize() | Formal BM |
| clean | mnlp.clean() | Noise removal |
| keywords | mnlp.extract_keywords() | Keyword extraction |
| correct | mnlp.correct() | Spelling correction |
| dependency | mnlp.parse_dependencies() | Dependency parsing |
| all | everything | Run all available steps |
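Because step names are plain strings, a typo is easy to make. How the library itself reacts to an unknown name is not documented here, so the following is a caller-side guard only: `resolve_steps` and `AVAILABLE_STEPS` are hypothetical helpers, with the names copied from the table above and `'all'` assumed to expand to every listed step.

```python
# Step names copied from the table above; "all" is handled separately.
AVAILABLE_STEPS = [
    "normalize", "sentiment", "language", "emotion", "profanity", "dialect",
    "sarcasm", "tokenize", "stem", "pos", "ner", "segment", "formalize",
    "clean", "keywords", "correct", "dependency",
]

def resolve_steps(steps):
    """Validate requested step names and expand 'all' (hypothetical helper)."""
    if "all" in steps:
        return list(AVAILABLE_STEPS)
    unknown = [s for s in steps if s not in AVAILABLE_STEPS]
    if unknown:
        raise ValueError(f"Unknown pipeline steps: {unknown}")
    return list(steps)

print(resolve_steps(["normalize", "ner", "pos"]))
# ['normalize', 'ner', 'pos']
```

The validated list can then be passed as the `steps` argument of `mnlp.pipeline`.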
Run all modules
result = mnlp.pipeline("Weh best gila kedai tu!", steps=['all'])
# All module results in one dict:
# result['original']
# result['normalized']
# result['sentiment']
# result['language']
# result['emotion']
# result['profanity']
# result['dialect']
# result['sarcasm']
# result['pos']
# result['ner']
# result['keywords']
# ... and more
Performance with 'all'
Running all modules takes ~5ms per text. For high-throughput scenarios, only include the modules you need.
Normalize first
By default, text is normalized before other modules run. This improves accuracy for sentiment, NER, and language detection on informal text.
# Normalize first (default)
result = mnlp.pipeline("xpe la best gila mkn dia", normalize_first=True)
# normalized text used for sentiment, language, emotion
# Skip normalization
result = mnlp.pipeline("xpe la best gila mkn dia", normalize_first=False)
# raw text used for all modules
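To see why normalizing first can help, consider a toy lexicon-based sentiment check that only knows full word forms. The sketch below is illustrative only: the shortform map, the word list, and the `toy_*` functions are invented stand-ins, not the library's resources or logic, and keeping the `normalized` key in both modes is an assumption.

```python
# Toy data for illustration only; the real library uses its own resources.
SHORTFORMS = {"sdp": "sedap", "mkn": "makan"}
POSITIVE_WORDS = {"sedap", "best"}

def toy_normalize(text):
    return " ".join(SHORTFORMS.get(w, w) for w in text.split())

def toy_sentiment(text):
    hits = [w for w in text.split() if w in POSITIVE_WORDS]
    return "positive" if hits else "neutral"

def toy_pipeline(text, normalize_first=True):
    normalized = toy_normalize(text)
    # Downstream modules see normalized text only when normalize_first is set.
    downstream_input = normalized if normalize_first else text
    return {
        "original": text,
        "normalized": normalized,
        "sentiment": toy_sentiment(downstream_input),
    }

print(toy_pipeline("sdp gila nasi tu", normalize_first=True)["sentiment"])
# positive
print(toy_pipeline("sdp gila nasi tu", normalize_first=False)["sentiment"])
# neutral
```

With normalization, the shortform "sdp" becomes "sedap" and matches the lexicon; on raw text the same signal is missed.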
The analyze function
mnlp.analyze() is a shortcut for the full pipeline:
result = mnlp.analyze("Weh best gila kedai tu, tapi service slow sikit")
# Returns comprehensive analysis:
# - normalized text
# - sentiment (with aspect detection for mixed)
# - language detection
# - POS tags
# - named entities
# - emotion
# - keywords
Batch pipeline
Process multiple texts through the same pipeline:
texts = [
    "Best gila movie tu!",
    "Teruk la service dia",
    "Ahmad kerja kat Petronas KL",
    "Ambo nok make nasi kerabu",
]
results = mnlp.batch_pipeline(
    texts,
    steps=['normalize', 'sentiment', 'ner']
)
for text, result in zip(texts, results):
    sent = result['sentiment']['label']
    entities = result.get('ner', [])
    print(f"{text[:30]:30s} → {sent:8s} | {entities}")
# Best gila movie tu! → positive | []
# Teruk la service dia → negative | []
# Ahmad kerja kat Petronas KL → neutral | [('Ahmad', 'PERSON'), ...]
# Ambo nok make nasi kerabu → neutral | []
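Batch results are a list of dicts, so they can be aggregated with ordinary Python. A minimal sketch, assuming each result has the `sentiment` and `ner` shape shown above; `summarize` is a hypothetical helper and the sample data is illustrative, not real library output.

```python
from collections import Counter

def summarize(results):
    """Count sentiment labels and collect entities from pipeline result dicts."""
    labels = Counter(r["sentiment"]["label"] for r in results)
    entities = [ent for r in results for ent in r.get("ner", [])]
    return {"labels": dict(labels), "entities": entities}

# Sample results shaped like the batch output shown above (illustrative data).
sample_results = [
    {"sentiment": {"label": "positive"}, "ner": []},
    {"sentiment": {"label": "negative"}, "ner": []},
    {"sentiment": {"label": "neutral"}, "ner": [("Ahmad", "PERSON")]},
]
summary = summarize(sample_results)
print(summary)
# {'labels': {'positive': 1, 'negative': 1, 'neutral': 1}, 'entities': [('Ahmad', 'PERSON')]}
```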
Custom pipeline workflow
Build a custom analysis function:
def analyze_review(text):
    """Custom review analysis pipeline."""
    result = mnlp.pipeline(text, steps=['normalize', 'sentiment', 'emotion', 'keywords'])
    return {
        'text': result['normalized'],
        'sentiment': result['sentiment']['label'],
        'confidence': result['sentiment']['score'],
        'emotion': result['emotion']['primary'],
        'keywords': result['keywords'][:5],
        'is_positive': result['sentiment']['label'] == 'positive',
    }
# Use on a batch of reviews
reviews = [
    "Best gila makanan dia, sedap!",
    "Mahal sangat, tak berbaloi",
    "Service ok, food average je",
]
for review in reviews:
    analysis = analyze_review(review)
    print(f"{analysis['sentiment']:8s} | {analysis['emotion']:12s} | {analysis['text'][:30]}")
Content moderation pipeline
def moderate(text):
    """Content moderation pipeline."""
    result = mnlp.pipeline(text, steps=['normalize', 'profanity', 'sarcasm', 'sentiment'])
    flags = []
    if result.get('profanity', {}).get('is_profanity'):
        flags.append('profanity')
    if result.get('sarcasm', {}).get('is_sarcastic'):
        flags.append('sarcasm')
    if result.get('sentiment', {}).get('label') == 'negative':
        flags.append('negative')
    return {
        'approved': len(flags) == 0,
        'flags': flags,
        'clean_text': result['normalized'],
    }
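The function above rejects any text that raises a flag, including plain negative sentiment. If only some flags should block a post, the decision can be separated from the flag collection. The sketch below is one possible policy, not part of the library: `decide` and its `blocking` parameter are hypothetical, and the flag names are the ones produced by `moderate`.

```python
def decide(flags, blocking=frozenset({"profanity"})):
    """Approve unless a blocking flag is present; other flags are kept as warnings."""
    blocked = sorted(f for f in flags if f in blocking)
    warnings = sorted(f for f in flags if f not in blocking)
    return {"approved": not blocked, "blocked_by": blocked, "warnings": warnings}

print(decide(["sarcasm", "negative"]))
# {'approved': True, 'blocked_by': [], 'warnings': ['negative', 'sarcasm']}
print(decide(["profanity", "negative"]))
# {'approved': False, 'blocked_by': ['profanity'], 'warnings': ['negative']}
```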
CLI usage
# Full analysis (default pipeline)
$ mnlp analyze "Weh best gila kedai tu"
# With JSON output
$ mnlp analyze "Sedap nasi lemak" --json
# Pipe input
$ echo "Best gila!" | mnlp analyze
# Specific modules
$ mnlp sentiment "Best gila!" && mnlp ner "Ahmad kat KL"
# Benchmark
$ mnlp benchmark
Performance
| Pipeline | Latency | Throughput |
|---|---|---|
| Default (4 modules) | ~2ms | 15,000 texts/sec |
| Minimal (2 modules) | ~1ms | 30,000 texts/sec |
| All modules | ~5ms | 6,000 texts/sec |
| Batch (100 texts, default) | ~150ms | 12,000 texts/sec |
Optimization
- Only include modules you need
- Use batch_pipeline for multiple texts
- normalize_first=True improves accuracy on informal text but adds ~0.2ms
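Latency and throughput depend on hardware and on which steps are enabled, so it can be worth measuring them on your own machine. The following is a minimal timing sketch using only the standard library; `measure` and `stand_in` are hypothetical names, and the stand-in workload should be swapped for the real call you want to time (for example a lambda wrapping `mnlp.pipeline`).

```python
import time

def measure(func, texts, repeats=5):
    """Return (ms per text, texts per second) taken from the fastest of several runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        for text in texts:
            func(text)
        best = min(best, time.perf_counter() - start)
    per_text_ms = best / len(texts) * 1000
    throughput = len(texts) / best
    return per_text_ms, throughput

# Stand-in workload so the sketch runs without the library installed.
def stand_in(text):
    return text.lower().split()

latency_ms, texts_per_sec = measure(stand_in, ["Weh best gila kedai tu!"] * 1000)
print(f"{latency_ms:.4f} ms/text, {texts_per_sec:,.0f} texts/sec")
```

For a single sequential worker, throughput in texts per second is 1000 divided by the per-text latency in milliseconds; higher throughput at the same latency requires batching or parallel workers.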
See also
- REST API - serve pipelines over HTTP
- Normalization - understand the normalize step
- Sentiment Analysis - understand sentiment output
- API Reference - full function signature