Integrations¶
Use malaysian-manglish-nlp from spaCy, a FastAPI server, the command line, or LangChain.
Overview¶
Integration modules let you use malaysian-manglish-nlp inside existing frameworks: as spaCy pipeline components, behind a REST API with auto-generated docs, from the command line, or as LangChain tools for AI agents.
Quick Start¶
import malaysian_manglish_nlp as mnlp
# spaCy pipeline
nlp = mnlp.spacy_integration.load()
doc = nlp("Ahmad beli nasi lemak kat Pavilion KL semalam")
[(ent.text, ent.label_) for ent in doc.ents]
# [('Ahmad', 'PERSON'), ('Pavilion KL', 'LOCATION')]
# LangChain tool
from malaysian_manglish_nlp.langchain_tool import SentimentTool
tool = SentimentTool()
tool.run("Best gila makanan sini!")
# "positive (0.94)"
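The tool returns its result as a plain string such as "positive (0.94)". When calling code needs the label and the score as separate values, a small parser can help. The following is a minimal sketch that assumes the output always follows the `label (score)` form shown above; `parse_sentiment_output` and `SentimentResult` are hypothetical names, not part of malaysian-manglish-nlp.

```python
import re
from typing import NamedTuple


class SentimentResult(NamedTuple):
    label: str
    score: float


# Assumed format, taken from the example output above: "<label> (<score>)"
_PATTERN = re.compile(r"^\s*(?P<label>[A-Za-z_]+)\s*\((?P<score>\d+(?:\.\d+)?)\)\s*$")


def parse_sentiment_output(raw: str) -> SentimentResult:
    """Split a string like 'positive (0.94)' into a label and a float score."""
    match = _PATTERN.match(raw)
    if match is None:
        raise ValueError(f"unexpected sentiment output: {raw!r}")
    return SentimentResult(match["label"], float(match["score"]))


result = parse_sentiment_output("positive (0.94)")
print(result.label, result.score)  # positive 0.94
```

Raising on an unexpected string, rather than returning a default, makes a change in the tool's output format visible immediately.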
Module Details¶
spacy_integration¶
Use malaysian-manglish-nlp modules as native spaCy pipeline components. Access entities, POS tags, and sentiment through the standard spaCy Doc API.
Requires: pip install malaysian-manglish-nlp[spacy]
import spacy
import malaysian_manglish_nlp as mnlp
# Load full pipeline
nlp = mnlp.spacy_integration.load()
doc = nlp("Ahmad beli nasi lemak kat Pavilion KL semalam")
# Tokens + POS
for token in doc:
    print(f"{token.text:12} {token.pos_:6} {token.dep_}")
# Ahmad        PROPN  nsubj
# beli         VERB   ROOT
# nasi         NOUN   obj
# lemak        ADJ    amod
# kat          ADP    case
# Pavilion     PROPN  compound
# KL           PROPN  obl
# semalam      NOUN   obl:tmod
# Entities
for ent in doc.ents:
    print(f"{ent.text:16} {ent.label_}")
# Ahmad            PERSON
# Pavilion KL      LOCATION
# Sentiment (custom attribute)
doc._.sentiment
# {'label': 'neutral', 'score': 0.65}
Available Components¶
| Component | Pipe Name | Description |
|---|---|---|
| Tokenizer | mnlp_tokenizer | Malaysian-aware tokenisation |
| NER | mnlp_ner | Named entity recognition |
| POS | mnlp_pos | Part-of-speech tagging |
| Sentiment | mnlp_sentiment | Sentiment as doc._.sentiment |
| Language | mnlp_language | Language detection as doc._.language |
Custom Pipeline Assembly¶
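import spacy
import malaysian_manglish_nlp as mnlp  # assumption: importing the package registers the mnlp_* components

nlp = spacy.blank("ms")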
nlp.add_pipe("mnlp_tokenizer")
nlp.add_pipe("mnlp_ner")
nlp.add_pipe("mnlp_sentiment")
doc = nlp("Weh best gila movie tu")
doc._.sentiment
# {'label': 'positive', 'score': 0.93}
Mixing Components
Combine malaysian-manglish-nlp components with standard spaCy components. For example, use mnlp_tokenizer + mnlp_ner with spaCy's built-in sentencizer for sentence boundary detection.
REST API¶
Deploy malaysian-manglish-nlp as a FastAPI server with automatic Swagger documentation, CORS, rate limiting, and batch support.
Starting the Server¶
Endpoint Reference¶
| Method | Endpoint | Description |
|---|---|---|
| POST | /sentiment | Sentiment analysis |
| POST | /ner | Named entity recognition |
| POST | /normalize | Text normalisation |
| POST | /clean | Text cleaning |
| POST | /translate | Translation |
| POST | /embeddings | Sentence embeddings |
| POST | /pipeline | Custom pipeline execution |
| GET | /health | Health check |
| GET | /docs | Swagger UI (auto-generated) |
Request Examples¶
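A request to one of the POST endpoints above could be built with the Python standard library along these lines. This is a sketch under two assumptions that the reference does not confirm: the server listens on `localhost:8000`, and each endpoint accepts a JSON body with a `text` field. The Swagger UI at /docs is the place to check the actual request schema.

```python
import json
import urllib.request

# Assumptions: host, port, and the "text" field name are illustrative only.
BASE_URL = "http://localhost:8000"


def build_request(endpoint: str, text: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for one of the endpoints above."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/{endpoint.lstrip('/')}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_request("/sentiment", "Best gila makanan sini!")
print(request.get_method(), request.full_url)
# POST http://localhost:8000/sentiment

# To send it against a running server:
# with urllib.request.urlopen(request, timeout=10) as response:
#     print(json.load(response))
```

Building the request separately from sending it keeps the example runnable without a server and makes the payload easy to inspect.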
Server Configuration¶
| Parameter | Type | Default | Description |
|---|---|---|---|
| modules | list[str] | all | Modules to expose |
| cors | bool | True | Enable CORS headers |
| rate_limit | str | None | Rate limit (e.g. "100/minute") |
| auth | str | None | Auth mode: "api-key", "bearer" |
| cache | bool | False | Enable response caching |
| batch_max | int | 50 | Maximum batch size |
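Two of these settings affect how a client should behave: rate_limit caps requests per time window, and batch_max caps the number of texts per batch request. The sketch below shows how a client might interpret a "100/minute" style string and split a large input into batches that fit the default limit of 50. Both helpers are hypothetical and client-side; they do not describe how the server itself enforces these settings, and the set of accepted period names is an assumption.

```python
from typing import Iterator, Sequence, Tuple

# Seconds per period name; which period names the server accepts is an assumption.
_PERIOD_SECONDS = {"second": 1, "minute": 60, "hour": 3600, "day": 86400}


def parse_rate_limit(spec: str) -> Tuple[int, int]:
    """Turn a string like '100/minute' into (max_requests, window_in_seconds)."""
    count, _, period = spec.partition("/")
    count, period = count.strip(), period.strip()
    if not count.isdigit() or period not in _PERIOD_SECONDS:
        raise ValueError(f"unsupported rate limit: {spec!r}")
    return int(count), _PERIOD_SECONDS[period]


def split_batches(texts: Sequence[str], batch_max: int = 50) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most batch_max texts (50 is the table's default)."""
    if batch_max < 1:
        raise ValueError("batch_max must be at least 1")
    for start in range(0, len(texts), batch_max):
        yield texts[start:start + batch_max]


print(parse_rate_limit("100/minute"))  # (100, 60)
print([len(batch) for batch in split_batches(["teks"] * 120)])  # [50, 50, 20]
```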
Production Deployment
Use gunicorn with uvicorn workers for production.
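One way to set this up is a gunicorn configuration file, which is itself a Python module. The sketch below uses standard gunicorn settings (`bind`, `worker_class`, `workers`, `timeout`); the port and timeout values are assumptions to adjust for your own host, not documented defaults of malaysian-manglish-nlp.

```python
# gunicorn.conf.py: a sketch; tune the values for your own host.
import multiprocessing

# Address and port to listen on (8000 is an assumption, not a documented default).
bind = "0.0.0.0:8000"

# Run each worker as a uvicorn ASGI worker so gunicorn can serve a FastAPI app.
worker_class = "uvicorn.workers.UvicornWorker"

# Common starting point from the gunicorn docs: (2 x cores) + 1 workers.
workers = multiprocessing.cpu_count() * 2 + 1

# Restart workers that stay silent for longer than this many seconds.
timeout = 60
```

Start the server with `gunicorn -c gunicorn.conf.py your_module:app`, where `your_module:app` is a placeholder for the module and variable that expose the FastAPI application. Each worker is a separate process and is likely to load its own copy of the models, so memory use grows with the worker count.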
CLI¶
Command-line interface for every module. Process files, pipe from stdin, chain operations, and manage configuration - all without writing Python.
Core Commands¶
mnlp sentiment "Best gila makanan sini!"
# {"label": "positive", "score": 0.94}
mnlp ner "Siti beli iPhone kat Low Yat Plaza"
# [{"text": "Siti", "label": "PERSON"}, ...]
mnlp normalize "xpe la bro, aku nk g mkn"
# "takpe la bro, aku nak pergi makan"
mnlp clean "Weh @ahmad check ni https://t.co/abc 🔥🔥🔥"
# "Weh check ni"
mnlp translate "Aku nak pergi makan" --target en
# "I want to go eat"
mnlp tokenize "Tak boleh la macam tu"
# ["Tak", "boleh", "la", "macam", "tu"]
mnlp pos "Aku nak pergi makan kat kedai tu"
# [["Aku", "PRON"], ["nak", "AUX"], ...]
mnlp keywords "Kerajaan umum pakej rangsangan RM50 bilion"
# ["pakej rangsangan", "RM50 bilion", "kerajaan"]
File Processing¶
# Process file → JSON output
mnlp sentiment --input data.txt --output results.json
# Process directory
mnlp batch sentiment ./input/ --output ./output/ --format jsonl
# Stream from stdin
cat tweets.txt | mnlp sentiment --format csv
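Once results are written as JSON Lines, they can be post-processed with a few lines of Python. This sketch counts sentiment labels; it assumes each line has the same shape as the single-text output shown under Core Commands (`{"label": ..., "score": ...}`), which the reference does not state explicitly. The sample records are stand-in data, not real model output.

```python
import json
from collections import Counter
from typing import Iterable


def summarise_labels(lines: Iterable[str]) -> Counter:
    """Count sentiment labels in JSONL output, skipping blank lines."""
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        counts[record["label"]] += 1
    return counts


# Stand-in for the contents of a results file (assumed record shape).
sample = [
    '{"label": "positive", "score": 0.94}',
    '{"label": "negative", "score": 0.81}',
    '',
    '{"label": "positive", "score": 0.77}',
]
print(summarise_labels(sample).most_common())
# [('positive', 2), ('negative', 1)]
```

Because the function accepts any iterable of lines, an open file handle can be passed directly instead of the sample list.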
Pipeline via CLI¶
# Chain operations
mnlp pipe "clean | normalize | sentiment" --input data.txt
# From config file
mnlp pipe --config pipeline.json --input data.txt
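The chain syntax reads left to right: the output of each stage becomes the input of the next. The sketch below illustrates that idea only. It is not how the mnlp CLI is implemented, and the three stages are trivial stand-ins defined here for illustration rather than the real clean, normalize, and sentiment modules.

```python
from typing import Callable, Dict, List

# Stand-in stages for illustration only; the real modules do far more.
STAGES: Dict[str, Callable[[str], str]] = {
    "strip": str.strip,
    "lower": str.lower,
    "collapse": lambda text: " ".join(text.split()),
}


def parse_spec(spec: str) -> List[str]:
    """Split 'a | b | c' into stage names, rejecting empty or unknown stages."""
    names = [part.strip() for part in spec.split("|")]
    unknown = [name for name in names if name not in STAGES]
    if unknown:
        raise ValueError(f"unknown stage(s): {unknown}")
    return names


def run_pipeline(spec: str, text: str) -> str:
    """Feed the text through each stage in order, left to right."""
    for name in parse_spec(spec):
        text = STAGES[name](text)
    return text


print(run_pipeline("strip | collapse | lower", "  Weh   BEST gila  "))
# weh best gila
```

Validating the whole spec before running any stage means a typo in the last stage name fails fast instead of after the earlier stages have already run.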
Output Formats¶
mnlp sentiment "text" --format json # default
mnlp sentiment "text" --format csv
mnlp sentiment "text" --format table
Configuration¶
mnlp config set model accurate
mnlp config set output_format json
mnlp config set cache true
mnlp config show
Verbose Mode
Add -v to any command to print debug output, including timing and model information.
langchain¶
Use malaysian-manglish-nlp modules as LangChain tools in AI agent workflows. Each module is wrapped as a standard LangChain BaseTool with a description and an input schema.
Requires: pip install malaysian-manglish-nlp[langchain]
from langchain.agents import initialize_agent, AgentType
# Newer LangChain releases provide this class as: from langchain_openai import ChatOpenAI
from langchain.chat_models import ChatOpenAI
from malaysian_manglish_nlp.langchain_tool import (
    SentimentTool,
    NerTool,
    TranslateTool,
    NormalizeTool,
)

llm = ChatOpenAI(model="gpt-4", temperature=0)
tools = [
    SentimentTool(),
    NerTool(),
    TranslateTool(),
    NormalizeTool(),
]
agent = initialize_agent(
    tools,
    llm,
    agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
    verbose=True,
)
# Agent can now use malaysian-manglish-nlp tools
agent.run("Analyse the sentiment of 'Best gila makanan sini!' and translate it to English")
# > Using tool: sentiment_analysis
# > Input: "Best gila makanan sini!"
# > Result: positive (0.94)
# > Using tool: translate
# > Input: "Best gila makanan sini!" → en
# > Result: "The food here is incredibly good!"
# >
# > The text has a positive sentiment (94% confidence) and translates to
# > "The food here is incredibly good!" in English.
Available Tools¶
| Tool Class | Module | Description |
|---|---|---|
| SentimentTool | sentiment | Analyse text sentiment |
| NerTool | ner | Extract named entities |
| TranslateTool | translate | Translate between languages |
| NormalizeTool | normalize | Normalise informal text |
| EmotionTool | emotion | Detect emotions |
| KeywordsTool | keywords | Extract keywords |
| SummarizeTool | summarize | Summarise text |
| QATool | qa | Answer questions from context |
Custom Tool Configuration¶
from malaysian_manglish_nlp.langchain_tool import SentimentTool
# With custom settings
tool = SentimentTool(
    name="malaysian_sentiment",
    description="Analyse sentiment of Malaysian/Manglish text",
    detailed=True,  # return all class scores
)
Building a Moderation Agent
Tool Descriptions
Each tool includes a detailed description that LangChain agents use for routing. The descriptions specify that input should be Malaysian or Manglish text, helping the agent choose the right tool for multilingual inputs.
See Also¶
- Tools - pipeline, caching, and profiling for production deployments
- REST API + Cache - cache API responses for repeated queries
- Benchmarks - API throughput and latency numbers