Integrations¶

Use malaysian-manglish-nlp from spaCy, FastAPI, the command line, and LangChain.

Overview¶

Integration modules let you use malaysian-manglish-nlp inside existing frameworks: as spaCy pipeline components, behind a REST API with auto-generated docs, from the command line, or as LangChain tools for AI agents.

pip install "malaysian-manglish-nlp[all]"  # install all integration extras (quotes keep shells like zsh from expanding the brackets)

Quick Start¶

Python

import malaysian_manglish_nlp as mnlp

# spaCy pipeline
nlp = mnlp.spacy_integration.load()
doc = nlp("Ahmad beli nasi lemak kat Pavilion KL semalam")
[(ent.text, ent.label_) for ent in doc.ents]
# [('Ahmad', 'PERSON'), ('Pavilion KL', 'LOCATION')]

# LangChain tool
from malaysian_manglish_nlp.langchain_tool import SentimentTool
tool = SentimentTool()
tool.run("Best gila makanan sini!")
# "positive (0.94)"

CLI

# Direct module access
mnlp sentiment "Best gila makanan sini!"
# {"label": "positive", "score": 0.94}

# Start REST API
mnlp serve --port 8000

Module Details¶

`spacy_integration`¶

Use malaysian-manglish-nlp modules as native spaCy pipeline components. Access entities, POS tags, and sentiment through the standard spaCy Doc API.

Requires: pip install "malaysian-manglish-nlp[spacy]"

import spacy
import malaysian_manglish_nlp as mnlp

# Load full pipeline
nlp = mnlp.spacy_integration.load()

doc = nlp("Ahmad beli nasi lemak kat Pavilion KL semalam")

# Tokens + POS
for token in doc:
    print(f"{token.text:12} {token.pos_:6} {token.dep_}")
# Ahmad        PROPN  nsubj
# beli         VERB   ROOT
# nasi         NOUN   obj
# lemak        ADJ    amod
# kat          ADP    case
# Pavilion     PROPN  compound
# KL           PROPN  obl
# semalam      NOUN   obl:tmod

# Entities
for ent in doc.ents:
    print(f"{ent.text:16} {ent.label_}")
# Ahmad            PERSON
# Pavilion KL      LOCATION

# Sentiment (custom attribute)
doc._.sentiment
# {'label': 'neutral', 'score': 0.65}

Available Components¶

Component	Pipe Name	Description
Tokenizer	`mnlp_tokenizer`	Malaysian-aware tokenisation
NER	`mnlp_ner`	Named entity recognition
POS	`mnlp_pos`	Part-of-speech tagging
Sentiment	`mnlp_sentiment`	Sentiment as `doc._.sentiment`
Language	`mnlp_language`	Language detection as `doc._.language`

Custom Pipeline Assembly¶

nlp = spacy.blank("ms")
nlp.add_pipe("mnlp_tokenizer")
nlp.add_pipe("mnlp_ner")
nlp.add_pipe("mnlp_sentiment")

doc = nlp("Weh best gila movie tu")
doc._.sentiment
# {'label': 'positive', 'score': 0.93}
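
Because `doc._.sentiment` is shown as a plain dict with `label` and `score` keys, downstream code can route documents on it without touching spaCy. The following is a minimal sketch that assumes exactly that dict shape; `route_by_sentiment` and the 0.8 threshold are illustrative names and values, not part of the library.

```python
def route_by_sentiment(sentiment: dict, threshold: float = 0.8) -> str:
    """Return the predicted label, or 'uncertain' when the score is below threshold."""
    if sentiment["score"] < threshold:
        return "uncertain"
    return sentiment["label"]


# Dicts shaped like the doc._.sentiment outputs shown in this section
print(route_by_sentiment({"label": "positive", "score": 0.93}))  # positive
print(route_by_sentiment({"label": "neutral", "score": 0.65}))   # uncertain
```

A threshold like this keeps low-confidence predictions out of automated decisions; the right value depends on your data.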

Mixing Components

Combine malaysian-manglish-nlp components with standard spaCy components. For example, use `mnlp_tokenizer` and `mnlp_ner` with spaCy's built-in `sentencizer` for sentence boundary detection.

REST API¶

Deploy malaysian-manglish-nlp as a FastAPI server with automatic Swagger documentation, CORS, rate limiting, and batch support.

Starting the Server¶

CLI

# Start with defaults (port 8000, all modules)
mnlp serve

# Specific modules + port
mnlp serve --modules sentiment,ner,normalize --port 9000

# Production with workers
uvicorn malaysian_manglish_nlp.api:app --workers 4 --port 8000

Python

from malaysian_manglish_nlp.api import create_app

app = create_app(
    modules=["sentiment", "ner", "normalize", "translate"],
    cors=True,
    rate_limit="100/minute",
    auth="api-key",
    cache=True,
    batch_max=100
)
# Save this file as app.py, then run: uvicorn app:app --port 8000

Endpoint Reference¶

Method	Endpoint	Description
`POST`	`/sentiment`	Sentiment analysis
`POST`	`/ner`	Named entity recognition
`POST`	`/normalize`	Text normalisation
`POST`	`/clean`	Text cleaning
`POST`	`/translate`	Translation
`POST`	`/embeddings`	Sentence embeddings
`POST`	`/pipeline`	Custom pipeline execution
`GET`	`/health`	Health check
`GET`	`/docs`	Swagger UI (auto-generated)

Request Examples¶

Single Text

curl -X POST http://localhost:8000/sentiment \
  -H "Content-Type: application/json" \
  -d '{"text": "Best gila makanan sini!"}'

# Response:
# {"label": "positive", "score": 0.94}

Batch

curl -X POST http://localhost:8000/sentiment \
  -H "Content-Type: application/json" \
  -d '{"texts": ["Best!", "Teruk la", "Ok je"]}'

# Response:
# [{"label": "positive", "score": 0.92},
#  {"label": "negative", "score": 0.87},
#  {"label": "neutral", "score": 0.78}]

NER

curl -X POST http://localhost:8000/ner \
  -H "Content-Type: application/json" \
  -d '{"text": "Siti beli iPhone kat Low Yat Plaza"}'

# Response:
# [{"text": "Siti", "label": "PERSON"},
#  {"text": "iPhone", "label": "PRODUCT"},
#  {"text": "Low Yat Plaza", "label": "LOCATION"}]
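
The same requests can be issued from Python with only the standard library. This is a sketch that mirrors the curl calls above; `build_request` and `post_json` are hypothetical helper names, and the sketch assumes no authentication is enabled on the server.

```python
import json
import urllib.request


def build_request(base_url: str, endpoint: str, payload: dict) -> urllib.request.Request:
    """Build a JSON POST request for one of the documented endpoints."""
    url = f"{base_url.rstrip('/')}/{endpoint.lstrip('/')}"
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_json(base_url: str, endpoint: str, payload: dict, timeout: float = 10.0):
    """Send the request and decode the JSON response (needs a running server)."""
    request = build_request(base_url, endpoint, payload)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Usage, with `mnlp serve` listening on localhost:8000:
#   post_json("http://localhost:8000", "sentiment", {"text": "Best gila makanan sini!"})
#   post_json("http://localhost:8000", "sentiment", {"texts": ["Best!", "Teruk la", "Ok je"]})
```

Keeping request construction separate from sending makes the payload easy to inspect or unit-test without a live server.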

Server Configuration¶

Parameter	Type	Default	Description
`modules`	`list[str]`	all	Modules to expose
`cors`	`bool`	`True`	Enable CORS headers
`rate_limit`	`str`	`None`	Rate limit (e.g. `"100/minute"`)
`auth`	`str`	`None`	Auth mode: `"api-key"`, `"bearer"`
`cache`	`bool`	`False`	Enable response caching
`batch_max`	`int`	`50`	Maximum batch size
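
How the server responds to a batch larger than `batch_max` is not described here. Assuming oversized requests are rejected, a client can split its input into chunks that respect the limit. A minimal sketch follows; `chunked` is a hypothetical helper, and 50 is the default from the table above.

```python
from typing import Iterator, List


def chunked(texts: List[str], batch_max: int = 50) -> Iterator[List[str]]:
    """Yield consecutive slices of `texts`, each holding at most `batch_max` items."""
    if batch_max < 1:
        raise ValueError("batch_max must be at least 1")
    for start in range(0, len(texts), batch_max):
        yield texts[start:start + batch_max]


texts = [f"text {i}" for i in range(120)]
# One request body per chunk, in the {"texts": [...]} shape of the batch example
payloads = [{"texts": batch} for batch in chunked(texts)]
print([len(p["texts"]) for p in payloads])  # [50, 50, 20]
```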

Production Deployment

Use gunicorn with uvicorn workers for production:

gunicorn malaysian_manglish_nlp.api:app -w 4 -k uvicorn.workers.UvicornWorker --bind 0.0.0.0:8000
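
Deployment scripts often need to wait until the workers are ready before sending traffic. The sketch below polls `GET /health` and treats HTTP 200 as healthy; the response body of `/health` is not documented here, so it is ignored. `http_probe` and `wait_until_healthy` are hypothetical helper names.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_probe(url: str = "http://localhost:8000/health", timeout: float = 2.0) -> bool:
    """Return True if GET /health answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx status all count as "not ready"
        return False


def wait_until_healthy(probe: Callable[[], bool] = http_probe,
                       attempts: int = 10, delay: float = 1.0) -> bool:
    """Call `probe` up to `attempts` times, sleeping `delay` seconds after each failure."""
    for attempt in range(attempts):
        if probe():
            return True
        if attempt + 1 != attempts:
            time.sleep(delay)
    return False
```

Passing the probe as a parameter lets you swap in a different check, or a fake one in tests, without changing the retry loop.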

CLI¶

Command-line interface for every module. Process files, read from stdin, chain operations, and manage configuration without writing Python.

Core Commands¶

mnlp sentiment "Best gila makanan sini!"
# {"label": "positive", "score": 0.94}

mnlp ner "Siti beli iPhone kat Low Yat Plaza"
# [{"text": "Siti", "label": "PERSON"}, ...]

mnlp normalize "xpe la bro, aku nk g mkn"
# "takpe la bro, aku nak pergi makan"

mnlp clean "Weh @ahmad check ni https://t.co/abc 🔥🔥🔥"
# "Weh check ni"

mnlp translate "Aku nak pergi makan" --target en
# "I want to go eat"

mnlp tokenize "Tak boleh la macam tu"
# ["Tak", "boleh", "la", "macam", "tu"]

mnlp pos "Aku nak pergi makan kat kedai tu"
# [["Aku", "PRON"], ["nak", "AUX"], ...]

mnlp keywords "Kerajaan umum pakej rangsangan RM50 bilion"
# ["pakej rangsangan", "RM50 bilion", "kerajaan"]

File Processing¶

# Process file → JSON output
mnlp sentiment --input data.txt --output results.json

# Process directory
mnlp batch sentiment ./input/ --output ./output/ --format jsonl

# Stream from stdin
cat tweets.txt | mnlp sentiment --format csv

Pipeline via CLI¶

# Chain operations
mnlp pipe "clean | normalize | sentiment" --input data.txt

# From config file
mnlp pipe --config pipeline.json --input data.txt
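
Conceptually, a spec such as `"clean | normalize | sentiment"` names a sequence of steps, each receiving the previous step's output. The sketch below illustrates that idea only; it is not the library's implementation, and `parse_pipeline`, `run_pipeline`, and the stand-in steps are hypothetical.

```python
from typing import Callable, Dict, List

Step = Callable[[str], str]


def parse_pipeline(spec: str) -> List[str]:
    """Split a spec such as "clean | normalize | sentiment" into step names."""
    names = [name.strip() for name in spec.split("|")]
    if not all(names):
        raise ValueError(f"empty step in pipeline spec: {spec!r}")
    return names


def run_pipeline(spec: str, registry: Dict[str, Step], text: str) -> str:
    """Feed `text` through each named step in order."""
    for name in parse_pipeline(spec):
        text = registry[name](text)
    return text


# Stand-in steps for illustration only; the real modules do far more.
registry = {
    "strip": str.strip,
    "lower": str.lower,
}
print(parse_pipeline("clean | normalize | sentiment"))  # ['clean', 'normalize', 'sentiment']
print(run_pipeline("strip | lower", registry, "  Best GILA  "))  # best gila
```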

Output Formats¶

mnlp sentiment "text" --format json    # default
mnlp sentiment "text" --format csv
mnlp sentiment "text" --format table
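
JSON output makes the CLI straightforward to call from scripts. The sketch below wraps it with `subprocess`; `build_command` and `run_mnlp` are hypothetical helpers, and it assumes `mnlp` is on `PATH` and that every module accepts `--format json` the way `sentiment` does above.

```python
import json
import subprocess
from typing import Any, List


def build_command(module: str, text: str, output_format: str = "json") -> List[str]:
    """Assemble the argument list for a call like: mnlp sentiment TEXT --format json."""
    return ["mnlp", module, text, "--format", output_format]


def run_mnlp(module: str, text: str) -> Any:
    """Run the CLI and decode its JSON output; raises CalledProcessError on failure."""
    completed = subprocess.run(
        build_command(module, text),
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(completed.stdout)


print(build_command("ner", "Siti beli iPhone kat Low Yat Plaza"))
# ['mnlp', 'ner', 'Siti beli iPhone kat Low Yat Plaza', '--format', 'json']
```

Passing the arguments as a list avoids shell quoting problems with text that contains spaces, emoji, or quotes.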

Configuration¶

mnlp config set model accurate
mnlp config set output_format json
mnlp config set cache true
mnlp config show

Verbose Mode

Add `-v` to any command for debug output, including timing and model info:

mnlp sentiment "text" -v
# [model: default | time: 2.3ms]
# {"label": "positive", "score": 0.94}

`langchain_tool`¶

Use malaysian-manglish-nlp modules as LangChain tools in AI agent workflows. Each module is wrapped as a standard LangChain `BaseTool` with a description and an input schema.

Requires: pip install "malaysian-manglish-nlp[langchain]"

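# Uses LangChain's legacy agent API (initialize_agent); in recent LangChain
# releases ChatOpenAI is imported from the langchain_openai package instead.
from langchain.agents import initialize_agent, AgentType
from langchain.chat_models import ChatOpenAI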
from malaysian_manglish_nlp.langchain_tool import (
    SentimentTool,
    NerTool,
    TranslateTool,
    NormalizeTool
)

llm = ChatOpenAI(model="gpt-4", temperature=0)

tools = [
    SentimentTool(),
    NerTool(),
    TranslateTool(),
    NormalizeTool()
]

agent = initialize_agent(
    tools, llm,
    agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
    verbose=True
)

# Agent can now use malaysian-manglish-nlp tools
agent.run("Analyse the sentiment of 'Best gila makanan sini!' and translate it to English")
# > Using tool: sentiment_analysis
# > Input: "Best gila makanan sini!"
# > Result: positive (0.94)
# > Using tool: translate
# > Input: "Best gila makanan sini!" → en
# > Result: "The food here is incredibly good!"
# >
# > The text has a positive sentiment (94% confidence) and translates to
# > "The food here is incredibly good!" in English.
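
When a tool is called directly rather than through an agent, its string result usually needs to be turned back into structured data. The sketch below parses the `"positive (0.94)"` form shown in this document; `parse_sentiment` is a hypothetical helper, and other output shapes (for example with `detailed=True`) are not covered.

```python
import re
from typing import Tuple

# label, optional whitespace, then a number in parentheses
_RESULT = re.compile(r"^\s*(\w+)\s*\((\d*\.?\d+)\)\s*$")


def parse_sentiment(result: str) -> Tuple[str, float]:
    """Split a string like "positive (0.94)" into (label, score)."""
    match = _RESULT.match(result)
    if match is None:
        raise ValueError(f"unexpected tool output: {result!r}")
    return match.group(1), float(match.group(2))


print(parse_sentiment("positive (0.94)"))  # ('positive', 0.94)
```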

Available Tools¶

Tool Class	Module	Description
`SentimentTool`	`sentiment`	Analyse text sentiment
`NerTool`	`ner`	Extract named entities
`TranslateTool`	`translate`	Translate between languages
`NormalizeTool`	`normalize`	Normalise informal text
`EmotionTool`	`emotion`	Detect emotions
`KeywordsTool`	`keywords`	Extract keywords
`SummarizeTool`	`summarize`	Summarise text
`QATool`	`qa`	Answer questions from context

Custom Tool Configuration¶

from malaysian_manglish_nlp.langchain_tool import SentimentTool

# With custom settings
tool = SentimentTool(
    name="malaysian_sentiment",
    description="Analyse sentiment of Malaysian/Manglish text",
    detailed=True  # return all class scores
)

Building a Moderation Agent

from malaysian_manglish_nlp.langchain_tool import SentimentTool, HateSpeechTool, ProfanityTool

moderation_tools = [SentimentTool(), HateSpeechTool(), ProfanityTool()]
# Agent can triage content using all three signals
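
One way to combine the three signals is a small rule-based triage step that runs after the tools. The output formats of `HateSpeechTool` and `ProfanityTool` are not documented here, so the sketch below works on a hypothetical, already-parsed `Signals` record; the field names and thresholds are assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Signals:
    """Hypothetical, already-parsed outputs of the three tools."""
    sentiment_label: str      # e.g. "negative"
    sentiment_score: float    # confidence in sentiment_label
    hate_speech_score: float  # assumed to be a probability in [0, 1]
    has_profanity: bool


def triage(signals: Signals, hate_threshold: float = 0.8) -> str:
    """Map the three signals to 'block', 'review', or 'allow'."""
    if signals.hate_speech_score >= hate_threshold:
        return "block"
    strongly_negative = (
        signals.sentiment_label == "negative" and signals.sentiment_score >= 0.8
    )
    if signals.has_profanity or strongly_negative:
        return "review"
    return "allow"


print(triage(Signals("positive", 0.94, 0.02, False)))  # allow
print(triage(Signals("negative", 0.87, 0.10, True)))   # review
print(triage(Signals("negative", 0.91, 0.95, True)))   # block
```

Checking hate speech first means the most severe signal always wins, regardless of what the other two tools report.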

Tool Descriptions

Each tool includes a detailed description that LangChain agents use for routing. The descriptions specify that input should be Malaysian or Manglish text, helping the agent choose the right tool for multilingual inputs.