Changelog
All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
[3.3.0] - 2026-06-01
Added
- Aspect-Based Sentiment module: per-aspect sentiment with 4 domains (restaurant, product, app, general), dynamic aspect extraction, conflict detection
- Multi-Label Emotion module: detect multiple emotions simultaneously with confidence scores, 10 co-occurrence patterns (bittersweet, anxious, etc.)
- Feedback Loop system: user correction storage, active learning uncertainty sampling, error pattern detection, JSONL training data export
- WebSocket Streaming API: real-time analysis via ws://host:8000/ws/analyze with per-module streaming, ping/pong keepalive, rate limiting
- Async Batch API: /batch/async with job tracking, /batch/status/{id} progress, cancellation support, max 100 texts
- New REST endpoints: /aspect-sentiment, /multi-emotion, /feedback, /feedback/stats, /active-learning/uncertain
- Docker image updated (Python 3.12 slim, feedback volume)
- Chrome extension packaged for Web Store publish
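Because the batch endpoints accept at most 100 texts per request, a client with a larger corpus has to split its input first. The following is a minimal client-side sketch of that splitting step only; the helper name `chunk_texts` and the constant `MAX_BATCH_TEXTS` are illustrative and not part of the package, and the request payload format is deliberately left out because this changelog does not specify it.

```python
from typing import Iterator, List, Sequence

MAX_BATCH_TEXTS = 100  # per-request limit stated in this release


def chunk_texts(texts: Sequence[str],
                max_size: int = MAX_BATCH_TEXTS) -> Iterator[List[str]]:
    """Yield consecutive slices of `texts`, each holding at most `max_size` items."""
    if max_size < 1:
        raise ValueError("max_size must be at least 1")
    for start in range(0, len(texts), max_size):
        yield list(texts[start:start + max_size])


if __name__ == "__main__":
    posts = [f"post {i}" for i in range(250)]
    # Each yielded batch could then be submitted as one batch request.
    print([len(batch) for batch in chunk_texts(posts)])  # [100, 100, 50]
```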
Changed
- Model retrained on 28,263 examples (from 14,384), 34,548 total merged
- Sentiment accuracy: 98.0% (from 95.0%, +3.0%)
- Emotion detection: 96.5% (from 90.3%, +6.2%)
- Intent classification: 99.3% (from 97.5%, +1.8%)
- Average accuracy: 97.9% (from 94.3%, +3.6%)
- REST API expanded from ~300 to ~1050 lines
- Batch endpoint max increased from 50 to 100 texts
- Pydantic v2 compatibility (ConfigDict migration)
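The reported averages are consistent with an unweighted mean of the three per-task accuracies, rounded to one decimal place. The short check below reproduces the 3.3.0 figures from the numbers listed above; treating the average as an unweighted mean is an assumption inferred from the arithmetic, not something the changelog states.

```python
# Per-task accuracies (percent) as reported in the 3.2.0 and 3.3.0 entries.
PREVIOUS = {"sentiment": 95.0, "emotion": 90.3, "intent": 97.5}
CURRENT = {"sentiment": 98.0, "emotion": 96.5, "intent": 99.3}


def mean_accuracy(scores):
    """Unweighted mean across tasks, rounded to one decimal place."""
    return round(sum(scores.values()) / len(scores), 1)


def deltas(old, new):
    """Change per task in percentage points, rounded to one decimal place."""
    return {task: round(new[task] - old[task], 1) for task in new}


if __name__ == "__main__":
    print(mean_accuracy(PREVIOUS))      # 94.3
    print(mean_accuracy(CURRENT))       # 97.9
    print(deltas(PREVIOUS, CURRENT))    # {'sentiment': 3.0, 'emotion': 6.2, 'intent': 1.8}
```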
Fixed
- Multi-task training KeyError with partial-label datasets (filtered 4,801 samples)
- teruk removed from intensifier list (primarily negative, not intensifier)
- Contrast-marker-aware window scoring in aspect sentiment (prevents bleed across tapi/but)
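To illustrate the contrast-marker fix, here is a hypothetical sketch of how a scoring window around an aspect term can be clipped so that it never crosses a contrast marker. It is not the module's actual code: the function name `aspect_window`, the `radius` parameter, and the two-word marker set are assumptions made for the example (the changelog names only tapi and but).

```python
from typing import List, Sequence, Set

# Hypothetical marker list; the changelog names only "tapi" and "but".
CONTRAST_MARKERS: Set[str] = {"tapi", "but"}


def aspect_window(tokens: Sequence[str], aspect_index: int, radius: int = 4,
                  markers: Set[str] = CONTRAST_MARKERS) -> List[str]:
    """Return tokens within `radius` of the aspect, stopping at contrast markers."""
    # Walk left until the radius is used up or the next token is a marker.
    left = aspect_index
    while (left > 0 and aspect_index - left < radius
           and tokens[left - 1].lower() not in markers):
        left -= 1
    # Walk right under the same two conditions.
    right = aspect_index
    while (right < len(tokens) - 1 and right - aspect_index < radius
           and tokens[right + 1].lower() not in markers):
        right += 1
    return list(tokens[left:right + 1])


if __name__ == "__main__":
    tokens = "food sedap tapi service slow".split()
    print(aspect_window(tokens, 0))  # ['food', 'sedap']
    print(aspect_window(tokens, 3))  # ['service', 'slow']
```

With the window clipped this way, the positive word next to "food" cannot leak into the score for "service", and vice versa.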
[3.2.0] - 2026-05-31
Added
- XLM-RoBERTa base model (replacing distilbert-multilingual)
- Focal loss for class imbalance handling
- Uncertainty-weighted multi-task loss (Kendall et al. 2018)
- Cosine annealing with warm restarts
- Mixed precision training (FP16)
- Gradient accumulation (effective batch size 32)
- Early stopping with patience
- Learning rate finder (optional)
- Ensemble module with confidence-based fallback (predictions below 60% confidence use the rule-based result)
- Task-specific attention embeddings
- Augmented dataset: 14,384 examples (from 7,884)
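The confidence-based fallback in the ensemble entry can be pictured roughly as follows. This is a sketch under assumptions, not the ensemble module's real interface: the function name `ensemble_predict` and its arguments are hypothetical, and only the 60% cut-off comes from the changelog.

```python
from typing import Tuple

CONFIDENCE_THRESHOLD = 0.60  # the "< 60%" cut-off from the changelog


def ensemble_predict(model_label: str, model_confidence: float, rule_label: str,
                     threshold: float = CONFIDENCE_THRESHOLD) -> Tuple[str, str]:
    """Return (label, source), preferring the model unless its confidence is low."""
    if model_confidence < threshold:
        # The model is unsure, so the rule-based label wins.
        return rule_label, "rule_based"
    return model_label, "model"


if __name__ == "__main__":
    print(ensemble_predict("positive", 0.91, "neutral"))  # ('positive', 'model')
    print(ensemble_predict("positive", 0.42, "neutral"))  # ('neutral', 'rule_based')
```

Note that a confidence of exactly 0.60 keeps the model's label, since the changelog describes the fallback as applying strictly below 60%.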
Changed
- Sentiment accuracy: 95.0% (from 88.5%)
- Emotion detection: 90.3% (from 83.6%)
- Intent classification: 97.5% (from 94.5%)
- Average accuracy: 94.3% (from 88.9%)
- Model size: 1.1 GB (XLM-RoBERTa base)
- Raw text training (preserves Manglish slang patterns)
- Better handling of minority emotion classes (love, disgust, surprise)
Fixed
- WeightedRandomSampler index mismatch with Subset datasets
- Memory issues during training (reduced max_length to 96)
- FutureWarning for deprecated torch.cuda.amp APIs
[3.1.0] - 2026-05-30
Added
- Retrained multi-task model on 7,884 examples (up from 561)
- Auto-download model from Hugging Face on first use
- Jawi (Rumi↔Jawi) transliteration module
- Parallel processing pipeline
- Memory optimization with lazy module loading
Changed
- Sentiment accuracy: 88.5% (from 69% with 561 examples)
- Emotion detection: 83.6% (8 classes; the multi-task model covers 3 sentiment, 8 emotion, and 6 intent classes)
- Intent classification: 94.5%
- Average validation accuracy: 88.9% (7,884 training examples, 1,577 validation)
- Chrome extension and VS Code extension included
Fixed
- Model path resolution for fine-tuned weights
- Package name consistency across all configs and docs
[3.0.0] - 2026-05-29
Added
- 51 total modules (14 new since v2.0.0)
- Trained models for sentiment, emotion, sarcasm, and toxicity detection
- Benchmark dashboard with automated performance tracking
- CLI interface (manglish command)
- Pipeline composition with lazy loading
- Batch processing with progress reporting
- Export module (CoNLL, JSON, CSV formats)
- Coreference resolution module
- Relation extraction module
- Question answering module
- Text generation module
- Emoji sentiment mapping
- Near-duplicate detection
Changed
- Performance tuning: 23,000+ texts/sec throughput
- Core import time reduced to under 0.5 s
- Real-world validation across 10,000+ Malaysian social media posts
- Improved NER with Malaysian entity types
- Better code-switching detection accuracy
Fixed
- Stemmer handling of reduplicated words
- Tokenizer edge cases with mixed script text
- Sentiment model calibration for neutral class
[2.0.0] - 2026-04-15
Added
- 37 total modules (11 new since v1.0.0)
- 381-case benchmark suite with 100% pass rate
- Pipeline mode for chaining operations
- Code-switching detection module
- Dependency parsing
- Phrase chunking
- Text augmentation (augment, backtranslate)
- Spell checker with Malaysian dictionary
- Collocation detection
- Word frequency lists
- Result caching layer
Changed
- Rewritten tokenizer for better Manglish handling
- Improved normalization coverage (2,000+ slang terms)
- Faster stemmer implementation
[1.0.0] - 2026-03-01
Added
- Initial release with 26 core modules
- Text normalization for Manglish
- Tokenization and sentence segmentation
- Malay stemmer and lemmatizer
- Sentiment analysis (rule-based + ML)
- Named Entity Recognition
- POS tagging
- Language detection (BM/EN/Manglish)
- Text similarity
- Keyword extraction
- Stopword lists
- Basic CLI
- Zero-dependency core design