Changelog

All notable changes to this project will be documented in this file.

The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.

[3.3.0] - 2026-06-01

Added

  • Aspect-Based Sentiment module: per-aspect sentiment with 4 domains (restaurant, product, app, general), dynamic aspect extraction, conflict detection
  • Multi-Label Emotion module: detect multiple emotions simultaneously with confidence scores, 10 co-occurrence patterns (bittersweet, anxious, etc.)
  • Feedback Loop system: user correction storage, active learning uncertainty sampling, error pattern detection, JSONL training data export
  • WebSocket Streaming API: real-time analysis via ws://host:8000/ws/analyze with per-module streaming, ping/pong keepalive, rate limiting
  • Async Batch API: /batch/async with job tracking, /batch/status/{id} progress, cancellation support, max 100 texts
  • New REST endpoints: /aspect-sentiment, /multi-emotion, /feedback, /feedback/stats, /active-learning/uncertain
  • Docker image updated (Python 3.12 slim, feedback volume)
  • Chrome extension packaged for Web Store publish
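
The uncertainty sampling and JSONL export in the Feedback Loop entry can be illustrated with a minimal sketch. It assumes entropy as the uncertainty measure; the changelog does not say which measure the library uses, and the function names and the "text"/"probs" keys are hypothetical.

```python
import json
import math

def entropy(probs):
    """Shannon entropy (in nats) of a probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def most_uncertain(predictions, k=2):
    """Return the k predictions with the highest entropy, most uncertain first.

    `predictions` is a list of dicts with hypothetical keys "text" and "probs".
    """
    return sorted(predictions, key=lambda p: entropy(p["probs"]), reverse=True)[:k]

def to_jsonl(records):
    """Serialize records as JSON Lines: one JSON object per line."""
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in records)

# Made-up model outputs over three sentiment classes.
predictions = [
    {"text": "best gila", "probs": [0.90, 0.05, 0.05]},
    {"text": "ok lah", "probs": [0.40, 0.35, 0.25]},
    {"text": "not bad", "probs": [0.50, 0.45, 0.05]},
]
uncertain = most_uncertain(predictions, k=2)
print(to_jsonl(uncertain))
```

Here "ok lah" ranks first because its probabilities are closest to uniform, so it would be the first candidate to send for user correction.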

Changed

  • Model retrained on 28,263 examples (from 14,384), 34,548 total merged
  • Sentiment accuracy: 98.0% (from 95.0%, +3.0 percentage points)
  • Emotion detection: 96.5% (from 90.3%, +6.2 percentage points)
  • Intent classification: 99.3% (from 97.5%, +1.8 percentage points)
  • Average accuracy: 97.9% (from 94.3%, +3.6 percentage points)
  • REST API expanded from ~300 to ~1050 lines
  • Batch endpoint max increased from 50 to 100 texts
  • Pydantic v2 compatibility (ConfigDict migration)
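
With the batch limit now at 100 texts, a client holding a larger list has to split it before submitting. A minimal sketch of that splitting follows; `chunk_texts` is a hypothetical client-side helper, not part of the library.

```python
MAX_BATCH_SIZE = 100  # per-request limit stated in this release

def chunk_texts(texts, size=MAX_BATCH_SIZE):
    """Split `texts` into consecutive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [texts[i:i + size] for i in range(0, len(texts), size)]

batches = chunk_texts([f"text {i}" for i in range(250)])
print([len(b) for b in batches])  # [100, 100, 50]
```

Each resulting list can then be sent as one batch request.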

Fixed

  • Multi-task training KeyError with partial-label datasets (filtered 4,801 samples)
  • Removed teruk from the intensifier list (it is primarily a negative word, not an intensifier)
  • Contrast-marker-aware window scoring in aspect sentiment (prevents bleed across tapi/but)
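
The contrast-marker fix can be pictured as clipping each aspect's scoring window at the nearest tapi/but. The sketch below is an illustration under assumptions, not the module's actual code: the lexicon, the whitespace tokenizer, the window radius, and all function names are hypothetical.

```python
CONTRAST_MARKERS = {"tapi", "but"}

# Hypothetical toy lexicon: +1 positive, -1 negative.
LEXICON = {"sedap": 1, "good": 1, "slow": -1, "mahal": -1}

def aspect_window(tokens, aspect_index, radius=3):
    """Tokens within `radius` of the aspect, never crossing a contrast marker."""
    start = aspect_index
    while (start > 0
           and aspect_index - (start - 1) <= radius
           and tokens[start - 1] not in CONTRAST_MARKERS):
        start -= 1
    end = aspect_index
    while (end < len(tokens) - 1
           and (end + 1) - aspect_index <= radius
           and tokens[end + 1] not in CONTRAST_MARKERS):
        end += 1
    return tokens[start:end + 1]

def aspect_score(text, aspect, radius=3):
    """Sum lexicon scores inside the clipped window around `aspect`."""
    tokens = text.lower().split()
    index = tokens.index(aspect)
    return sum(LEXICON.get(t, 0) for t in aspect_window(tokens, index, radius))

text = "food sedap tapi service slow"
print(aspect_score(text, "food"), aspect_score(text, "service"))  # 1 -1
```

Without the clipping, a radius of 3 around "food" would also reach "slow", and the two opposing words would cancel out.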

[3.2.0] - 2026-05-31

Added

  • XLM-RoBERTa base model (replacing distilbert-multilingual)
  • Focal loss for class imbalance handling
  • Uncertainty-weighted multi-task loss (Kendall et al. 2018)
  • Cosine annealing with warm restarts
  • Mixed precision training (FP16)
  • Gradient accumulation (effective batch size 32)
  • Early stopping with patience
  • Learning rate finder (optional)
  • Ensemble module with confidence-based fallback (predictions below 60% confidence use the rule-based analyzer)
  • Task-specific attention embeddings
  • Augmented dataset: 14,384 examples (from 7,884)
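
The two loss-related entries above can be written out for a single example. This is a sketch only: the training code is not shown in this changelog, gamma = 2 is an assumed setting, and the multi-task formula is a commonly used simplified form of uncertainty weighting, with one learned log-variance per task.

```python
import math

def focal_loss(p_true, gamma=2.0):
    """Focal loss for one example, given the probability of its true class.

    The factor (1 - p_true) ** gamma shrinks the loss of well-classified
    examples, so rare or hard classes contribute relatively more.
    """
    return -((1.0 - p_true) ** gamma) * math.log(p_true)

def uncertainty_weighted_loss(task_losses, log_vars):
    """Combine per-task losses as sum(exp(-s_i) * L_i + s_i).

    s_i is a log-variance per task (a learnable parameter in training);
    larger s_i down-weights task i, and the + s_i term penalizes that.
    """
    return sum(math.exp(-s) * loss + s for loss, s in zip(task_losses, log_vars))

easy = focal_loss(0.9)   # confident, correct prediction
hard = focal_loss(0.1)   # badly misclassified example
print(easy < hard)

# With all log-variances at 0, the combination is a plain sum of task losses.
print(uncertainty_weighted_loss([1.0, 2.0, 3.0], [0.0, 0.0, 0.0]))
```

With gamma = 2, an example predicted at 0.9 keeps only (1 - 0.9)^2 = 1% of its cross-entropy loss, which is how the imbalance handling takes effect.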

Changed

  • Sentiment accuracy: 95.0% (from 88.5%)
  • Emotion detection: 90.3% (from 83.6%)
  • Intent classification: 97.5% (from 94.5%)
  • Average accuracy: 94.3% (from 88.9%)
  • Model size: 1.1 GB (XLM-RoBERTa base)
  • Raw text training (preserves Manglish slang patterns)
  • Better handling of minority emotion classes (love, disgust, surprise)

Fixed

  • WeightedRandomSampler index mismatch with Subset datasets
  • Memory issues during training (reduced max_length to 96)
  • FutureWarning for deprecated torch.cuda.amp APIs
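
One common way a sampler/Subset mismatch arises is that sample weights are computed over the full dataset while the sampler draws positions for a smaller Subset. Whether that was the exact cause here is an assumption. The sketch below shows weights aligned to subset positions; `subset_sample_weights` is a hypothetical helper, and it uses plain lists so it runs without PyTorch.

```python
from collections import Counter

def subset_sample_weights(all_labels, subset_indices):
    """Inverse-frequency weights, one per subset position.

    The result has len(subset_indices) entries, so a sampler built from it
    yields positions 0..len(subset)-1, which is what a Subset expects.
    """
    subset_labels = [all_labels[i] for i in subset_indices]
    counts = Counter(subset_labels)
    return [1.0 / counts[label] for label in subset_labels]

all_labels = ["pos", "pos", "neg", "pos", "neu", "pos"]
train_indices = [0, 2, 3, 5]  # positions in the full dataset
weights = subset_sample_weights(all_labels, train_indices)
print(len(weights) == len(train_indices))
```

In PyTorch these weights would be passed to `torch.utils.data.WeightedRandomSampler(weights, num_samples=len(weights), replacement=True)` alongside the Subset itself.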

[3.1.0] - 2026-05-30

Added

  • Retrained multi-task model on 7,884 examples (up from 561)
  • Auto-download model from HuggingFace on first use
  • Jawi (Rumi↔Jawi) transliteration module
  • Parallel processing pipeline
  • Memory optimization with lazy module loading
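
Lazy module loading can be sketched as a proxy that defers the import until an attribute is first used. This is a generic illustration, not the library's implementation; `LazyModule` is a hypothetical name, and the standard-library `json` module stands in for a heavy submodule.

```python
import importlib

class LazyModule:
    """Proxy that imports the named module on first attribute access."""

    def __init__(self, name):
        self._name = name
        self._module = None

    def __getattr__(self, attr):
        # Only called for attributes not found on the proxy itself,
        # i.e. anything that belongs to the wrapped module.
        if self._module is None:
            self._module = importlib.import_module(self._name)
        return getattr(self._module, attr)

lazy_json = LazyModule("json")      # nothing imported yet
print(lazy_json._module is None)    # True
print(lazy_json.dumps({"a": 1}))    # triggers the import, prints {"a": 1}
```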

Changed

  • Sentiment accuracy: 88.5% (from 69% with 561 examples)
  • Emotion detection: 83.6% (8 emotion classes; the multi-task model covers 3 sentiment, 8 emotion, and 6 intent classes)
  • Intent classification: 94.5%
  • Average validation accuracy: 88.9% (7,884 training examples, 1,577 validation)
  • Chrome extension and VS Code extension included

Fixed

  • Model path resolution for fine-tuned weights
  • Package name consistency across all configs and docs

[3.0.0] - 2026-05-29

Added

  • 51 total modules (14 new since v2.0.0)
  • Trained models for sentiment, emotion, sarcasm, and toxicity detection
  • Benchmark dashboard with automated performance tracking
  • CLI interface (manglish command)
  • Pipeline composition with lazy loading
  • Batch processing with progress reporting
  • Export module (CoNLL, JSON, CSV formats)
  • Coreference resolution module
  • Relation extraction module
  • Question answering module
  • Text generation module
  • Emoji sentiment mapping
  • Near-duplicate detection

Changed

  • Performance tuning: 23,000+ texts/sec throughput
  • Import time reduced to <0.5s for core
  • Real-world validation across 10,000+ Malaysian social media posts
  • Improved NER with Malaysian entity types
  • Better code-switching detection accuracy

Fixed

  • Stemmer handling of reduplicated words
  • Tokenizer edge cases with mixed script text
  • Sentiment model calibration for neutral class

[2.0.0] - 2026-04-15

Added

  • 37 total modules (11 new since v1.0.0)
  • 381-case benchmark suite with 100% pass rate
  • Pipeline mode for chaining operations
  • Code-switching detection module
  • Dependency parsing
  • Phrase chunking
  • Text augmentation (augment, backtranslate)
  • Spell checker with Malaysian dictionary
  • Collocation detection
  • Word frequency lists
  • Result caching layer

Changed

  • Rewritten tokenizer for better Manglish handling
  • Improved normalization coverage (2,000+ slang terms)
  • Faster stemmer implementation

[1.0.0] - 2026-03-01

Added

  • Initial release with 26 core modules
  • Text normalization for Manglish
  • Tokenization and sentence segmentation
  • Malay stemmer and lemmatizer
  • Sentiment analysis (rule-based + ML)
  • Named Entity Recognition
  • POS tagging
  • Language detection (BM/EN/Manglish)
  • Text similarity
  • Keyword extraction
  • Stopword lists
  • Basic CLI
  • Zero-dependency core design
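
Dictionary-based normalization, in keeping with the zero-dependency design, can be sketched with the standard library alone. The three entries below are common Malay short forms chosen for illustration; the library's actual dictionary and function names are not shown in this changelog.

```python
import re

# Illustrative entries only, not the library's real slang dictionary.
SLANG = {"yg": "yang", "sgt": "sangat", "dgn": "dengan"}

def normalize(text):
    """Lowercase, split into words and punctuation, expand known short forms."""
    tokens = re.findall(r"\w+|[^\w\s]", text.lower())
    return " ".join(SLANG.get(token, token) for token in tokens)

print(normalize("Makanan yg sgt sedap"))  # makanan yang sangat sedap
```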