Running on Windows¶
malaysian-manglish-nlp works on Windows, but a few things need extra attention.
Common Issues¶
UnicodeEncodeError¶
This is the most common Windows issue. It happens when printing Malay text with special characters (é, ñ, etc.) to a console that doesn't use UTF-8.
Fix: Set environment variable before running your script:
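One option, assuming Python 3.7 or later, is Python's UTF-8 mode, which is enabled by `PYTHONUTF8=1` (in cmd.exe: `set PYTHONUTF8=1`; in PowerShell: `$env:PYTHONUTF8 = "1"`). The older `PYTHONIOENCODING=utf-8` variable changes only the standard streams. The sketch below sets the variable for a child process from a launcher script; the inline child command is a placeholder for your own script.

```python
import os
import subprocess
import sys

# Copy the current environment and switch the child interpreter to UTF-8 mode.
env = dict(os.environ, PYTHONUTF8="1")

# Placeholder child command: replace the "-c" snippet with the path to your script.
child_code = "import sys; print(sys.flags.utf8_mode, sys.stdout.encoding)"
result = subprocess.run(
    [sys.executable, "-c", child_code],
    env=env,
    capture_output=True,
    text=True,
    check=True,
)

# The first field is 1 when the child is running in UTF-8 mode.
print(result.stdout.strip())
```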
Or add to your script's top:
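A minimal sketch of the in-script approach, assuming Python 3.7 or later, where text streams have a `reconfigure()` method. The helper name `force_utf8` is illustrative and not part of the library.

```python
import sys


def force_utf8(stream):
    """Re-encode a text stream as UTF-8 when it supports reconfigure() (Python 3.7+)."""
    if hasattr(stream, "reconfigure"):
        # errors="replace" substitutes anything that still cannot be encoded
        # (such as lone surrogates) instead of raising.
        stream.reconfigure(encoding="utf-8", errors="replace")
    return stream


# Run this before the first print() call in your script.
force_utf8(sys.stdout)
force_utf8(sys.stderr)

print("Café dekat jalan besar")
```

The `hasattr` check keeps the script working when stdout has been replaced by an object without `reconfigure()`, which some IDEs and test runners do.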
Gensim build issues¶
Gensim needs a C compiler to build its fast Cython extensions from source. On Windows this means Visual Studio Build Tools.
Fix:
- Install Visual Studio Build Tools 2022
- Select "Desktop development with C++"
- Then re-run the pip install command that failed
If you don't want to install build tools, Gensim ships pre-built wheels for most Python versions. Make sure pip is up to date so it can pick a matching wheel: python -m pip install --upgrade pip
torch / transformers slow on first import¶
The first time you import transformers or torch on Windows, it can take 10–30 seconds. This is normal: PyTorch has to load a large set of DLLs, including the CUDA libraries on GPU builds.
Fix: Nothing to fix. Subsequent imports are fast within the same process.
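To see the effect yourself, you can time an import. This is a small sketch using only the standard library; `timed_import` is an illustrative helper, and `json` stands in for `torch` or `transformers` so the snippet runs without either installed.

```python
import importlib
import sys
import time


def timed_import(name):
    """Import a module by name and return (module, seconds taken)."""
    start = time.perf_counter()
    module = importlib.import_module(name)
    return module, time.perf_counter() - start


# Swap "json" for "torch" or "transformers" to measure the real cost.
_, first = timed_import("json")
_, second = timed_import("json")  # served from the sys.modules cache

print(f"first: {first:.4f}s, repeat: {second:.4f}s, cached: {'json' in sys.modules}")
```

The repeat import is cheap because Python keeps already-loaded modules in `sys.modules` for the lifetime of the process.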
File path issues¶
Windows uses \ as its path separator. Python handles this fine, but if you're writing config files or shell scripts, build paths with pathlib or os.path instead of hard-coding separators:
# Good - works everywhere
from pathlib import Path
data_dir = Path(__file__).parent / "data"
# Also fine
import os
data_dir = os.path.join(os.path.dirname(__file__), "data")
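When a path has to be written into a config file, a forward-slash form is usually the safer choice, because a backslash is an escape character in JSON and in double-quoted YAML strings, and Windows itself accepts forward slashes. The sketch below shows the conversion with `pathlib`; the path is hypothetical.

```python
from pathlib import PureWindowsPath

# PureWindowsPath only manipulates strings, so this runs on any OS.
model_path = PureWindowsPath(r"C:\data\models") / "embeddings.bin"  # hypothetical path

native = str(model_path)           # backslash form, as Windows displays it
portable = model_path.as_posix()   # forward-slash form for config files

print(native)    # C:\data\models\embeddings.bin
print(portable)  # C:/data/models/embeddings.bin
```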
Recommended Setup¶
1. Use a virtual environment¶
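The standard-library workflow is `python -m venv .venv` followed by `.venv\Scripts\activate`; the directory name `.venv` is only a convention and is assumed here. To confirm that the environment is actually active, a quick check from Python could look like this (`in_virtualenv` is an illustrative helper):

```python
import sys


def in_virtualenv():
    """Return True when the interpreter runs inside a venv-style virtual environment."""
    # Inside a venv, sys.prefix points at the environment and
    # sys.base_prefix at the base Python installation.
    return sys.prefix != sys.base_prefix


print("virtual environment active:", in_virtualenv())
print("interpreter:", sys.executable)
```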
2. PowerShell execution policy¶
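If activation fails with a policy error, allow locally created scripts for your user account, for example with Set-ExecutionPolicy -ExecutionPolicy RemoteSigned -Scope CurrentUser, then activate again.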
3. Install extras as needed¶
# Core only (no ML models)
pip install malaysian-manglish-nlp
# With sentiment classifier
pip install malaysian-manglish-nlp[transformers]
# With word embeddings
pip install malaysian-manglish-nlp[embeddings]
# Everything
pip install malaysian-manglish-nlp[all]
4. Test the install¶
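A minimal check, assuming the distribution name used in the pip commands above and the module name `malaysian_manglish_nlp`. It relies only on the standard library (Python 3.8+) and does not call any of the package's own functions; `check_install` is an illustrative helper.

```python
from importlib import metadata
from importlib.util import find_spec


def check_install(dist_name="malaysian-manglish-nlp", module_name="malaysian_manglish_nlp"):
    """Report whether the module is importable and which distribution version is installed."""
    try:
        version = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        version = None
    return {"importable": find_spec(module_name) is not None, "version": version}


print(check_install())
```

If `importable` is False or `version` is None, the package was probably installed into a different interpreter than the one running the check.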
Windows-specific Notes¶
| Feature | Status on Windows | Notes |
|---|---|---|
| Core modules (51) | ✅ Full | No platform dependencies |
| Rule-based sentiment | ✅ Full | Pure Python |
| Fine-tuned classifier | ✅ Full | Needs [transformers] extra |
| Word embeddings | ✅ Full | Needs [embeddings] extra |
| FastAPI server | ✅ Full | Needs [api] extra |
| Graph visualisation | ⚠️ Partial | pyvis needs a browser to render |
| Parallel training | ⚠️ Slower | Windows lacks fork(); uses spawn |
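The spawn note also affects your own scripts: each worker starts a fresh interpreter and re-imports the main module, so process creation must sit behind an `if __name__ == "__main__":` guard. The following is a generic `multiprocessing` sketch and not the library's training code. `run_sqrt_pool` is an illustrative name, and `math.sqrt` is used as the worker so the example stays self-contained; your own worker functions should be defined at module top level so spawned processes can import them.

```python
import math
import multiprocessing as mp


def run_sqrt_pool(values, workers=2):
    """Map math.sqrt over values in worker processes started with 'spawn'."""
    ctx = mp.get_context("spawn")  # the only start method available on Windows
    with ctx.Pool(processes=workers) as pool:
        return pool.map(math.sqrt, values)


if __name__ == "__main__":
    # Without this guard, each spawned worker would try to create
    # its own pool when it re-imports this file.
    print(run_sqrt_pool([4.0, 9.0, 16.0]))
```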
Getting Help¶
If you hit a Windows-specific bug:
- Check the GitHub Issues
- Include your Python version, Windows version, and full traceback
- Run python -m malaysian_manglish_nlp.debug to collect system info