Changelog
All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
Unreleased​
9.2.10 - 2026-02-07​
Changed​
- Remove VERSION files and use importlib.metadata for dynamic versioning (#950, #951)
- Use Rust TextClassifier with .bin models for classification (#935)
- Update sentiment models to use underthesea_core TextClassifier (#946)
- Consolidate classification into single module (#935)
Added​
- Add pure Rust FastText inference to underthesea_core (#947)
- Add TextPreprocessor to underthesea_core for Vietnamese text preprocessing (#942)
- Add underthesea_core API documentation (#948)
- Add workflow to publish underthesea_core to crates.io (#943)
Documentation​
- Separate sidebars for Technical Reports, API Reference, Datasets, and Changelog (#945)
- Add blog posts for Rust-powered text classification and CRF (#934, #935)
- Rename docusaurus folder to docs (#933)
9.2.0 - 2026-01-31​
Added​
- Add Agent class with custom tools support using OpenAI function calling (GH-712)
- Add default tools: calculator, datetime, web_search, wikipedia, shell, python, file operations (GH-712)
Changed​
- Upgrade underthesea_core to 2.0.0 with L-BFGS optimizer (#899)
  - 10x faster feature lookup with flat data structure
  - 1.24x faster than python-crfsuite for word segmentation
  - L-BFGS with OWL-QN for L1 regularization
9.1.5 - 2026-01-29​
Added​
- Add Agent API with OpenAI and Azure OpenAI support (GH-745, #890)
- Add ParserTrainer for dependency parsing (GH-392, #880)
- Add POS tagger training pipeline (GH-423, #883)
Documentation​
- Add Vietnamese News Dataset (UVN) documentation (GH-885, #888, #889)
- Add UVB dataset documentation (GH-720, #887)
- Add UUD-v0.1 dataset documentation (#886)
- Add UTS Dictionary dataset documentation (GH-622, #884)
9.1.4 - 2026-01-24​
Changed​
- Remove NLTK dependency (#879)
9.1.3 - 2026-01-24​
Added​
- Add dependency tree visualization (#867)
Changed​
- Support PyTorch v2 for dependency parsing (#871)
- Update CP_Vietnamese-VLC README with HuggingFace dataset (#872)
Fixed​
- Fix ValueError when loading DependencyParser from non-existent path (#873)
- Fix KeyError in Sentence.__getattr__ (#870)
- Fix TTS UnicodeDecodeError on Windows (#869)
- Fix underthesea[voice] installation (#868)
9.1.2 - 2026-01-24​
Added​
- Add labels property to classify and sentiment functions (#865)
Fixed​
- Fix sklearn >= 1.5 compatibility for loaded models (#866)
9.1.1 - 2026-01-24​
Fixed​
- Fix VERSION file to match pyproject.toml
9.1.0 - 2026-01-24​
Added​
- Vietnamese-English translation module with translate() function (#856)
- English to Vietnamese translation example in README (#858)
Changed​
- Support Python 3.14, deprecate Python 3.9 (#862)
- Migrate from Flake8/Pylint to Ruff for linting (#857)
Fixed​
- Fix missing sdist (tar.gz) on PyPI for underthesea_core (#859)
8.3.0 - 2025-09-28​
Added​
- Train text classification model for dataset VNTC2017_BANK (#819)
- Add UTS2017_Bank dataset (#822)
- Add bank model (#824)
- Build wheels for macOS x86-64 (#820)
Removed​
- Remove flake8 as runtime dependency (#818)
8.2.0 - 2025-09-21​
Changed​
- Update project structure, create extensions/lab folder (#812)
- Create Sonar Core 1 - System Card (#813)
- Update output format of model sonar_core_1 (#815)
8.1.0 - 2025-09-21​
Fixed​
- Fix missing .pkl files (#809)
8.0.1 - 2025-09-21​
Fixed​
- Fix missing .txt files (#806)
Changed​
- Update publish distribution to PyPI workflow (#805)
Security​
- Security updates for dependencies
8.0.0 - 2025-09-20​
Added​
- Underthesea Languages v2 (#748)
- Interactive Page for Most Frequently Used Vietnamese Words (#756)
- Support Python 3.12, 3.13 (#777)
6.8.4 - 2024-06-22​
Added​
- Add lang_detect module (#733)
6.8.0 - 2023-09-23​
Changed​
- Code refactoring (#713)
Fixed​
- Fix permission errors on removing downloaded models (#715)
6.7.0 - 2023-07-28​
Added​
- Zero shot classification with OpenAI API (#700)
6.6.0 - 2023-07-27​
Fixed​
- Fix bug in word_tokenize (#697)
6.5.0 - 2023-07-14​
Fixed​
- Fix text_normalizer token rules
6.4.0 - 2023-07-14​
Fixed​
- Fix fixed_words regex
6.3.0 - 2023-06-28​
Added​
- Support macOS ARM
6.2.0 - 2023-03-04​
Added​
- Add Text to Speech API (#668)
- Provide training script for word segmentation, pos tagging, and NER (#666)
- Create UTS_Dictionary v1.0 datasets (#663)
6.1.4 - 2023-02-26​
Added​
- Support underthesea_core with Python 3.11 (#659)
6.1.2 - 2023-02-15​
Added​
- Add option fixed_words to tokenize and word_tokenize API (#649)
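The idea behind fixed_words is that listed phrases always come out as single tokens. The sketch below is a simplified regex-based illustration of that idea only; it is not underthesea's implementation, and the function name tokenize_with_fixed_words is hypothetical.

```python
import re


def tokenize_with_fixed_words(text, fixed_words=()):
    """Split text into tokens, keeping each phrase in fixed_words as one token."""
    # Try longer phrases first so a short phrase cannot shadow a longer one.
    phrases = sorted(fixed_words, key=len, reverse=True)
    # Lookarounds keep a phrase from matching in the middle of a longer word.
    alternatives = [r"(?<!\w)" + re.escape(p) + r"(?!\w)" for p in phrases]
    # Fallback: a run of word characters, or a single non-space symbol.
    alternatives.append(r"\w+|[^\w\s]")
    pattern = re.compile("|".join(alternatives))
    return pattern.findall(text)


tokens = tokenize_with_fixed_words("Ha Noi is big.", fixed_words=["Ha Noi"])
print(tokens)  # ['Ha Noi', 'is', 'big', '.']
```

In the library itself the option is passed as a keyword argument to tokenize or word_tokenize, per the entry above; the exact matching rules there may differ from this sketch.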
6.0.0 - 2023-01-01​
Changed​
- Version bump for 2023
1.4.1 - 2022-12-17​
Added​
- Create underthesea app
- Add viet2ipa module
- Train NER model with VLSP2016 dataset using BERT
Removed​
- Remove unidecode as a dependency
1.3.5 - 2022-10-31​
Added​
- Add Text Normalization module
- Release underthesea_core version 0.0.5a2
- Support GLIBC_2.17
Changed​
- Update resources path
Fixed​
- Fix word_tokenize function
1.3.4 - 2022-01-08​
Added​
- Demo chatbot with Rasa
- Lite version of underthesea
- Add build for Windows
Changed​
- Make word_tokenize 1.5 times faster
1.3.3 - 2021-09-02​
Changed​
- Update torch and transformers dependencies
1.3.2 - 2021-08-04​
Added​
- Publish two ABSA open datasets
- Add pipeline folder
Changed​
- Migrate from Travis CI to GitHub Actions
- Update ParserTrainer
1.3.1 - 2021-01-11​
Added​
- Add ClassifierTrainer
- Add 3 new datasets
Changed​
- Add compatibility with newer versions of scikit-learn
- Retrain classification and sentiment models
1.3.0 - 2020-12-11​
Added​
- Dependency Parsing
Removed​
- Remove languageflow dependency
- Remove tabulate dependency
1.0.0 - 2017-03-01​
Added​
- First release on PyPI
- First release on ReadTheDocs