Version: 9.2.11

Changelog

All notable changes to this project will be documented in this file.

The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.

Unreleased

9.2.10 - 2026-02-07

Changed

  • Remove VERSION files and use importlib.metadata for dynamic versioning (#950, #951)
  • Use Rust TextClassifier with .bin models for classification (#935)
  • Update sentiment models to use underthesea_core TextClassifier (#946)
  • Consolidate classification into a single module (#935)

Added

  • Add pure Rust FastText inference to underthesea_core (#947)
  • Add TextPreprocessor to underthesea_core for Vietnamese text preprocessing (#942)
  • Add underthesea_core API documentation (#948)
  • Add workflow to publish underthesea_core to crates.io (#943)
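Supervised fastText inference works in three steps:

1. Average the embedding rows of a document's input tokens.
2. Project that average onto one score per label.
3. Normalize the scores with softmax.

The sketch below shows only that core path in plain Python. It is not the underthesea_core Rust code. `fasttext_predict` and the toy weights are hypothetical. Real .bin models may also include hashed character n-gram and word n-gram rows and may use hierarchical softmax, none of which is modeled here.

```python
import math
from typing import Dict, List


def fasttext_predict(
    tokens: List[str],
    vocab: Dict[str, int],
    input_matrix: List[List[float]],
    output_matrix: List[List[float]],
    labels: List[str],
) -> Dict[str, float]:
    """Hypothetical fastText-style inference: average embeddings, linear layer, softmax."""
    # 1. Look up the embedding row of every in-vocabulary token; unknown tokens are skipped.
    rows = [input_matrix[vocab[t]] for t in tokens if t in vocab]
    if not rows:
        # Nothing to go on: return a uniform distribution.
        return {label: 1.0 / len(labels) for label in labels}
    dim = len(rows[0])
    # 2. Average the rows into a single hidden vector.
    hidden = [sum(row[j] for row in rows) / len(rows) for j in range(dim)]
    # 3. One score per label: dot product of each output row with the hidden vector.
    scores = [sum(w[j] * hidden[j] for j in range(dim)) for w in output_matrix]
    # 4. Softmax, shifted by the max score for numerical stability.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return {label: e / total for label, e in zip(labels, exps)}


if __name__ == "__main__":
    # Toy weights: "tốt" (good) pushes toward positive, "tệ" (bad) toward negative.
    vocab = {"tốt": 0, "tệ": 1}
    input_matrix = [[1.0, 0.0], [0.0, 1.0]]
    output_matrix = [[2.0, -2.0], [-2.0, 2.0]]  # rows: positive, negative
    probs = fasttext_predict(["sản phẩm", "tốt"], vocab, input_matrix, output_matrix,
                             ["positive", "negative"])
    print(probs)  # positive ≈ 0.982, negative ≈ 0.018
```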

Documentation

  • Separate sidebars for Technical Reports, API Reference, Datasets, and Changelog (#945)
  • Add blog posts for Rust-powered text classification and CRF (#934, #935)
  • Rename docusaurus folder to docs (#933)

Security

  • Bump jsonpath, django, @isaacs/brace-expansion dependencies (#938, #939, #940, #944)

9.2.0 - 2026-01-31

Added

  • Add Agent class with custom tools support using OpenAI function calling (GH-712)
  • Add default tools: calculator, datetime, web_search, wikipedia, shell, python, file operations (GH-712)
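The Agent's tools rely on OpenAI function calling, which works as a round trip:

1. Each tool is advertised to the model as a JSON Schema description.
2. The model replies with `tool_calls`, whose `arguments` field is a JSON string.
3. The caller runs the matching function and sends the result back as a `tool` message.

Below is a self-contained sketch of that round trip with a restricted calculator and no network calls. `CALCULATOR_TOOL`, `calculator`, and `run_tool_call` are hypothetical names and do not describe underthesea's Agent API.

```python
import ast
import json
import operator
from typing import Any, Callable, Dict

# Tool description in the OpenAI Chat Completions "tools" format.
CALCULATOR_TOOL = {
    "type": "function",
    "function": {
        "name": "calculator",
        "description": "Evaluate a basic arithmetic expression.",
        "parameters": {
            "type": "object",
            "properties": {"expression": {"type": "string", "description": "e.g. '2 * (3 + 4)'"}},
            "required": ["expression"],
        },
    },
}

# Whitelisted operators; anything else is rejected.
_OPS = {
    ast.Add: operator.add, ast.Sub: operator.sub,
    ast.Mult: operator.mul, ast.Div: operator.truediv,
    ast.USub: operator.neg, ast.UAdd: operator.pos,
}


def _eval(node: ast.AST) -> float:
    # Walk the parsed expression, allowing only numeric literals and whitelisted operators.
    if isinstance(node, ast.Expression):
        return _eval(node.body)
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)) \
            and not isinstance(node.value, bool):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
    if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
        return _OPS[type(node.op)](_eval(node.operand))
    raise ValueError("unsupported expression")


def calculator(expression: str) -> str:
    return str(_eval(ast.parse(expression, mode="eval")))


TOOLS: Dict[str, Callable[..., str]] = {"calculator": calculator}


def run_tool_call(tool_call: Dict[str, Any]) -> Dict[str, str]:
    """Execute one entry of a model's tool_calls and build the tool reply message."""
    fn = tool_call["function"]
    args = json.loads(fn["arguments"])  # arguments arrive as a JSON-encoded string
    result = TOOLS[fn["name"]](**args)
    return {"role": "tool", "tool_call_id": tool_call["id"], "content": result}
```

The calculator parses input with `ast` and whitelists operators rather than calling `eval`. This stops a model-supplied expression from running arbitrary code.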

Changed

  • Upgrade underthesea_core to 2.0.0 with L-BFGS optimizer (#899)
    • 10x faster feature lookup with flat data structure
    • 1.24x faster than python-crfsuite for word segmentation
    • L-BFGS with OWL-QN for L1 regularization
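OWL-QN (Andrew and Gao, 2007) lets L-BFGS handle the non-differentiable L1 term by replacing the gradient with a pseudo-gradient:

- For a nonzero weight, the penalty's slope is simply added to the gradient.
- A weight at zero moves only if one of its one-sided derivatives shows that leaving zero lowers the objective.

Below is a minimal sketch of that step in Python. `owlqn_pseudo_gradient` and `c1` (the L1 coefficient) are hypothetical names. The full method also keeps the search direction and line search within the current orthant, which is omitted here. This is not the underthesea_core implementation.

```python
from typing import List


def owlqn_pseudo_gradient(x: List[float], grad: List[float], c1: float) -> List[float]:
    """Pseudo-gradient of f(x) = loss(x) + c1 * sum(|x_i|).

    grad is the gradient of the smooth loss term only; c1 >= 0.
    """
    pg = []
    for xi, gi in zip(x, grad):
        if xi > 0:
            pg.append(gi + c1)   # |x_i| is differentiable here with slope +1
        elif xi < 0:
            pg.append(gi - c1)   # slope -1
        elif gi + c1 < 0:
            pg.append(gi + c1)   # right derivative < 0: increasing x_i lowers f
        elif gi - c1 > 0:
            pg.append(gi - c1)   # left derivative > 0: decreasing x_i lowers f
        else:
            pg.append(0.0)       # 0 is in the subdifferential: keep x_i at zero
    return pg
```

The last branch is what produces sparsity. While the loss gradient at a zero weight stays within [-c1, c1], that weight is left at exactly zero.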

9.1.5 - 2026-01-29

Added

  • Add Agent API with OpenAI and Azure OpenAI support (GH-745, #890)
  • Add ParserTrainer for dependency parsing (GH-392, #880)
  • Add POS tagger training pipeline (GH-423, #883)

Documentation

  • Add Vietnamese News Dataset (UVN) documentation (GH-885, #888, #889)
  • Add UVB dataset documentation (GH-720, #887)
  • Add UUD-v0.1 dataset documentation (#886)
  • Add UTS Dictionary dataset documentation (GH-622, #884)

9.1.4 - 2026-01-24

Added

  • Implement Logistic Regression library in Rust (#878)
  • Implement CRF library in Rust (#876)

Changed

  • Remove NLTK dependency (#879)

Security

  • Fix Dependabot security vulnerabilities (#874, #875)

9.1.3 - 2026-01-24

Added

  • Add dependency tree visualization (#867)

Changed

  • Support PyTorch v2 for dependency parsing (#871)
  • Update CP_Vietnamese-VLC README with HuggingFace dataset (#872)

Fixed

  • Fix ValueError when loading DependencyParser from a non-existent path (#873)
  • Fix KeyError in Sentence.__getattr__ (#870)
  • Fix TTS UnicodeDecodeError on Windows (#869)
  • Fix underthesea[voice] installation (#868)

9.1.2 - 2026-01-24

Added

  • Add labels property to classify and sentiment functions (#865)

Fixed

  • Fix sklearn >= 1.5 compatibility for loaded models (#866)

9.1.1 - 2026-01-24

Fixed

  • Fix VERSION file to match pyproject.toml

9.1.0 - 2026-01-24

Added

  • Vietnamese-English translation module with translate() function (#856)
  • English to Vietnamese translation example in README (#858)

Changed

  • Support Python 3.14, deprecate Python 3.9 (#862)
  • Migrate from Flake8/Pylint to Ruff for linting (#857)

Fixed

  • Fix missing sdist (tar.gz) on PyPI for underthesea_core (#859)

8.3.0 - 2025-09-28

Added

  • Train text classification model on the VNTC2017_BANK dataset (#819)
  • Add UTS2017_Bank dataset (#822)
  • Add bank model (#824)
  • Build wheels for macOS x86-64 (#820)

Removed

  • Remove flake8 as a runtime dependency (#818)

8.2.0 - 2025-09-21

Changed

  • Update project structure and create extensions/lab folder (#812)
  • Create Sonar Core 1 - System Card (#813)
  • Update output format of the sonar_core_1 model (#815)

8.1.0 - 2025-09-21

Fixed

  • Fix missing .pkl files (#809)

8.0.1 - 2025-09-21

Fixed

  • Fix missing .txt files (#806)

Changed

  • Update workflow for publishing distributions to PyPI (#805)

Security

  • Security updates for dependencies

8.0.0 - 2025-09-20

Added

  • Underthesea Languages v2 (#748)
  • Interactive Page for Most Frequently Used Vietnamese Words (#756)
  • Support Python 3.12 and 3.13 (#777)

Changed

  • Update PyO3 API usage (#768)
  • Update project structure (#790)

Fixed

  • Fix wrong global variable in sent_tokenize (#764)
  • Fix logo in Readme.rst (#761)

6.8.4 - 2024-06-22

Added

  • Add lang_detect module (#733)

Changed

  • Optimize imports (#741)
  • Remove issue-manager workflow (#726)

6.8.0 - 2023-09-23

Added

  • Release source distribution for underthesea_core (#708)
  • Create Docker image for underthesea (#711)

Changed

  • Code refactoring (#713)

Fixed

  • Fix permission errors when removing downloaded models (#715)

6.7.0 - 2023-07-28

Added

  • Zero-shot classification with the OpenAI API (#700)

6.6.0 - 2023-07-27

Fixed

  • Fix bug in word_tokenize (#697)

6.5.0 - 2023-07-14

Fixed

  • Fix text_normalizer token rules

6.4.0 - 2023-07-14

Fixed

  • Fix fixed_words regex

6.3.0 - 2023-06-28

Added

  • Support macOS ARM

6.2.0 - 2023-03-04

Added

  • Add Text to Speech API (#668)
  • Provide training scripts for word segmentation, POS tagging, and NER (#666)
  • Create UTS_Dictionary v1.0 dataset (#663)

6.1.4 - 2023-02-26

Added

  • Support underthesea_core with Python 3.11 (#659)

6.1.2 - 2023-02-15

Added

  • Add fixed_words option to the tokenize and word_tokenize APIs (#649)

6.0.0 - 2023-01-01

Changed

  • Version bump for 2023

1.4.1 - 2022-12-17

Added

  • Create underthesea app
  • Add viet2ipa module
  • Train NER model on the VLSP2016 dataset using BERT

Removed

  • Remove unidecode as a dependency

1.3.5 - 2022-10-31

Added

  • Add Text Normalization module
  • Release underthesea_core version 0.0.5a2
  • Support GLIBC_2.17

Changed

  • Update resources path

Fixed

  • Fix word_tokenize function

1.3.4 - 2022-01-08

Added

  • Add demo chatbot with Rasa
  • Add lite version of underthesea
  • Add build for Windows

Changed

  • Make word_tokenize 1.5x faster

1.3.3 - 2021-09-02

Changed

  • Update torch and transformers dependencies

1.3.2 - 2021-08-04

Added

  • Publish two ABSA open datasets
  • Add pipeline folder

Changed

  • Migrate from Travis CI to GitHub Actions
  • Update ParserTrainer

1.3.1 - 2021-01-11

Added

  • Add ClassifierTrainer
  • Add 3 new datasets

Changed

  • Ensure compatibility with newer versions of scikit-learn
  • Retrain classification and sentiment models

1.3.0 - 2020-12-11

Added

  • Add dependency parsing

Removed

  • Remove languageflow dependency
  • Remove tabulate dependency

1.0.0 - 2017-03-01

Added

  • First release on PyPI
  • First release on ReadTheDocs