Version: 9.2.11

underthesea_core

High-performance Rust extension for underthesea, providing ML models and text processing tools with Python bindings via PyO3.

Installation

pip install underthesea-core

What's Included

| Module | Classes | Description |
| --- | --- | --- |
| CRF | CRFTrainer, CRFModel, CRFTagger, CRFFeaturizer | Conditional Random Fields for sequence labeling |
| Logistic Regression | LRTrainer, LRModel, LRClassifier | Logistic regression for text classification |
| Text Classifier | TextClassifier, LinearSVC, Label, Sentence | End-to-end TF-IDF + SVM text classification pipeline |
| TF-IDF | TfIdfVectorizer | TF-IDF vectorization with n-gram support |
| Text Preprocessor | TextPreprocessor | Vietnamese text preprocessing pipeline |
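
To make "TF-IDF vectorization with n-gram support" concrete, the following is a minimal pure-Python sketch of the idea. It does not use underthesea_core and is not a description of how TfIdfVectorizer works: the helper names `word_ngrams` and `tfidf`, the whitespace tokenization, and the smoothed IDF formula are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def word_ngrams(tokens, n_min=1, n_max=2):
    """Return all word n-grams of length n_min..n_max, joined by spaces."""
    grams = []
    for n in range(n_min, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


def tfidf(docs, n_min=1, n_max=2):
    """Compute one {n-gram: weight} dict per document.

    Assumed weighting: raw term count times a smoothed IDF,
    idf(t) = ln((1 + N) / (1 + df(t))) + 1.
    """
    grams_per_doc = [word_ngrams(d.lower().split(), n_min, n_max) for d in docs]
    n_docs = len(docs)

    # Document frequency: in how many documents each n-gram occurs.
    df = Counter()
    for grams in grams_per_doc:
        df.update(set(grams))

    vectors = []
    for grams in grams_per_doc:
        tf = Counter(grams)
        vectors.append({
            g: count * (math.log((1 + n_docs) / (1 + df[g])) + 1.0)
            for g, count in tf.items()
        })
    return vectors


docs = ["tôi yêu Việt Nam", "tôi yêu Hà Nội"]
vectors = tfidf(docs)
```

The bigram "tôi yêu" occurs in both documents and gets the minimum weight, while "việt nam" occurs only in the first and is weighted higher. Bigrams let a linear classifier see multi-syllable units that single tokens would split apart.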

Quick Start

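from underthesea_core import CRFTrainer, CRFTagger, CRFModel

# X_train and y_train are placeholders for your own training data:
# feature sequences and their matching label sequences.

# Train a CRF model
trainer = CRFTrainer(loss_function="lbfgs", max_iterations=100)
model = trainer.train(X_train, y_train)
model.save("model.bin")

# Load the saved model and predict.
# `features` is a placeholder for the features of the sequence to label.
tagger = CRFTagger()
tagger.load("model.bin")
labels = tagger.tag(features)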
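
The snippet above leaves `X_train`, `y_train`, and `features` undefined. The sketch below shows one plausible way to build them in plain Python for a word segmentation task. The input layout (one list of string features per token, one label per token) is an assumption modeled on common CRF toolkits, not something this page specifies. The helpers `token_features` and `sentence_features`, the feature names, and the `B`/`I` labels are hypothetical.

```python
def token_features(tokens, i):
    """Hypothetical featurizer: describe token i with string features."""
    token = tokens[i]
    feats = [
        "bias",
        f"word.lower={token.lower()}",
        f"word.istitle={token.istitle()}",
    ]
    # Context features from the neighbouring tokens, with sentence
    # boundary markers at either end.
    if i > 0:
        feats.append(f"prev.lower={tokens[i - 1].lower()}")
    else:
        feats.append("BOS")
    if i < len(tokens) - 1:
        feats.append(f"next.lower={tokens[i + 1].lower()}")
    else:
        feats.append("EOS")
    return feats


def sentence_features(tokens):
    """Featurize every token of one sentence."""
    return [token_features(tokens, i) for i in range(len(tokens))]


# Illustrative data: syllables tagged B (starts a word) or I (continues it).
sentences = [
    (["Tôi", "yêu", "Việt", "Nam"], ["B", "B", "B", "I"]),
]

# Assumed layout: one feature list per token, one label per token.
X_train = [sentence_features(tokens) for tokens, _ in sentences]
y_train = [tags for _, tags in sentences]

# Features for a single new sentence to label.
features = sentence_features(["Tôi", "yêu", "Hà", "Nội"])
```

Check the package's own reference for the exact input types before passing data like this to `CRFTrainer.train` or `CRFTagger.tag`.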

Performance

  • Built with Rust for optimal processing speed
  • L-BFGS optimizer with OWL-QN for L1 regularization
  • 10x faster feature lookup with a flat data structure
  • 1.24x faster than python-crfsuite for word segmentation
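
For readers unfamiliar with OWL-QN, the sketch below illustrates its core idea: the L1 penalty is not differentiable at zero, so the optimizer replaces the gradient with a pseudo-gradient. This is a generic pure-Python illustration of the published algorithm, not code from underthesea_core. The function name `owlqn_pseudo_gradient` and its parameters are hypothetical.

```python
def owlqn_pseudo_gradient(weights, grad, l1):
    """Pseudo-gradient of f(w) + l1 * ||w||_1, where grad is the gradient of f at w."""
    pseudo = []
    for w, g in zip(weights, grad):
        if w > 0:
            # Penalty term has slope +l1 for positive weights.
            pseudo.append(g + l1)
        elif w < 0:
            # Penalty term has slope -l1 for negative weights.
            pseudo.append(g - l1)
        elif g + l1 < 0:
            # At zero: increasing w lowers the objective.
            pseudo.append(g + l1)
        elif g - l1 > 0:
            # At zero: decreasing w lowers the objective.
            pseudo.append(g - l1)
        else:
            # At zero and |g| <= l1: the weight stays at zero.
            pseudo.append(0.0)
    return pseudo


pg = owlqn_pseudo_gradient(
    weights=[0.5, -0.5, 0.0, 0.0, 0.0],
    grad=[1.0, 1.0, -2.0, 2.0, 0.3],
    l1=1.0,
)
print(pg)  # [2.0, 0.0, -1.0, 1.0, 0.0]
```

The last entry shows where sparsity comes from: a zero weight whose gradient magnitude is smaller than the L1 strength gets a zero pseudo-gradient and is not moved away from zero.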