Version: 9.2.11

Text Classifier

End-to-end text classification that combines TF-IDF vectorization with a linear SVM. The whole pipeline runs in Rust for speed.

TextClassifier

Unified TF-IDF + SVM pipeline that avoids Python-Rust boundary overhead for intermediate vectors.
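
The intermediate vectors mentioned above are TF-IDF rows, one per document. The following pure-Python sketch shows roughly what such rows look like. It is an illustration only, not the library's Rust code: the helper name `tfidf_vectors`, the whitespace tokenization, the smoothed IDF formula, and the L2 normalization are all assumptions borrowed from common practice.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocabulary, L2-normalized TF-IDF rows) for tokenized docs."""
    vocab = sorted({tok for doc in docs for tok in doc})
    index = {tok: i for i, tok in enumerate(vocab)}
    n_docs = len(docs)
    # Document frequency: number of documents containing each token.
    df = Counter(tok for doc in docs for tok in set(doc))
    # Smoothed IDF (assumed formula, not confirmed for this library).
    idf = {tok: math.log((1 + n_docs) / (1 + df[tok])) + 1.0 for tok in vocab}

    rows = []
    for doc in docs:
        row = [0.0] * len(vocab)
        for tok, count in Counter(doc).items():
            row[index[tok]] = count * idf[tok]
        norm = math.sqrt(sum(v * v for v in row)) or 1.0
        rows.append([v / norm for v in row])
    return vocab, rows


docs = [text.split() for text in ["hàng đẹp giá rẻ", "hàng xấu quá"]]
vocab, rows = tfidf_vectors(docs)
```

In this toy corpus the token "hàng" occurs in both documents, so it receives a lower weight than tokens that occur in only one. When a pipeline like this runs in Python, every row has to be materialized and handed to the SVM; keeping both stages in Rust avoids that hand-off.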

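Usage

from underthesea_core import TextClassifier

texts = [
    "sản phẩm rất tốt",      # "the product is very good"
    "hàng đẹp giá rẻ",       # "nice goods, cheap price"
    "hàng xấu quá",          # "the goods are so bad"
    "tệ lắm không mua nữa",  # "terrible, will not buy again"
]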
labels = ["positive", "positive", "negative", "negative"]

clf = TextClassifier(max_features=20000, ngram_range=(1, 2), c=1.0)
clf.fit(texts, labels)

# Single prediction
label = clf.predict("sản phẩm tốt")

# Prediction with confidence
label, score = clf.predict_with_score("sản phẩm tốt")

# Batch prediction
labels = clf.predict_batch(["sản phẩm tốt", "hàng xấu"])

# Save/load
clf.save("classifier.bin")
clf = TextClassifier.load("classifier.bin")

Constructor

TextClassifier(
    max_features=20000,
    ngram_range=(1, 2),
    min_df=1,
    max_df=1.0,
    c=1.0,
    max_iter=1000,
    tol=0.1,
    preprocessor=None,
)

Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| max_features | int | 20000 | Maximum vocabulary size |
| ngram_range | (int, int) | (1, 2) | Minimum and maximum n-gram length |
| min_df | int | 1 | Minimum document frequency |
| max_df | float | 1.0 | Maximum document frequency ratio |
| c | float | 1.0 | SVM regularization parameter |
| max_iter | int | 1000 | Maximum SVM training iterations |
| tol | float | 0.1 | Convergence tolerance |
| preprocessor | TextPreprocessor | None | Optional text preprocessor |
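
To make the vectorizer-related parameters concrete, here is a minimal pure-Python sketch of how `ngram_range`, `min_df`, `max_df`, and `max_features` could shape a vocabulary. It is not the library's implementation. The helper names `ngrams` and `build_vocabulary`, the whitespace tokenization, and the frequency-based selection for `max_features` are assumptions modelled on common vectorizer behaviour.

```python
from collections import Counter


def ngrams(tokens, ngram_range):
    """Yield word n-grams for every n in the inclusive range."""
    lo, hi = ngram_range
    for n in range(lo, hi + 1):
        for i in range(len(tokens) - n + 1):
            yield " ".join(tokens[i:i + n])


def build_vocabulary(texts, max_features=20000, ngram_range=(1, 2),
                     min_df=1, max_df=1.0):
    """Return {term: column index} after document-frequency filtering.

    Assumption: min_df is an absolute document count and max_df is a
    ratio of documents, matching the types in the parameter table.
    """
    n_docs = len(texts)
    df = Counter()
    for text in texts:
        # Count each term once per document (document frequency).
        df.update(set(ngrams(text.split(), ngram_range)))

    kept = [t for t, c in df.items() if c >= min_df and c / n_docs <= max_df]
    # Keep the most frequent terms; break ties alphabetically.
    kept.sort(key=lambda t: (-df[t], t))
    kept = kept[:max_features]
    return {term: idx for idx, term in enumerate(sorted(kept))}


texts = ["hàng đẹp giá rẻ", "hàng xấu quá"]
vocab = build_vocabulary(texts, ngram_range=(1, 2), min_df=1, max_df=1.0)
```

With the defaults, the two toy documents yield six unigrams and five bigrams. Lowering `max_df` to 0.5 drops "hàng", which appears in every document, while raising `min_df` to 2 keeps only "hàng".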

Properties

| Property | Type | Description |
| --- | --- | --- |
| is_fitted | bool | Whether the model has been trained |
| n_features | int | Vocabulary size |
| classes | list[str] | Class labels |
| preprocessor | TextPreprocessor | The preprocessor (if set) |

Methods

| Method | Returns | Description |
| --- | --- | --- |
| fit(texts, labels) | None | Train on texts and labels |
| predict(text) | str | Predict label for a single text |
| predict_with_score(text) | (str, float) | Predict with confidence score |
| predict_batch(texts) | list[str] | Predict labels for multiple texts |
| predict_with_scores(texts) | list[(str, float)] | Batch predict with scores |
| predict_sentence(sentence) | None | Predict and add labels to a Sentence object |
| save(path) | None | Save model to binary file |
| load(path) | TextClassifier | Load model from binary file (static) |

With Preprocessor

from underthesea_core import TextClassifier, TextPreprocessor

# texts and labels as defined in the Usage example above
pp = TextPreprocessor()
clf = TextClassifier(preprocessor=pp)
clf.fit(texts, labels)

# Teencode is auto-expanded before prediction
clf.predict("sp ko tốt")  # "sp" → "sản phẩm" (product), "ko" → "không" (not)

LinearSVC

A LIBLINEAR-style linear SVM trained with dual coordinate descent. TextClassifier uses it internally, and it is also available as a standalone class.

Usage

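from underthesea_core import LinearSVC

# Toy dense feature vectors (list[list[float]]), one label per row
features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
labels = ["positive", "positive", "negative", "negative"]

svm = LinearSVC()
svm.fit(features, labels, c=1.0, max_iter=1000, tol=0.1)

features_single = [1.0, 0.0]              # one instance
features_batch = [[1.0, 0.0], [0.0, 1.0]]  # several instances
label = svm.predict(features_single)
labels = svm.predict_batch(features_batch)

svm.save("svm.bin")
svm = LinearSVC.load("svm.bin")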

Constructor

LinearSVC()

Properties

| Property | Type | Description |
| --- | --- | --- |
| classes | list[str] | Class labels |
| n_features | int | Number of features |

Methods

| Method | Returns | Description |
| --- | --- | --- |
| fit(features, labels, c=1.0, max_iter=1000, tol=0.1) | None | Train the SVM classifier |
| predict(features) | str | Predict label for a single instance |
| predict_batch(batch) | list[str] | Predict labels for a batch |
| save(path) | None | Save model to file |
| load(path) | LinearSVC | Load model from file (static) |

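fit Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| features | list[list[float]] | required | Dense feature vectors |
| labels | list[str] | required | Class labels |
| c | float | 1.0 | Regularization parameter |
| max_iter | int | 1000 | Maximum iterations |
| tol | float | 0.1 | Convergence tolerance |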
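
To show what `c`, `max_iter`, and `tol` control in a dual coordinate descent solver, here is a small pure-Python sketch for the binary case. It follows the textbook algorithm for an L2-regularized hinge-loss SVM and is not the library's Rust code. The names `train_binary_svm` and `decision`, the +1/-1 target encoding, the missing bias term, and the fixed visiting order are assumptions made to keep the example short and deterministic. How the library maps string labels and multiple classes onto the solver is not covered here.

```python
def train_binary_svm(features, targets, c=1.0, max_iter=1000, tol=0.1):
    """Dual coordinate descent for an L2-regularized hinge-loss linear SVM.

    features: dense float vectors; targets: +1 or -1 per row.
    Returns the weight vector w (this sketch has no bias term).
    """
    n, dim = len(features), len(features[0])
    w = [0.0] * dim
    alpha = [0.0] * n  # dual variables, kept inside the box [0, c]
    q_diag = [sum(v * v for v in x) for x in features]

    # LIBLINEAR also shuffles the visiting order on every pass; a fixed
    # order keeps this sketch deterministic.
    for _ in range(max_iter):
        max_pg, min_pg = float("-inf"), float("inf")
        for i in range(n):
            x, y = features[i], targets[i]
            if q_diag[i] == 0.0:
                continue  # an all-zero row cannot change w
            grad = y * sum(wj * xj for wj, xj in zip(w, x)) - 1.0
            # Projected gradient respects the box constraint on alpha[i].
            if alpha[i] <= 0.0:
                pg = min(grad, 0.0)
            elif alpha[i] >= c:
                pg = max(grad, 0.0)
            else:
                pg = grad
            max_pg, min_pg = max(max_pg, pg), min(min_pg, pg)
            if pg != 0.0:
                new_alpha = min(max(alpha[i] - grad / q_diag[i], 0.0), c)
                step = (new_alpha - alpha[i]) * y
                w = [wj + step * xj for wj, xj in zip(w, x)]
                alpha[i] = new_alpha
        # Stop once the projected gradients agree to within tol.
        if max_pg - min_pg <= tol:
            break
    return w


def decision(w, x):
    """Signed score; its sign is the predicted class."""
    return sum(wj * xj for wj, xj in zip(w, x))


X = [[2.0, 0.0], [1.5, 0.5], [0.0, 2.0], [0.5, 1.5]]
y = [1, 1, -1, -1]
w = train_binary_svm(X, y, c=1.0, max_iter=1000, tol=0.1)
```

In this formulation `c` caps each dual variable, so larger values let individual training points pull the weights harder. `tol` bounds the spread of projected gradients at which training stops, and `max_iter` limits the number of passes if that spread never gets small enough.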

Label

A classification label that holds a value and a confidence score. Compatible with the underthesea API.

Usage

from underthesea_core import Label

label = Label("positive", 0.95)
print(label.value) # "positive"
print(label.score) # 0.95

Constructor

Label(value, score=1.0)

Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| value | str | required | Label text |
| score | float | 1.0 | Confidence score (clamped to 0.0-1.0) |

Properties

| Property | Type | Description |
| --- | --- | --- |
| value | str | Label text (read/write) |
| score | float | Confidence score (read/write) |
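
The clamping of `score` can be pictured with a plain-Python stand-in. `LabelSketch` is a hypothetical class written for this illustration, not the real `Label`. Clamping on later writes to `score` (not only in the constructor) is an assumption, and the `repr` format simply mirrors the output shown in the Sentence usage example.

```python
class LabelSketch:
    """Plain-Python stand-in for Label (hypothetical, for illustration)."""

    def __init__(self, value, score=1.0):
        self.value = value
        self.score = score  # goes through the clamping setter below

    @property
    def score(self):
        return self._score

    @score.setter
    def score(self, new_score):
        # Clamp into [0.0, 1.0], as the parameter table describes.
        self._score = min(max(float(new_score), 0.0), 1.0)

    def __repr__(self):
        return f"{self.value} ({self.score:.4f})"


too_high = LabelSketch("positive", 1.7)
too_low = LabelSketch("negative", -0.2)
```

Out-of-range scores are pulled back to the nearest bound, so downstream code can treat `score` as a value in the closed interval from 0.0 to 1.0.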

Sentence

A text sentence with its associated labels. Compatible with the underthesea API.

Usage

from underthesea_core import Sentence, Label

sentence = Sentence("sản phẩm rất tốt")
sentence.add_label(Label("positive", 0.95))
print(sentence.text) # "sản phẩm rất tốt"
print(sentence.labels) # [positive (0.9500)]

Constructor

Sentence(text="", labels=None)

Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| text | str | "" | Sentence text |
| labels | list[Label] | None | Initial labels |

Properties

| Property | Type | Description |
| --- | --- | --- |
| text | str | Sentence text (read/write) |
| labels | list[Label] | Associated labels (read/write) |

Methods

| Method | Returns | Description |
| --- | --- | --- |
| add_label(label) | None | Add a single label |
| add_labels(labels) | None | Add multiple labels |
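
The way a method such as `predict_sentence` can annotate a sentence in place and return `None` may be easier to see with stand-ins. Everything below is hypothetical: `LabelSketch`, `SentenceSketch`, `annotate`, and `toy_predict_with_score` are illustration-only names, and the real method may attach labels differently.

```python
from dataclasses import dataclass, field


@dataclass
class LabelSketch:
    value: str
    score: float = 1.0


@dataclass
class SentenceSketch:
    text: str = ""
    labels: list = field(default_factory=list)

    def add_label(self, label):
        self.labels.append(label)

    def add_labels(self, labels):
        self.labels.extend(labels)


def annotate(sentence, predict_with_score):
    """Attach one predicted label to the sentence in place."""
    value, score = predict_with_score(sentence.text)
    sentence.add_label(LabelSketch(value, score))


def toy_predict_with_score(text):
    # Hypothetical stand-in for a trained classifier's predict_with_score.
    return ("positive", 0.9) if "tốt" in text else ("negative", 0.6)


sentence = SentenceSketch("sản phẩm rất tốt")
result = annotate(sentence, toy_predict_with_score)
```

Because the sentence object carries its own labels, the caller reads the prediction from `sentence.labels` afterwards instead of from a return value.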