Classification
Overview
This report covers two classification pipelines in Underthesea: Text Classification and Sentiment Analysis. Both use underthesea_core.TextClassifier and support multiple domains.
Text Classification
The text classification module categorizes Vietnamese text into predefined categories. It supports a general news domain (10 categories) and a bank domain (14 categories), with an optional OpenAI prompt-based model.
Architecture
Text Classification Pipeline
├── Text Input
│   └── Raw Vietnamese text
├── Model Selection
│   ├── General Domain (default)
│   │   └── underthesea_core.TextClassifier
│   ├── Bank Domain (domain='bank')
│   │   └── underthesea_core.TextClassifier
│   └── Prompt Model (model='prompt')
│       └── OpenAI API
└── Output
    └── List of predicted categories
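The model-selection step in the diagram can be pictured as a small dispatcher. The sketch below is illustrative only and is not Underthesea's implementation: `select_backend` and the backend names it returns are hypothetical, and it assumes that `model='prompt'` takes precedence over `domain`, which the diagram does not state.

```python
from typing import Optional

# Hypothetical names for the two local classifiers shown in the diagram.
LOCAL_BACKENDS = {
    None: "general_text_classifier",
    "bank": "bank_text_classifier",
}


def select_backend(domain: Optional[str] = None, model: Optional[str] = None) -> str:
    """Return a (hypothetical) backend name for the given arguments."""
    if model == "prompt":
        # Assumption: the prompt model bypasses the local classifiers.
        return "openai_prompt"
    if model is not None:
        raise ValueError(f"unknown model: {model!r}")
    try:
        return LOCAL_BACKENDS[domain]
    except KeyError:
        raise ValueError(f"unknown domain: {domain!r}") from None
```

Calling `select_backend()` picks the general classifier, while `select_backend(domain='bank')` picks the bank one.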
Models
| Model | File | Description |
|---|---|---|
| General | sen-classifier-general-1.0.0-20260207.bin | Vietnamese news classification |
| Bank | sen-bank-1.0.0-20260207.bin | Banking feedback classification |
| Prompt | OpenAI API | LLM-based classification |
Categories
General Domain (10 categories)
| Category | Description |
|---|---|
| The thao | Sports |
| Kinh doanh | Business |
| Chinh tri Xa hoi | Politics & Society |
| Van hoa | Culture |
| Khoa hoc | Science |
| Phap luat | Law |
| Suc khoe | Health |
| Doi song | Lifestyle |
| The gioi | World |
| Vi tinh | Technology |
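For display purposes, the table above can be turned into a lookup from predicted label to English description. This is a minimal sketch: `describe_general` is a hypothetical helper, and it assumes the classifier returns label strings spelled exactly as in the table.

```python
from typing import Dict, List

# Label -> English description, copied from the table above.
GENERAL_CATEGORIES: Dict[str, str] = {
    "The thao": "Sports",
    "Kinh doanh": "Business",
    "Chinh tri Xa hoi": "Politics & Society",
    "Van hoa": "Culture",
    "Khoa hoc": "Science",
    "Phap luat": "Law",
    "Suc khoe": "Health",
    "Doi song": "Lifestyle",
    "The gioi": "World",
    "Vi tinh": "Technology",
}


def describe_general(labels: List[str]) -> List[str]:
    """Translate predicted labels; unrecognized labels map to 'Unknown'."""
    return [GENERAL_CATEGORIES.get(label, "Unknown") for label in labels]
```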
Bank Domain (14 categories)
| Category | Description |
|---|---|
| ACCOUNT | Account management |
| CARD | Card services |
| CUSTOMER_SUPPORT | Customer support |
| DISCOUNT | Discounts |
| INTEREST_RATE | Interest rates |
| INTERNET_BANKING | Internet banking |
| LOAN | Loan services |
| MONEY_TRANSFER | Money transfers |
| OTHER | Other topics |
| PAYMENT | Payments |
| PROMOTION | Promotions |
| SAVING | Savings |
| SECURITY | Security |
| TRADEMARK | Brand-related |
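One common use of these labels is tallying how often each category appears across a set of feedback messages. The sketch below assumes each prediction is a list of label strings, as the pipeline output describes. `tally_categories` is a hypothetical helper, not part of Underthesea.

```python
from collections import Counter
from typing import Iterable, List

# The 14 bank-domain labels from the table above.
BANK_CATEGORIES = frozenset({
    "ACCOUNT", "CARD", "CUSTOMER_SUPPORT", "DISCOUNT", "INTEREST_RATE",
    "INTERNET_BANKING", "LOAN", "MONEY_TRANSFER", "OTHER", "PAYMENT",
    "PROMOTION", "SAVING", "SECURITY", "TRADEMARK",
})


def tally_categories(predictions: Iterable[List[str]]) -> Counter:
    """Count label occurrences across per-text prediction lists."""
    counts: Counter = Counter()
    for labels in predictions:
        for label in labels:
            if label not in BANK_CATEGORIES:
                raise ValueError(f"not a bank-domain label: {label!r}")
            counts[label] += 1
    return counts
```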
Usage
from underthesea import classify
# General domain (default)
# English: "The first Premier League manager to be sacked after 4 matchweeks"
text = "HLV đầu tiên ở Premier League bị sa thải sau 4 vòng đấu"
category = classify(text)
# ['The thao']
# Bank domain
# English: "The savings interest rate is too low"
classify("Lãi suất tiết kiệm quá thấp", domain='bank')
# ['INTEREST_RATE']
# Access labels
classify.labels  # General domain labels
classify.bank.labels  # Bank domain labels
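To classify several texts at once, a thin wrapper around the call shown above is usually enough. In this sketch `classify_batch` is a hypothetical helper. It takes the classifier as an argument so that it can be exercised without the models installed, and it assumes the classifier accepts an optional `domain` keyword as in the usage example.

```python
from typing import Callable, Iterable, List, Optional, Tuple


def classify_batch(
    texts: Iterable[str],
    classify_fn: Callable[..., List[str]],
    domain: Optional[str] = None,
) -> List[Tuple[str, List[str]]]:
    """Apply classify_fn to each text and return (text, labels) pairs in order."""
    results = []
    for text in texts:
        if domain is None:
            labels = classify_fn(text)
        else:
            labels = classify_fn(text, domain=domain)
        results.append((text, labels))
    return results
```

With Underthesea installed, the intended call would be `classify_batch(texts, classify, domain='bank')`, where `classify` is imported from `underthesea`.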
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| X | str | required | Text to classify |
| domain | str | None | Domain: None for general, 'bank' for banking |
| model | str | None | Model type: None for default, 'prompt' for OpenAI |
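Callers that take these arguments from user input may want to check them against the documented values before invoking the classifier. The following is a sketch under that assumption. `check_classify_args` is a hypothetical helper, and whether the library itself rejects other values is not stated here.

```python
from typing import Optional

# Values documented in the parameter table above.
ALLOWED_DOMAINS = (None, "bank")
ALLOWED_MODELS = (None, "prompt")


def check_classify_args(
    X: object, domain: Optional[str] = None, model: Optional[str] = None
) -> None:
    """Raise if the arguments fall outside the documented values."""
    if not isinstance(X, str):
        raise TypeError(f"X must be a str, got {type(X).__name__}")
    if domain not in ALLOWED_DOMAINS:
        raise ValueError(f"domain must be one of {ALLOWED_DOMAINS}, got {domain!r}")
    if model not in ALLOWED_MODELS:
        raise ValueError(f"model must be one of {ALLOWED_MODELS}, got {model!r}")
```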
Sentiment Analysis
The sentiment analysis module determines the sentiment of Vietnamese text. The general domain returns one of three labels: positive, negative, or neutral. The bank domain performs aspect-based sentiment analysis and returns a list of aspect#sentiment pairs.
Architecture
Sentiment Analysis Pipeline
├── Text Input
│   └── Raw Vietnamese text
├── Model Selection
│   ├── General Domain (default)
│   │   ├── underthesea_core.TextClassifier
│   │   └── 3-class: positive / negative / neutral
│   └── Bank Domain (domain='bank')
│       ├── underthesea_core.TextClassifier
│       └── Aspect-based sentiment
└── Output
    ├── General: sentiment string
    └── Bank: list of aspect#sentiment pairs
Models
| Model | File | Description |
|---|---|---|
| General | sen-sentiment-general-1.0.0-20260207.bin | 3-class sentiment |
| Bank | sen-sentiment-bank-1.0.0-20260207.bin | Aspect-based sentiment |
Sentiment Labels
General Domain
| Label | Description |
|---|---|
| positive | Positive sentiment |
| negative | Negative sentiment |
| neutral | Neutral sentiment |
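When aggregating many general-domain results, it can help to map the three labels onto numbers. The mapping below (+1, -1, 0) is an assumption made for illustration and is not something the library defines. `mean_polarity` is a hypothetical helper.

```python
from typing import Dict, Sequence

# Assumed numeric encoding of the three general-domain labels.
POLARITY: Dict[str, int] = {"positive": 1, "negative": -1, "neutral": 0}


def mean_polarity(labels: Sequence[str]) -> float:
    """Average the assumed polarity scores of a non-empty sequence of labels."""
    if not labels:
        raise ValueError("labels must not be empty")
    return sum(POLARITY[label] for label in labels) / len(labels)
```

A value near 1 indicates mostly positive texts, and a value near -1 mostly negative ones.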
Bank Domain: Aspects
| Aspect | Description |
|---|---|
| INTEREST_RATE | Interest rate related |
| CUSTOMER_SUPPORT | Customer service quality |
| PRODUCT | Product/service quality |
| TRADEMARK | Brand perception |
Usage
from underthesea import sentiment
# General domain (default)
# English: "The product is a bit smaller than imagined, but the quality is good"
text = "Sản phẩm hơi nhỏ so với tưởng tượng nhưng chất lượng tốt"
result = sentiment(text)
# 'positive'
# Bank domain
# English: "The interest rate is too high, staff support is good"
sentiment("Lãi suất quá cao, nhân viên hỗ trợ tốt", domain='bank')
# ['INTEREST_RATE#negative', 'CUSTOMER_SUPPORT#positive']
# Access labels
sentiment.labels  # General domain labels
sentiment.bank.labels  # Bank domain labels
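The bank-domain output is a list of `ASPECT#sentiment` strings, which is often easier to work with as a mapping. The sketch below splits each pair on the first `#`. `parse_aspect_sentiments` is a hypothetical helper, and it assumes each aspect appears at most once per result (a repeated aspect would keep the last value).

```python
from typing import Dict, List


def parse_aspect_sentiments(pairs: List[str]) -> Dict[str, str]:
    """Convert ['ASPECT#sentiment', ...] into {'ASPECT': 'sentiment', ...}."""
    parsed: Dict[str, str] = {}
    for pair in pairs:
        aspect, sep, polarity = pair.partition("#")
        if not sep or not aspect or not polarity:
            raise ValueError(f"expected 'ASPECT#sentiment', got {pair!r}")
        parsed[aspect] = polarity
    return parsed
```

Applied to the bank-domain example above, this yields a dictionary with `INTEREST_RATE` mapped to `negative` and `CUSTOMER_SUPPORT` mapped to `positive`.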
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| X | str | required | Text to analyze |
| domain | str | 'general' | Domain: 'general' or 'bank' |