
Classification

Overview

This report covers two classification pipelines in Underthesea: Text Classification and Sentiment Analysis. Both use underthesea_core.TextClassifier and support multiple domains.


Text Classification

The text classification module categorizes Vietnamese text into predefined categories. It supports a general news domain (10 categories) and a bank domain (14 categories), with an optional OpenAI prompt-based model.

Architecture

Text Classification Pipeline
├── Text Input
│   └── Raw Vietnamese text
├── Model Selection
│   ├── General Domain (default)
│   │   └── underthesea_core.TextClassifier
│   ├── Bank Domain (domain='bank')
│   │   └── underthesea_core.TextClassifier
│   └── Prompt Model (model='prompt')
│       └── OpenAI API
└── Output
    └── List of predicted categories
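
The model selection step in the tree above can be sketched as a small dispatch function. This is an illustration only, not Underthesea's actual implementation: `select_backend` and the backend names it returns are hypothetical. It also assumes that `model='prompt'` takes precedence over `domain`, which this page does not state.

```python
from typing import Optional


def select_backend(domain: Optional[str] = None, model: Optional[str] = None) -> str:
    """Return a descriptive name for the backend the tree above would pick.

    Hypothetical helper: the returned names are illustrative labels,
    not identifiers from the Underthesea code base.
    """
    if model == "prompt":
        # Prompt model: classification is delegated to the OpenAI API.
        # Assumption: this choice overrides any domain setting.
        return "openai-prompt"
    if model is not None:
        raise ValueError(f"unknown model: {model!r}")
    if domain is None:
        # Default: general news domain.
        return "textclassifier-general"
    if domain == "bank":
        return "textclassifier-bank"
    raise ValueError(f"unknown domain: {domain!r}")


print(select_backend())
print(select_backend(domain="bank"))
print(select_backend(model="prompt"))
```

Rejecting unknown values early, rather than silently falling back to the general model, keeps a misspelled domain from producing plausible but wrong labels.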

Models

| Model | File | Description |
|---|---|---|
| General | sen-classifier-general-1.0.0-20260207.bin | Vietnamese news classification |
| Bank | sen-bank-1.0.0-20260207.bin | Banking feedback classification |
| Prompt | OpenAI API | LLM-based classification |

Categories

General Domain (10 categories)

| Category | Description |
|---|---|
| The thao | Sports |
| Kinh doanh | Business |
| Chinh tri Xa hoi | Politics & Society |
| Van hoa | Culture |
| Khoa hoc | Science |
| Phap luat | Law |
| Suc khoe | Health |
| Doi song | Lifestyle |
| The gioi | World |
| Vi tinh | Technology |
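
Because the general-domain labels are Vietnamese strings, downstream code may want an English gloss. A minimal sketch, assuming the labels are returned exactly as spelled in the table above; `GENERAL_LABELS` and `describe` are hypothetical names, not part of Underthesea.

```python
from typing import Dict, List

# English glosses for the general-domain labels, copied from the table above.
GENERAL_LABELS: Dict[str, str] = {
    "The thao": "Sports",
    "Kinh doanh": "Business",
    "Chinh tri Xa hoi": "Politics & Society",
    "Van hoa": "Culture",
    "Khoa hoc": "Science",
    "Phap luat": "Law",
    "Suc khoe": "Health",
    "Doi song": "Lifestyle",
    "The gioi": "World",
    "Vi tinh": "Technology",
}


def describe(labels: List[str]) -> List[str]:
    """Translate predicted labels; unknown labels pass through unchanged."""
    return [GENERAL_LABELS.get(label, label) for label in labels]


print(describe(["The thao"]))  # ['Sports']
```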

Bank Domain (14 categories)

| Category | Description |
|---|---|
| ACCOUNT | Account management |
| CARD | Card services |
| CUSTOMER_SUPPORT | Customer support |
| DISCOUNT | Discounts |
| INTEREST_RATE | Interest rates |
| INTERNET_BANKING | Internet banking |
| LOAN | Loan services |
| MONEY_TRANSFER | Money transfers |
| OTHER | Other topics |
| PAYMENT | Payments |
| PROMOTION | Promotions |
| SAVING | Savings |
| SECURITY | Security |
| TRADEMARK | Brand-related |

Usage

from underthesea import classify

# "The first Premier League coach to be sacked after 4 rounds"
text = "HLV đầu tiên ở Premier League bị sa thải sau 4 vòng đấu"
category = classify(text)
# ['The thao']

# Bank domain ("The savings interest rate is too low")
classify("Lãi suất tiết kiệm quá thấp", domain='bank')
# ['INTEREST_RATE']

# Access labels
classify.labels       # General domain labels
classify.bank.labels  # Bank domain labels
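
Since `classify` returns a list of labels per text, a common follow-up is to tally labels over a batch. The sketch below is one way to do that; `label_distribution` and `fake_classify` are hypothetical names. The classifier is passed in as a parameter so the example runs without the model files, and the stand-in returns canned labels, not real predictions.

```python
from collections import Counter
from typing import Callable, Iterable, List


def label_distribution(
    texts: Iterable[str],
    classify_fn: Callable[[str], List[str]],
) -> Counter:
    """Count how often each predicted label occurs across `texts`."""
    counts: Counter = Counter()
    for text in texts:
        # classify_fn returns a list of labels; update() adds one per label.
        counts.update(classify_fn(text))
    return counts


def fake_classify(text: str) -> List[str]:
    """Stand-in for underthesea.classify with canned, illustrative outputs."""
    return ["The thao"] if "Premier League" in text else ["Kinh doanh"]


counts = label_distribution(
    [
        "HLV đầu tiên ở Premier League bị sa thải sau 4 vòng đấu",
        "Lãi suất tiết kiệm quá thấp",
    ],
    fake_classify,
)
print(counts.most_common())
```

With the library installed, `classify` can be passed as `classify_fn` directly, or `functools.partial(classify, domain='bank')` for the bank domain.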

Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| X | str | required | Text to classify |
| domain | str | None | Domain: None for general, 'bank' for banking |
| model | str | None | Model type: None for default, 'prompt' for OpenAI |

Sentiment Analysis

The sentiment analysis module determines the sentiment of Vietnamese text. The general domain returns a single label (positive, negative, or neutral), while the bank domain performs aspect-based sentiment analysis and returns a list of aspect#sentiment pairs.

Architecture

Sentiment Analysis Pipeline
├── Text Input
│   └── Raw Vietnamese text
├── Model Selection
│   ├── General Domain (default)
│   │   └── underthesea_core.TextClassifier
│   │       └── 3-class: positive / negative / neutral
│   └── Bank Domain (domain='bank')
│       └── underthesea_core.TextClassifier
│           └── Aspect-based sentiment
└── Output
    ├── General: sentiment string
    └── Bank: list of aspect#sentiment pairs
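
The two domains return different shapes: a plain string for the general domain and a list of aspect#sentiment strings for the bank domain. Code that handles both may find it easier to normalize them first. A minimal sketch, assuming exactly those two shapes; `normalize_sentiment` is a hypothetical helper, not part of Underthesea.

```python
from typing import List, Optional, Tuple, Union

SentimentPair = Tuple[Optional[str], str]


def normalize_sentiment(result: Union[str, List[str]]) -> List[SentimentPair]:
    """Convert either output shape into a list of (aspect, sentiment) pairs.

    A general-domain string such as 'positive' becomes [(None, 'positive')];
    each bank-domain 'ASPECT#sentiment' string is split on its last '#'.
    """
    if isinstance(result, str):
        return [(None, result)]
    pairs: List[SentimentPair] = []
    for item in result:
        aspect, _, polarity = item.rpartition("#")
        # rpartition yields an empty aspect when the item contains no '#'.
        pairs.append((aspect or None, polarity))
    return pairs


print(normalize_sentiment("positive"))
print(normalize_sentiment(["INTEREST_RATE#negative"]))
```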

Models

| Model | File | Description |
|---|---|---|
| General | sen-sentiment-general-1.0.0-20260207.bin | 3-class sentiment |
| Bank | sen-sentiment-bank-1.0.0-20260207.bin | Aspect-based sentiment |

Sentiment Labels

General Domain

| Label | Description |
|---|---|
| positive | Positive sentiment |
| negative | Negative sentiment |
| neutral | Neutral sentiment |

Bank Domain: Aspects

| Aspect | Description |
|---|---|
| INTEREST_RATE | Interest rate related |
| CUSTOMER_SUPPORT | Customer service quality |
| PRODUCT | Product/service quality |
| TRADEMARK | Brand perception |

Usage

from underthesea import sentiment

# "The product is a bit smaller than I imagined, but the quality is good"
text = "Sản phẩm hơi nhỏ so với tưởng tượng nhưng chất lượng tốt"
result = sentiment(text)
# 'positive'

# Bank domain ("The interest rate is too high, the staff are helpful")
sentiment("Lãi suất quá cao, nhân viên hỗ trợ tốt", domain='bank')
# ['INTEREST_RATE#negative', 'CUSTOMER_SUPPORT#positive']

# Access labels
sentiment.labels       # General domain labels
sentiment.bank.labels  # Bank domain labels
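
For a batch of bank feedback, the aspect#sentiment strings can be tallied per aspect. The sketch below assumes the sentiment function always returns a list of strings in that format; `aspect_summary` and `fake_bank_sentiment` are hypothetical names, and the stand-in returns canned labels so the example runs without the model.

```python
from collections import Counter, defaultdict
from typing import Callable, Dict, Iterable, List


def aspect_summary(
    feedback: Iterable[str],
    sentiment_fn: Callable[[str], List[str]],
) -> Dict[str, Counter]:
    """Tally sentiment polarities per aspect over many feedback texts."""
    summary: Dict[str, Counter] = defaultdict(Counter)
    for text in feedback:
        for label in sentiment_fn(text):
            # Each label looks like 'INTEREST_RATE#negative'.
            aspect, polarity = label.split("#", 1)
            summary[aspect][polarity] += 1
    return dict(summary)


def fake_bank_sentiment(text: str) -> List[str]:
    """Stand-in for sentiment(text, domain='bank'); returns canned labels."""
    canned = {
        "feedback-1": ["INTEREST_RATE#negative", "CUSTOMER_SUPPORT#positive"],
        "feedback-2": ["INTEREST_RATE#negative"],
    }
    return canned[text]


summary = aspect_summary(["feedback-1", "feedback-2"], fake_bank_sentiment)
print(summary["INTEREST_RATE"]["negative"])  # 2
```

With the library installed, `functools.partial(sentiment, domain='bank')` can replace the stand-in.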

Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| X | str | required | Text to analyze |
| domain | str | 'general' | Domain: 'general' or 'bank' |
