Version: 9.2.11

Classification

Overview

This page covers two classification pipelines in Underthesea: text classification and sentiment analysis. Both are built on underthesea_core.TextClassifier and support a general domain and a bank domain. Text classification can also use an OpenAI prompt-based model.


Text Classification

The text classification module categorizes Vietnamese text into predefined categories. It supports a general news domain (10 categories) and a bank domain (14 categories), with an optional OpenAI prompt-based model.

Architecture

```
Text Classification Pipeline
├── Text Input
│   └── Raw Vietnamese text
├── Model Selection
│   ├── General Domain (default)
│   │   └── underthesea_core.TextClassifier
│   ├── Bank Domain (domain='bank')
│   │   └── underthesea_core.TextClassifier
│   └── Prompt Model (model='prompt')
│       └── OpenAI API
└── Output
    └── List of predicted categories
```

Models

| Model | File | Description |
|---|---|---|
| General | sen-classifier-general-1.0.0-20260207.bin | Vietnamese news classification |
| Bank | sen-bank-1.0.0-20260207.bin | Banking feedback classification |
| Prompt | OpenAI API | LLM-based classification |
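As a rough illustration of the Model Selection step, the sketch below maps the `domain` and `model` arguments to the entries of the Models table. `select_classifier_model` is a hypothetical helper, not part of Underthesea: it only returns the listed file name (or `"OpenAI API"` for the prompt model) instead of loading anything, and it assumes that `model='prompt'` takes precedence over `domain`.

```python
from typing import Optional

# File names copied from the Models table; "OpenAI API" stands for the prompt model.
CLASSIFIER_MODELS = {
    "general": "sen-classifier-general-1.0.0-20260207.bin",
    "bank": "sen-bank-1.0.0-20260207.bin",
    "prompt": "OpenAI API",
}


def select_classifier_model(domain: Optional[str] = None,
                            model: Optional[str] = None) -> str:
    """Hypothetical helper mirroring the Model Selection branch of the pipeline."""
    # Assumption: model='prompt' is checked before the domain.
    if model == "prompt":
        return CLASSIFIER_MODELS["prompt"]
    if model is not None:
        raise ValueError(f"unsupported model: {model!r}")
    # domain=None selects the general news classifier.
    if domain is None:
        return CLASSIFIER_MODELS["general"]
    if domain == "bank":
        return CLASSIFIER_MODELS["bank"]
    raise ValueError(f"unsupported domain: {domain!r}")
```

Raising on unknown values keeps a typo such as `domain='banking'` from silently falling back to the general model.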

Categories

General Domain (10 categories)

| Category | Description |
|---|---|
| The thao | Sports |
| Kinh doanh | Business |
| Chinh tri Xa hoi | Politics & Society |
| Van hoa | Culture |
| Khoa hoc | Science |
| Phap luat | Law |
| Suc khoe | Health |
| Doi song | Lifestyle |
| The gioi | World |
| Vi tinh | Technology |
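Predicted general-domain labels are the unaccented Vietnamese names in the first column. If English names are needed for display, a lookup built from this table is enough. The dictionary and the `to_english` function below are a hypothetical convenience, not something Underthesea ships.

```python
from typing import Dict, List

# English descriptions copied from the General Domain table.
GENERAL_CATEGORY_EN: Dict[str, str] = {
    "The thao": "Sports",
    "Kinh doanh": "Business",
    "Chinh tri Xa hoi": "Politics & Society",
    "Van hoa": "Culture",
    "Khoa hoc": "Science",
    "Phap luat": "Law",
    "Suc khoe": "Health",
    "Doi song": "Lifestyle",
    "The gioi": "World",
    "Vi tinh": "Technology",
}


def to_english(labels: List[str]) -> List[str]:
    """Translate predicted labels, keeping any unknown label unchanged."""
    return [GENERAL_CATEGORY_EN.get(label, label) for label in labels]
```

For example, `to_english(['The thao'])` gives `['Sports']`.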

Bank Domain (14 categories)

| Category | Description |
|---|---|
| ACCOUNT | Account management |
| CARD | Card services |
| CUSTOMER_SUPPORT | Customer support |
| DISCOUNT | Discounts |
| INTEREST_RATE | Interest rates |
| INTERNET_BANKING | Internet banking |
| LOAN | Loan services |
| MONEY_TRANSFER | Money transfers |
| OTHER | Other topics |
| PAYMENT | Payments |
| PROMOTION | Promotions |
| SAVING | Savings |
| SECURITY | Security |
| TRADEMARK | Brand-related |
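Because the bank model returns a list of labels per text, summarizing a batch of customer feedback comes down to counting labels. The sketch below assumes the predictions were already computed; `count_bank_categories` is a hypothetical helper, and the example input in the note is made up for illustration rather than real model output.

```python
from collections import Counter
from typing import Iterable, List

# The 14 labels listed in the Bank Domain table.
BANK_CATEGORIES = frozenset({
    "ACCOUNT", "CARD", "CUSTOMER_SUPPORT", "DISCOUNT", "INTEREST_RATE",
    "INTERNET_BANKING", "LOAN", "MONEY_TRANSFER", "OTHER", "PAYMENT",
    "PROMOTION", "SAVING", "SECURITY", "TRADEMARK",
})


def count_bank_categories(predictions: Iterable[List[str]]) -> Counter:
    """Count how often each bank category appears across many predictions."""
    counts: Counter = Counter()
    for labels in predictions:
        for label in labels:
            # Guard against labels outside the documented set.
            if label not in BANK_CATEGORIES:
                raise ValueError(f"unexpected bank label: {label!r}")
            counts[label] += 1
    return counts
```

For instance, `count_bank_categories([["INTEREST_RATE"], ["CARD", "INTEREST_RATE"]])` counts `INTEREST_RATE` twice and `CARD` once.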

Usage

```python
from underthesea import classify

# "First manager in the Premier League sacked after 4 rounds"
text = "HLV đầu tiên ở Premier League bị sa thải sau 4 vòng đấu"
category = classify(text)
# ['The thao']

# Bank domain ("The savings interest rate is too low")
classify("Lãi suất tiết kiệm quá thấp", domain='bank')
# ['INTEREST_RATE']

# Access labels
classify.labels  # General domain labels
classify.bank.labels  # Bank domain labels
```
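Since `classify` takes a single string and returns a list of labels, classifying many texts is a simple loop. In the sketch below, `classify_many` is a hypothetical helper; the classifier is passed in as a parameter so the helper can be tried with a stub, and in real use you would pass `underthesea.classify` itself.

```python
from typing import Callable, Iterable, List, Optional


def classify_many(texts: Iterable[str],
                  classifier: Callable[..., List[str]],
                  domain: Optional[str] = None) -> List[List[str]]:
    """Apply a classify-style function to every text, one result list per text."""
    if domain is None:
        # Default call: general news domain.
        return [classifier(text) for text in texts]
    # Forward the domain keyword, e.g. domain='bank'.
    return [classifier(text, domain=domain) for text in texts]
```

With the library installed, `classify_many(texts, classify, domain='bank')` would run the bank model over every text in `texts`.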

Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| X | str | required | Text to classify |
| domain | str | None | Domain: None for general, 'bank' for banking |
| model | str | None | Model type: None for the default model, 'prompt' for OpenAI |

Sentiment Analysis

The sentiment analysis module determines the sentiment of Vietnamese text. The general domain returns a single label: positive, negative, or neutral. The bank domain performs aspect-based sentiment analysis, returning a sentiment for each aspect the text mentions.

Architecture

```
Sentiment Analysis Pipeline
├── Text Input
│   └── Raw Vietnamese text
├── Model Selection
│   ├── General Domain (default)
│   │   └── underthesea_core.TextClassifier
│   │       └── 3-class: positive / negative / neutral
│   └── Bank Domain (domain='bank')
│       └── underthesea_core.TextClassifier
│           └── Aspect-based sentiment
└── Output
    ├── General: sentiment string
    └── Bank: list of aspect#sentiment pairs
```
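The two branches return different shapes: a plain string for the general domain and a list of `aspect#sentiment` strings for the bank domain. A small normalizer can give downstream code one shape to handle. `normalize_sentiment` is a hypothetical helper, and it assumes the `#` separator shown in the bank output format.

```python
from typing import List, Optional, Tuple, Union


def normalize_sentiment(
        result: Union[str, List[str]]) -> List[Tuple[Optional[str], str]]:
    """Turn either output shape into a list of (aspect, sentiment) pairs."""
    if isinstance(result, str):
        # General domain: one sentiment string, no aspect.
        return [(None, result)]
    pairs: List[Tuple[Optional[str], str]] = []
    for item in result:
        # Bank domain: "ASPECT#sentiment"; split on the last '#'.
        head, sep, polarity = item.rpartition("#")
        pairs.append((head if sep else None, polarity))
    return pairs
```

Both `'positive'` and `['INTEREST_RATE#negative']` then become lists of tuples, with `None` as the aspect in the general case.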

Models

| Model | File | Description |
|---|---|---|
| General | sen-sentiment-general-1.0.0-20260207.bin | 3-class sentiment |
| Bank | sen-sentiment-bank-1.0.0-20260207.bin | Aspect-based sentiment |

Sentiment Labels

General Domain

| Label | Description |
|---|---|
| positive | Positive sentiment |
| negative | Negative sentiment |
| neutral | Neutral sentiment |
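Because the general model returns one of three labels per text, a collection of reviews can be summarized with a single score. The mapping positive = 1, neutral = 0, negative = -1 and the `mean_polarity` helper below are illustrative assumptions, not part of Underthesea.

```python
from typing import List

# Assumed scoring convention for the three general-domain labels.
POLARITY = {"positive": 1, "neutral": 0, "negative": -1}


def mean_polarity(labels: List[str]) -> float:
    """Average polarity in [-1, 1] over a list of general-domain labels."""
    if not labels:
        raise ValueError("need at least one label")
    # An unexpected label raises KeyError rather than being skipped.
    return sum(POLARITY[label] for label in labels) / len(labels)
```

A score near 1 means mostly positive texts, near -1 mostly negative; neutral texts pull the score toward 0.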

Bank Domain — Aspects

| Aspect | Description |
|---|---|
| INTEREST_RATE | Interest rate related |
| CUSTOMER_SUPPORT | Customer service quality |
| PRODUCT | Product/service quality |
| TRADEMARK | Brand perception |

Usage

```python
from underthesea import sentiment

# "The product is a bit smaller than I imagined, but the quality is good"
text = "Sản phẩm hơi nhỏ so với tưởng tượng nhưng chất lượng tốt"
result = sentiment(text)
# 'positive'

# Bank domain ("The interest rate is too high, the staff support is good")
sentiment("Lãi suất quá cao, nhân viên hỗ trợ tốt", domain='bank')
# ['INTEREST_RATE#negative', 'CUSTOMER_SUPPORT#positive']

# Access labels
sentiment.labels  # General domain labels
sentiment.bank.labels  # Bank domain labels
```
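To report how customers feel about each aspect across many bank reviews, the per-text results can be tallied by aspect. `tally_aspects` is a hypothetical helper; it takes results in the `aspect#sentiment` format shown above as input and does not call the model itself.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List


def tally_aspects(results: Iterable[List[str]]) -> Dict[str, Counter]:
    """Map each aspect to a Counter of the sentiments seen for it."""
    tally: Dict[str, Counter] = defaultdict(Counter)
    for pairs in results:
        for item in pairs:
            # "INTEREST_RATE#negative" -> ("INTEREST_RATE", "negative")
            aspect, _, polarity = item.rpartition("#")
            tally[aspect][polarity] += 1
    return dict(tally)
```

The result can be read per aspect, for example the number of negative mentions of `INTEREST_RATE`.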

Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| X | str | required | Text to analyze |
| domain | str | 'general' | Domain: 'general' or 'bank' |
