API Reference
This section provides complete API documentation for all Underthesea functions.
Core Functions

| Function | Description | Install |
|---|---|---|
| `sent_tokenize` | Sentence segmentation | Core |
| `text_normalize` | Text normalization | Core |
| `word_tokenize` | Word segmentation | Core |
| `pos_tag` | Part-of-speech tagging | Core |
| `chunk` | Phrase chunking | Core |
| `ner` | Named entity recognition | Core |
| `classify` | Text classification | Core |
| `sentiment` | Sentiment analysis | Core |
| `convert_address` | Address conversion (63→34 provinces) | Core |
Deep Learning Functions

| Function | Description | Install |
|---|---|---|
| `dependency_parse` | Dependency parsing | [deep] |
| `translate` | Vietnamese-English translation | [deep] |
Additional Functions

| Function | Description | Install |
|---|---|---|
| `lang_detect` | Language detection | [langdetect] |
| `tts` | Text-to-speech | [voice] |
| `agent` | Conversational AI agent | [agent] |
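The Install column names the pip extra each function needs. As a small sketch, assuming the extras are installed with pip's standard `underthesea[extra]` syntax, the hypothetical helper `install_command` below (not part of underthesea) turns these tables into a lookup that returns the install command for a given function:

```python
from typing import Dict, Optional

# Extra required by each function, copied from the tables above.
# None means the function is available in the core install.
EXTRAS: Dict[str, Optional[str]] = {
    "sent_tokenize": None,
    "text_normalize": None,
    "word_tokenize": None,
    "pos_tag": None,
    "chunk": None,
    "ner": None,
    "classify": None,
    "sentiment": None,
    "convert_address": None,
    "dependency_parse": "deep",
    "translate": "deep",
    "lang_detect": "langdetect",
    "tts": "voice",
    "agent": "agent",
}


def install_command(function_name: str) -> str:
    """Return a pip command that installs what `function_name` needs."""
    if function_name not in EXTRAS:
        raise KeyError(f"unknown function: {function_name}")
    extra = EXTRAS[function_name]
    if extra is None:
        return "pip install underthesea"
    # Quote the requirement so shells such as zsh don't expand the brackets.
    return f'pip install "underthesea[{extra}]"'


print(install_command("translate"))  # pip install "underthesea[deep]"
```

Quoting the requirement is a portability choice: it is harmless in bash and required in zsh, where unquoted square brackets are treated as a glob pattern.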
Quick Import

All main functions can be imported directly from `underthesea`:

```python
from underthesea import (
    sent_tokenize,
    text_normalize,
    word_tokenize,
    pos_tag,
    chunk,
    ner,
    classify,
    sentiment,
    convert_address,
    dependency_parse,  # requires [deep]
    translate,         # requires [deep]
    lang_detect,       # requires [langdetect]
    agent,             # requires [agent]
)
```
Common Parameters

Several functions accept the following parameters.
format

Controls the output format:

- `None` (default): returns a list of tokens
- `"text"`: returns a single string in which the syllables of each multi-word token are joined by underscores

```python
from underthesea import word_tokenize

word_tokenize("Việt Nam", format=None)    # ['Việt Nam']
word_tokenize("Việt Nam", format="text")  # 'Việt_Nam'
```
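To make the relationship between the two formats concrete, here is a sketch of a hypothetical helper, `to_text_format` (not part of underthesea), that converts list output into the underscore-joined form. It assumes `format="text"` is equivalent to joining the list this way, which matches the example above but may differ in details such as punctuation handling:

```python
from typing import List


def to_text_format(tokens: List[str]) -> str:
    """Convert list-format tokens to the underscore-joined text form.

    Spaces inside a multi-word token become underscores, and tokens are
    then separated by single spaces.
    """
    return " ".join(token.replace(" ", "_") for token in tokens)


print(to_text_format(["Việt Nam", "là", "quốc gia"]))  # Việt_Nam là quốc_gia
```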
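model

Selects which model to use. Depending on the function, this is the `model` parameter or a flag such as `deep=True`:

```python
from underthesea import classify, ner

# Default model
ner("text")
# Deep learning model (requires [deep])
ner("text", deep=True)
# Prompt-based classification using an OpenAI model
classify("text", model='prompt')
```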
domain

Specifies the domain for domain-specific models:

```python
from underthesea import classify, sentiment

classify("text", domain='bank')
sentiment("text", domain='bank')
```
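When several calls should share one domain, the keyword can be pre-filled with `functools.partial`. The sketch below wraps this in a hypothetical helper, `bind_domain` (not part of underthesea), which assumes each wrapped function accepts a `domain` keyword, as `classify` and `sentiment` do in the examples above:

```python
from functools import partial
from typing import Callable, Dict


def bind_domain(domain: str, **functions: Callable) -> Dict[str, Callable]:
    """Return versions of `functions` with the `domain` keyword pre-filled.

    Each function must accept a `domain` keyword argument.
    """
    return {name: partial(fn, domain=domain) for name, fn in functions.items()}
```

For example, `bank = bind_domain("bank", classify=classify, sentiment=sentiment)` returns a dict in which `bank["sentiment"]("text")` is equivalent to `sentiment("text", domain='bank')`.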