Version: Next 🚧

API Reference

This section provides complete API documentation for all Underthesea functions.

Core Functions

Function	Description	Install
`sent_tokenize`	Sentence segmentation	Core
`text_normalize`	Text normalization	Core
`word_tokenize`	Word segmentation	Core
`pos_tag`	Part-of-speech tagging	Core
`chunk`	Phrase chunking	Core
`ner`	Named entity recognition	Core
`classify`	Text classification	Core
`sentiment`	Sentiment analysis	Core
`convert_address`	Address conversion (63→34 provinces)	Core

Deep Learning Functions

Function	Description	Install
`dependency_parse`	Dependency parsing	`[deep]`
`translate`	Vietnamese-English translation	`[deep]`

Additional Functions

Function	Description	Install
`lang_detect`	Language detection	`[langdetect]`
`tts`	Text-to-speech	`[voice]`
`agent`	Conversational AI agent	`[agent]`

Quick Import

The main functions can be imported directly from the `underthesea` package. Names marked with an extra are only usable when that extra is installed:

from underthesea import (
    sent_tokenize,
    text_normalize,
    word_tokenize,
    pos_tag,
    chunk,
    ner,
    classify,
    sentiment,
    dependency_parse,  # requires [deep]
    translate,         # requires [deep]
    lang_detect,       # requires [langdetect]
    agent,             # requires [agent]
    convert_address,
)
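
Several of the names above depend on optional extras, so a single import of all of them may not work in a minimal installation. The sketch below shows one defensive way to look functions up. `load_optional` is a hypothetical helper, not part of Underthesea, and it assumes that a missing extra surfaces as an `ImportError` or `AttributeError`; the library may behave differently.

```python
import importlib


def load_optional(module_name, attr_name):
    """Return module_name.attr_name, or None if it cannot be loaded.

    Hypothetical helper, not part of Underthesea. Assumes a missing
    optional dependency shows up as ImportError or AttributeError.
    """
    try:
        module = importlib.import_module(module_name)
        return getattr(module, attr_name)
    except (ImportError, AttributeError):
        return None


# Core function: expected to be present whenever underthesea is installed.
word_tokenize = load_optional("underthesea", "word_tokenize")

# Optional function: assumed to be None when the [deep] extra is missing.
dependency_parse = load_optional("underthesea", "dependency_parse")

if dependency_parse is None:
    print("dependency_parse unavailable; install the [deep] extra to enable it")
```

Returning `None` instead of raising lets calling code decide at each call site whether an optional feature is required or can be skipped.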

Common Parameters

Several functions accept the same keyword parameters:

`format`

Controls the output format:

`None` (default): returns a list of tokens
`"text"`: returns a single string in which the syllables of each multi-word token are joined with underscores

word_tokenize("Việt Nam", format=None)   # ['Việt Nam']
word_tokenize("Việt Nam", format="text") # 'Việt_Nam'
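
To make the relationship between the two formats concrete, the following sketch converts a token list into the `"text"` form described above. `to_text_format` is a hypothetical helper written for illustration, not the library's implementation, and it assumes tokens are separated by single spaces in the output string.

```python
def to_text_format(tokens):
    """Join tokens the way format="text" is described (hypothetical helper)."""
    # Spaces inside a multi-word token become underscores;
    # the tokens themselves are separated by single spaces (an assumption).
    return " ".join(token.replace(" ", "_") for token in tokens)


tokens = ["Việt Nam"]          # list output, as in the format=None example
print(to_text_format(tokens))  # Việt_Nam
```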

`model`

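Selects which model to use. As the examples show, the keyword differs by function: `ner` switches to its deep learning model with the `deep` flag, while `classify` takes a `model` argument: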

# Use default model
ner("text")

# Use specific model
ner("text", deep=True)  # Use deep learning model
classify("text", model='prompt')  # Use OpenAI model

`domain`

Specifies the domain for domain-specific models:

classify("text", domain='bank')
sentiment("text", domain='bank')
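
When the same code path has to handle both general and domain-specific text, it can help to forward `domain` only when one is set. The sketch below is one way to do that. `run_with_domain` is a hypothetical helper, not part of Underthesea, and it assumes the wrapped function accepts `domain` as a keyword argument, as `classify` and `sentiment` do in the examples above.

```python
def run_with_domain(func, texts, domain=None):
    """Apply func to each text, forwarding domain only when it is set.

    Hypothetical helper for illustration; func is any callable that
    accepts a text and, optionally, a domain keyword argument.
    """
    kwargs = {} if domain is None else {"domain": domain}
    return [func(text, **kwargs) for text in texts]


# Usage with Underthesea (requires the package to be installed):
#   from underthesea import sentiment
#   run_with_domain(sentiment, ["text"], domain="bank")
```

Omitting the keyword entirely when `domain` is `None` leaves the choice of default to the wrapped function.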
