Version: 9.2.11

API Reference

This section provides complete API documentation for all Underthesea functions.

Core Functions

Function          Description                             Install
sent_tokenize     Sentence segmentation                   Core
text_normalize    Text normalization                      Core
word_tokenize     Word segmentation                       Core
pos_tag           Part-of-speech tagging                  Core
chunk             Phrase chunking                         Core
ner               Named entity recognition                Core
classify          Text classification                     Core
sentiment         Sentiment analysis                      Core
convert_address   Address conversion (63→34 provinces)    Core

Deep Learning Functions

Function           Description                        Install
dependency_parse   Dependency parsing                 [deep]
translate          Vietnamese-English translation     [deep]

Additional Functions

Function      Description                Install
lang_detect   Language detection         [langdetect]
tts           Text-to-speech             [voice]
agent         Conversational AI agent    [agent]

Quick Import

All main functions can be imported directly from underthesea:

from underthesea import (
    sent_tokenize,
    text_normalize,
    word_tokenize,
    pos_tag,
    chunk,
    ner,
    classify,
    sentiment,
    dependency_parse,  # requires [deep]
    translate,         # requires [deep]
    lang_detect,       # requires [langdetect]
    agent,             # requires [agent]
    convert_address,
)
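
Names marked with an extra such as [deep], [langdetect], or [agent] depend on optional installs, so importing them may fail on a core-only installation. The following is a minimal sketch of a guarded import, assuming a missing extra surfaces as ImportError or AttributeError. load_optional is a hypothetical helper written for this example and is not part of Underthesea.

```python
import importlib


def load_optional(module_name, attr_name):
    """Return module_name.attr_name, or None if the module or attribute is unavailable.

    Hypothetical helper (not an Underthesea API). It assumes that a missing
    optional dependency shows up as ImportError or AttributeError.
    """
    try:
        module = importlib.import_module(module_name)
        return getattr(module, attr_name)
    except (ImportError, AttributeError):
        return None


# Try to obtain an optional function without crashing when it is absent.
dependency_parse = load_optional("underthesea", "dependency_parse")
if dependency_parse is None:
    print("dependency_parse is unavailable; the [deep] extra may be missing")
```

With this pattern, code that only needs the core functions keeps working, and features backed by an extra can be switched off when the returned value is None.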

Common Parameters

Many functions share common parameters:

format

Controls output format:

  • None (default): Returns a list of tokens
  • "text": Returns a single string in which the words of each multi-word token are joined by underscores

word_tokenize("Việt Nam", format=None)   # ['Việt Nam']
word_tokenize("Việt Nam", format="text") # 'Việt_Nam'
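
To make the relationship between the two formats concrete, the sketch below converts a token list into the underscore-joined string form. It assumes tokens are separated by single spaces in the "text" output, which the description above does not state explicitly. to_text_format is a hypothetical helper for illustration, not a library function.

```python
def to_text_format(tokens):
    """Join tokens with spaces, replacing spaces inside each token with underscores.

    Hypothetical helper: mimics the described format="text" output,
    assuming a single space separates tokens.
    """
    return " ".join(token.replace(" ", "_") for token in tokens)


# A multi-word token becomes one underscore-joined unit.
print(to_text_format(["Việt Nam"]))  # Việt_Nam
```

The list form is usually easier to process programmatically, while the string form is convenient for storing segmented text on a single line.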

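model

Selects which model to use. Depending on the function, the choice is made through a flag such as deep or through the model argument:

# Use the default model
ner("text")

# Use a specific model
ner("text", deep=True)            # Use the deep learning model
classify("text", model='prompt')  # Use the OpenAI model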

domain

Specifies the domain for domain-specific models:

classify("text", domain='bank')
sentiment("text", domain='bank')
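
When the same domain applies to several calls, it can help to pass it in one place. The sketch below forwards an optional domain to two callables that follow the classify(text, domain=...) and sentiment(text, domain=...) call shape shown above. analyze and both stub functions are hypothetical and stand in for the real functions, so the example runs without Underthesea installed.

```python
def analyze(texts, classify_fn, sentiment_fn, domain=None):
    """Run both functions on each text, passing domain only when it is set."""
    kwargs = {} if domain is None else {"domain": domain}
    return [
        {
            "text": text,
            "label": classify_fn(text, **kwargs),
            "sentiment": sentiment_fn(text, **kwargs),
        }
        for text in texts
    ]


# Stubs standing in for underthesea.classify and underthesea.sentiment;
# they only echo the domain they received.
def fake_classify(text, domain=None):
    return f"label[{domain}]"


def fake_sentiment(text, domain=None):
    return f"sentiment[{domain}]"


results = analyze(["text"], fake_classify, fake_sentiment, domain="bank")
print(results[0]["label"])  # label[bank]
```

In real use, the stubs would be replaced by the imported classify and sentiment functions. The return values shown here come from the stubs, not from the library.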