Machine Translation
Overview
The machine translation module in Underthesea provides bidirectional Vietnamese-English translation using the EnviT5 transformer model from VietAI.
Model: VietAI/envit5-translation
Requirements:
pip install "underthesea[deep]"
Architecture
EnviT5 Translator
Translation Pipeline
├── Text Input
│   └── Source language text
├── Preprocessing
│   └── Language prefix: "{lang}: {text}"
├── EnviT5 Model
│   ├── AutoTokenizer
│   └── AutoModelForSeq2SeqLM
│       └── Beam search (num_beams=5)
└── Output
    └── Translated text
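The preprocessing step in the pipeline above can be sketched in plain Python. This is a minimal illustration, not Underthesea's actual code: the helper names `add_language_prefix` and `strip_language_prefix` are hypothetical, `lang` is assumed to be the source-language code, and the possibility that the model output carries a similar tag that needs removing is also an assumption.

```python
SUPPORTED_LANGS = ("vi", "en")


def add_language_prefix(text: str, lang: str) -> str:
    """Build the model input in the "{lang}: {text}" form.

    `lang` is assumed to be the source-language code.
    """
    if lang not in SUPPORTED_LANGS:
        raise ValueError(f"Unsupported language: {lang!r}")
    return f"{lang}: {text.strip()}"


def strip_language_prefix(text: str) -> str:
    """Remove a leading "vi: " or "en: " tag if one is present.

    Assumption: decoded output may start with such a tag.
    Text without a tag is returned unchanged.
    """
    for lang in SUPPORTED_LANGS:
        tag = f"{lang}: "
        if text.startswith(tag):
            return text[len(tag):]
    return text
```

Keeping the prefix logic in small pure functions makes it easy to check without loading the model.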
Model Configuration
| Parameter | Value |
|---|---|
| Model | VietAI/envit5-translation |
| Architecture | T5 (Seq2Seq) |
| Beam search | num_beams=5 |
| Max length | 512 tokens |
| Languages | Vietnamese (vi), English (en) |
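As a rough sketch, the configuration in the table could map onto the Hugging Face `transformers` API as follows. The function name `translate_with_envit5` is hypothetical. Applying the 512-token limit to both input truncation and generated output is an assumption, and so is the default source language. The model is reloaded on every call for brevity, whereas a real implementation would likely cache it.

```python
MODEL_NAME = "VietAI/envit5-translation"
NUM_BEAMS = 5
MAX_LENGTH = 512  # assumed to cap both the input and the generated output


def translate_with_envit5(text: str, source_lang: str = "vi") -> str:
    """Translate one sentence with EnviT5 (illustrative sketch only)."""
    # Imported lazily so this module loads without the deep-learning extras.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_NAME)

    # Prefix the text with its source-language tag, then tokenize.
    inputs = tokenizer(
        f"{source_lang}: {text}",
        return_tensors="pt",
        truncation=True,
        max_length=MAX_LENGTH,
    )

    # Beam search with the settings from the configuration table.
    output_ids = model.generate(
        input_ids=inputs.input_ids,
        attention_mask=inputs.attention_mask,
        num_beams=NUM_BEAMS,
        max_length=MAX_LENGTH,
    )

    # The decoded string may still carry a language tag (assumption).
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling this function downloads the model weights on first use and requires `torch` to be installed.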
Usage
Vietnamese to English (Default)
from underthesea import translate
result = translate("Hà Nội là thủ đô của Việt Nam")
# 'Hanoi is the capital of Vietnam'
English to Vietnamese
translate("I love Vietnamese food", source_lang='en', target_lang='vi')
# 'Tôi yêu ẩm thực Việt Nam'
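For translating several sentences in one direction, a thin wrapper may be convenient. The sketch below is hypothetical and not part of Underthesea. It relies only on the `source_lang` and `target_lang` keyword arguments shown above, and it takes the translation function as a parameter so the helper can be exercised without loading the model.

```python
from typing import Callable, Iterable, List


def translate_all(
    sentences: Iterable[str],
    translate_fn: Callable[..., str],
    source_lang: str = "vi",
    target_lang: str = "en",
) -> List[str]:
    """Translate each sentence in order, using one language direction."""
    # Only the vi -> en and en -> vi directions are supported.
    if {source_lang, target_lang} != {"vi", "en"}:
        raise ValueError(
            f"Unsupported direction: {source_lang!r} -> {target_lang!r}"
        )
    return [
        translate_fn(s, source_lang=source_lang, target_lang=target_lang)
        for s in sentences
    ]


# Usage with the real function (needs underthesea[deep] installed):
# from underthesea import translate
# translate_all(["Xin chào", "Cảm ơn"], translate)
```

Rejecting an unsupported direction up front avoids sending the model a prefix it was not trained on.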