Named Entity Recognition
Overview
The NER module in Underthesea identifies and classifies named entities in Vietnamese text. It supports two models: a lightweight CRF model, used by default, and a Transformer-based deep learning model, enabled with deep=True. Entities are classified as persons (PER), locations (LOC), or organizations (ORG).
Architecture
Dual Model Support
NER Pipeline
├── Text Input
│   └── Raw Vietnamese text
├── Mode Selection
│   ├── Shallow (default)
│   │   ├── word_tokenize()
│   │   ├── pos_tag()
│   │   ├── chunk()
│   │   └── CRF NER Model
│   └── Deep (deep=True)
│       └── HuggingFace Transformers
│           └── undertheseanlp/vietnamese-ner-v1.4.0a2
└── Output
    └── Entity annotations (BIO format)
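Because the pipeline emits BIO tags rather than finished entity spans, downstream code usually has to group consecutive B- and I- tokens itself. The following is a minimal sketch of that grouping step, not part of Underthesea. It assumes each shallow-mode token is a 4-tuple of (word, POS tag, chunk tag, NER tag), mirroring the word_tokenize, pos_tag, chunk, and CRF stages above. The helper name extract_entities and the SAMPLE data are hypothetical and written by hand for illustration; they are not actual model output.

```python
from typing import List, Optional, Tuple

# Assumed token shape for shallow mode: (word, pos_tag, chunk_tag, ner_tag).
TaggedToken = Tuple[str, str, str, str]

# Hand-written illustrative input, NOT real model output.
SAMPLE: List[TaggedToken] = [
    ("Tổng thống", "N", "B-NP", "O"),
    ("Mỹ", "Np", "B-NP", "B-LOC"),
    ("Donald", "Np", "B-NP", "B-PER"),
    ("Trump", "Np", "B-NP", "I-PER"),
]


def extract_entities(tagged: List[TaggedToken]) -> List[Tuple[str, str]]:
    """Group BIO-tagged tokens into (entity_text, entity_type) pairs."""
    entities: List[Tuple[str, str]] = []
    words: List[str] = []
    label: Optional[str] = None

    def flush() -> None:
        # Close the entity being built, if any, and reset the buffer.
        nonlocal words, label
        if words and label is not None:
            entities.append((" ".join(words), label))
        words, label = [], None

    for word, _pos, _chunk, tag in tagged:
        if tag.startswith("B-"):
            # A B- tag always starts a new entity.
            flush()
            words, label = [word], tag[2:]
        elif tag.startswith("I-") and label == tag[2:]:
            # An I- tag of the same type continues the current entity.
            words.append(word)
        else:
            flush()
            if tag.startswith("I-"):
                # Lenient choice: a stray I- tag starts a new entity.
                words, label = [word], tag[2:]
    flush()
    return entities


if __name__ == "__main__":
    for text, kind in extract_entities(SAMPLE):
        print(f"{kind}\t{text}")
```

With Underthesea installed, and if the shallow output does have this 4-tuple shape, the helper could be applied directly as extract_entities(ner(text)). The deep=True mode may return a different structure, so that output would need its own adapter.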