Project Description

Uplug is a collection of tools for linguistic
corpus processing, word alignment, and term
extraction from parallel corpora. Several tools
have been integrated in Uplug. Pre-processing
tools include a sentence splitter, a tokenizer, and
external part-of-speech taggers and shallow
parsers. The following external tools are used:
the Grok system for English (tagging and chunking)
and the morphological analyzer ChaSen for
Japanese. Other tools such as the TreeTagger can
easily be added. Translated documents can be
sentence-aligned using the length-based approach
of Gale & Church. Words and phrases can be aligned
using the clue alignment approach and GIZA++, a
toolbox for training statistical alignment models.
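
To illustrate the length-based idea behind sentence alignment, here is a minimal Python sketch in the spirit of Gale & Church. It is not Uplug's own code. The function names (match_cost, align), the prior probabilities, and the constants C and S2 are assumptions chosen for illustration, loosely following values commonly quoted for this method. Sentence length is measured in characters, and dynamic programming picks the cheapest sequence of 1-1, 1-0, 0-1, 2-1, 1-2, and 2-2 pairings.

```python
import math

# Illustrative settings (assumptions, not taken from Uplug).
PRIORS = {(1, 1): 0.89, (1, 0): 0.00495, (0, 1): 0.00495,
          (2, 1): 0.0445, (1, 2): 0.0445, (2, 2): 0.011}
C = 1.0    # expected target characters per source character
S2 = 6.8   # assumed variance per character


def match_cost(len_src, len_tgt, prior):
    """Negative log probability of pairing two spans, judged by length only."""
    if len_src == 0 and len_tgt == 0:
        return 0.0
    mean = (len_src + len_tgt / C) / 2.0
    delta = (len_tgt - len_src * C) / math.sqrt(mean * S2)
    # Two-sided tail probability of a standard normal variable.
    tail = math.erfc(abs(delta) / math.sqrt(2.0))
    return -math.log(max(tail, 1e-300)) - math.log(prior)


def align(src, tgt):
    """Return a list of (source_indices, target_indices) pairings."""
    n, m = len(src), len(tgt)
    src_len = [len(s) for s in src]
    tgt_len = [len(t) for t in tgt]
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == inf:
                continue
            # Try every pairing type that starts at position (i, j).
            for (di, dj), prior in PRIORS.items():
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                step = match_cost(sum(src_len[i:ni]), sum(tgt_len[j:nj]), prior)
                if cost[i][j] + step < cost[ni][nj]:
                    cost[ni][nj] = cost[i][j] + step
                    back[ni][nj] = (i, j)
    # Walk the back-pointers from the end to recover the pairings.
    beads = []
    i, j = n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        beads.append((list(range(pi, i)), list(range(pj, j))))
        i, j = pi, pj
    beads.reverse()
    return beads
```

Calling align(source_sentences, target_sentences) with two lists of strings returns index groups such as ([0, 1], [0]), meaning that the first two source sentences correspond to the first target sentence. A production aligner would also handle paragraph boundaries and tune the constants per language pair.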
