Corpus

A corpus (plural: corpora) is a digitized collection of text that is used for computer-aided processing of natural language. For example, machine translation (MT) uses parallel corpora, that is, bilingual texts in which each source segment is aligned with its translation, as the basis for statistical analysis and for training MT engines.
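
As a minimal sketch, a parallel corpus can be represented as a list of aligned segment pairs. Counting how often words co-occur across those pairs is one very simple form of the statistical analysis mentioned above. The toy English-German data and the function name cooccurrence_counts are hypothetical, invented for illustration, and not taken from any MT toolkit.

```python
from collections import Counter

# Hypothetical toy parallel corpus (assumption, for illustration only):
# each entry pairs an English segment with its aligned German translation.
parallel_corpus = [
    ("the house", "das haus"),
    ("the book", "das buch"),
    ("a book", "ein buch"),
]


def cooccurrence_counts(corpus):
    """Count how often each source word shares an aligned pair with each target word."""
    counts = Counter()
    for source, target in corpus:
        for source_word in source.split():
            for target_word in target.split():
                counts[(source_word, target_word)] += 1
    return counts


counts = cooccurrence_counts(parallel_corpus)

# Word pairs that co-occur most often are candidate translations.
for (source_word, target_word), n in counts.most_common(3):
    print(source_word, target_word, n)
```

Real MT training relies on far larger corpora and on proper alignment models. The point here is only that the segment-level alignment is what makes such counts meaningful.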