COLT (The Bergen Corpus of London Teenage Language)
BNC (British National Corpus): 100 million words of modern English. For information on a concordance program for the BNC, see SARA (SGML-Aware Retrieval Application).
Link Grammar: An original theory of syntax related to dependency grammar. This page includes a parser for English. Given a sequence of words, the system determines its grammaticality and extracts its syntactic structure.
Corpus Linguistics Course (intended to supplement the book "Corpus Linguistics", written by Tony McEnery and Andrew Wilson, published by Edinburgh University Press).
Linguistic Data Consortium: The Linguistic Data Consortium distributes many data resources: text databases, lexicons, and tools, as well as speech corpora.
ACL SIGLEX (provides an umbrella for a variety of research interests ranging from lexicography and the use of on-line dictionaries to computational lexical semantics)
ELRA (European Language Resources Association): ELRA provides a centralized organization for the validation, management, and distribution of speech, text, and terminology resources and tools, and promotes their use within the European telematics R&TD community.
ELSNET (The European Network in Language and Speech. This has a particularly useful listing of newspaper corpora.)
Tim Johns - Tim Johns Data-driven Learning Page (particularly useful for those interested in the use of corpora in English language teaching, as it contains a large archive of the results of students' work)