In recent years, advances in computing have made it possible to access and manipulate both written texts and transcriptions of dialogues by computer. This can now be done with a speed, reliability and ease that were unthinkable until recently, and it has given rise to what is known as Corpus Linguistics. Because of their size, these computerized linguistic databases are highly representative of how language behaves. Together with the possibilities offered by the programs designed to study them, this has made computerized corpora an essential instrument for describing language. This Research Group aims to promote the use of these computer resources for research across the widest range of fields traditionally covered by Linguistics.
    The main areas the group intends to cover are:
   i) Computerized corpora: definition, corpus models and theoretical foundations.
   ii) Corpus design and construction: compilation, processing and annotation.
   iii) Corpus annotation: automatic annotation, levels of annotation and exploitation of annotated corpora.
   iv) The computerized corpus and its applications to linguistic description.
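Much of the corpus exploitation listed above relies on concordancing programs. As an illustration only, here is a minimal sketch of a keyword-in-context (KWIC) concordancer in Python. The naive tokenizer, the function names `tokenize` and `kwic`, and the default context width of three words are assumptions made for this example. None of them comes from any of the tools described in the readings.

```python
import re
from typing import List, Tuple


def tokenize(text: str) -> List[str]:
    # Naive tokenizer (assumption): keeps runs of word characters, lowercased,
    # and discards punctuation.
    return re.findall(r"\w+", text.lower())


def kwic(tokens: List[str], keyword: str, width: int = 3) -> List[Tuple[str, str, str]]:
    """Return (left context, node word, right context) for each occurrence of keyword."""
    keyword = keyword.lower()
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            # Up to `width` tokens on each side of the node word.
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, tok, right))
    return lines


if __name__ == "__main__":
    sample = ("The corpus was tagged automatically. A tagged corpus "
              "makes it easy to search the corpus by part of speech.")
    # Right-align the left context so that the node words line up in a column.
    for left, node, right in kwic(tokenize(sample), "corpus"):
        print(f"{left:>30} [{node}] {right}")
```

A real concordancer would also have to handle annotated (tagged) input, sort the concordance lines and cope with very large files. This sketch shows only the basic principle: every occurrence of a node word is aligned with its left and right context.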



 

1. An introduction to corpus linguistics

Aijmer, K. & B. Altenberg 1991. “Introduction”. In K. Aijmer & B. Altenberg (eds.) 1991. English Corpus Linguistics. London: Longman; 1-6.

Ball, C.N. 1996. “Tutorial: concordances and corpora”. Electronic text available at http://www.georgetown.edu/cball/corpora/tutorial.html; introduction.

Church, K.W. & R.L. Mercer 1994. “Introduction to the special issue on computational linguistics using large corpora”. In S. Armstrong (ed.) 1994. Using Large Corpora. Cambridge, Massachusetts: The M.I.T. Press; 1-24.

Johansson, S. 1991. “Times change, and so do corpora”. In K. Aijmer & B. Altenberg (eds.) 1991. English Corpus Linguistics. London: Longman; 305-314.

2. Corpus design and compilation

Cheng-yu, F. 1993. “Building a corpus of the English of computer science”. In J. Aarts, P. de Haan & N. Oostdijk (eds.) 1993. English Language Corpora: Design, Analysis and Exploitation. Papers from the Thirteenth International Conference on English Language Research on Computerized Corpora, Nijmegen 1992. Amsterdam: Rodopi; 73-78.

Clear, J. 1992. “International Corpus of English: Corpus design – problems and suggested solutions”. In G. Leitner (ed.) 1992. New Directions in English Language Corpora. Methodology, Results, Software Developments. Berlin: Mouton de Gruyter; 21-31.

3. Corpus annotation

Eyes, E. & G. Leech 1993. “Progress in UCREL research: Improving corpus annotation practices”. In J. Aarts, P. de Haan & N. Oostdijk (eds.) 1993. English Language Corpora: Design, Analysis and Exploitation. Papers from the Thirteenth International Conference on English Language Research on Computerized Corpora, Nijmegen 1992. Amsterdam: Rodopi; 123-143.

Garside, R. & G. Leech 1982. “Grammatical tagging of the LOB corpus: general survey”. In S. Johansson (ed.) 1982. Computer Corpora in English Language Research. Bergen: Norwegian Computing Centre for the Humanities; 110-117.

Leech, G. 1991. “The state of the art in corpus linguistics”. In K. Aijmer & B. Altenberg (eds.) 1991. English Corpus Linguistics. London: Longman; 8-29.

Sampson, G. 1995. English for the Computer: The SUSANNE Corpus and Analytic Scheme. Oxford: Clarendon Press; 1-18, 79-86, 437-456.

4. A review of some corpora

Aijmer, K. & B. Altenberg 1991. “Appendix: Some computerized English text corpora”. In K. Aijmer & B. Altenberg (eds.) 1991. English Corpus Linguistics. London: Longman; 315-318.

Johansson, S., E. Atwell, R. Garside & G. Leech 1986. The Tagged LOB Corpus. User’s Manual. Bergen: Norwegian Computing Centre for the Humanities; 1-8, 10-21.

Rundell, M. 1996. “The corpus of the future and the future of the corpus”. Talk at Exeter, special conference on ‘New Trends in Reference Science’. Electronic text available at http://www.ruf.rice.edu/~barlow/futcrp.html

Additional references

Creation, description and exploitation of computerized corpora

Aarts, J. & W. Meijs (eds.) 1990. Theory and Practice in Corpus Linguistics. Amsterdam: Rodopi.

Aarts, J., P. de Haan & N. Oostdijk (eds.) 1993. English Language Corpora: Design, Analysis and Exploitation. Papers from the Thirteenth International Conference on English Language Research on Computerized Corpora, Nijmegen 1992. Amsterdam: Rodopi.

Aijmer, K. & B. Altenberg (eds.) 1991. English Corpus Linguistics. London: Longman.

Armstrong, S. (ed.) 1994. Using Large Corpora. Cambridge, Massachusetts: The M.I.T. Press.

Butler, C.S. 1992. Computers and Written Texts. Oxford: B. Blackwell.

Church, K., P. Isabelle & D. Yarowsky (eds.) (forthcoming). Studies in Very Large Corpora. Dordrecht: Kluwer.

Cringeley, R.X. 1996. Accidental Empires. Harmondsworth: Penguin Books.

Fries, U., V. Müller & P. Schneider (eds.) 1997. From Ælfric to The New York Times: Studies in English Corpus Linguistics. Amsterdam: Rodopi.

Johansson, S. (ed.) 1982. Computer Corpora in English Language Research. Bergen: Norwegian Computing Centre for the Humanities.

Johansson, S. & A.-B. Stenström (eds.) 1991. English Computer Corpora: Selected Papers and Bibliography. Boston: Mouton de Gruyter.

Johns, T. & P. King (eds.) 1991. Classroom Concordancing. Special issue of ELR Journal, University of Birmingham.

Kytö, M., O. Ihalainen & M. Rissanen (eds.) 1988. Corpus Linguistics, Hard and Soft. Proceedings of the Eighth International Conference on English Language Research on Computerized Corpora. Amsterdam: Rodopi.

Kytö, M., M. Rissanen & M. Palander-Collin (eds.) 1993. Early English in the Computer Age: Explorations through the Helsinki Corpus. Berlin: Mouton de Gruyter.

Kytö, M., M. Rissanen & S. Wright (eds.) 1994. Corpora across the Centuries: Proceedings of the First International Colloquium on English Diachronic Corpora. St. Catherine’s College, Cambridge, 25-27 March 1993. Amsterdam: Rodopi.

Llisterri, J. 1994. Informe sobre recursos lingüísticos para el español (I). Corpus escritos y orales disponibles y en desarrollo en España. Alcalá de Henares, Madrid: Instituto Cervantes.

Llisterri, J. 1996. Informe sobre recursos lingüísticos para el español (II). Corpus escritos y orales disponibles y en desarrollo en España. Alcalá de Henares, Madrid: Instituto Cervantes.

McEnery, T. & A. Wilson 1996. Corpus Linguistics. Edinburgh: Edinburgh University Press.

Mullings, C. et al. (eds.) 1996. New Technologies for the Humanities. London: Bowker Saur.

Oostdijk, N. 1991. Corpus Linguistics and the Automatic Analysis of English. Amsterdam: Rodopi.

Oostdijk, N. & P. de Haan (eds.) 1994. Corpus-Based Research into Language. Amsterdam: Rodopi.

Sinclair, J. 1991. Corpus, Concordance, Collocation. Oxford: Oxford University Press.

Sinclair, J. 1997. Reading Concordances: an Introduction. London: Addison-Wesley Longman.

Stubbs, M. 1996. Text and Corpus Analysis: Computer Assisted Studies of Language and Culture. Oxford: Blackwell.

Svartvik, J. (ed.) 1992. Directions in Corpus Linguistics: Proceedings of the Nobel Symposium 82, Stockholm, 4-8 August 1991. Berlin: Mouton de Gruyter.

Thomas, J. & M. Short (eds.) 1996. Using Corpora for Language Research: Studies in Honour of Geoffrey Leech. London: Longman.

Tribble, C. & G. Jones 1990. Concordances in the Classroom: A Resource Book for Teachers. London: Longman.

User manuals and descriptions of specific corpora

Burnard, L. (ed.) 1995. User’s Reference Guide for the British National Corpus. Version 1.0. Oxford: Oxford University Computing Services.

Francis, W.N. 1964. Manual of Information to Accompany a Standard Sample of Present-Day Edited American English, for Use with Digital Computers. Providence, Rhode Island: Department of Linguistics, Brown University.

Johansson, S., G. Leech & H. Goodluck 1978. Manual of Information to Accompany the Lancaster-Oslo/Bergen Corpus of British English, for Use with Digital Computers. Department of English, University of Oslo.

Johansson, S., E. Atwell, R. Garside & G. Leech 1986. The Tagged LOB Corpus. User’s Manual. Bergen: Norwegian Computing Centre for the Humanities.

Knowles, G., A. Wichmann & P. Alderson (eds.) 1996. Working with Speech: Perspectives on Research into the Lancaster/IBM Spoken English Corpus. New York: Addison Wesley Longman.

Kytö, M. 1993 (2nd ed.). Manual to the Diachronic Part of the Helsinki Corpus of English Texts: Coding Conventions and Lists of Source Texts. Helsinki: Department of English, University of Helsinki.

Sampson, G. 1995. English for the Computer: The SUSANNE Corpus and Analytic Scheme. Oxford: Clarendon Press.

Sinclair, J. 1987. Looking Up. An Account of the COBUILD Project. London: William Collins.

Computational aspects of computerized corpora

Edwards, J.A. & M.D. Lampert (eds.) 1993. Talking Data: Transcription and Coding in Discourse Research. Hillsdale, New Jersey: L. Erlbaum.

Fries, U., G. Tottie & P. Schneider (eds.) 1994. Creating and Using English Language Corpora. Papers from the Fourteenth International Conference on English Language Research on Computerized Corpora, Zürich 1993. Amsterdam: Rodopi.

Garside, R., G. Leech & G. Sampson (eds.) 1987. The Computational Analysis of English: A Corpus-Based Approach. New York: Longman.

Goldfarb, C.F. 1990. The SGML Handbook. Oxford: Clarendon.

Herwijnen, E. van 1994 (2nd ed.). Practical SGML. Dordrecht: Kluwer.

Ide, N. & J. Véronis (eds.) 1995. The Text Encoding Initiative: Background and Context. Dordrecht: Kluwer.

Leech, G., G. Myers & J. Thomas (eds.) 1995. Spoken English on Computer: Transcription, Mark-up and Application. London: Longman.

Leitner, G. (ed.) 1992. New Directions in English Language Corpora. Methodology, Results, Software Developments. Berlin: Mouton de Gruyter.

Ramsay, A. 1990. The Logical Structure of English: Computing Semantic Content. London: Pitman.

Ritchie, G.D. et al. 1992. Computational Morphology: Practical Mechanisms for the English Lexicon. Cambridge, Massachusetts: The M.I.T. Press.

Souter, C. & E. Atwell (eds.) 1993. Corpus-Based Computational Linguistics. Amsterdam: Rodopi.

Sperberg-McQueen, C.M. & L. Burnard (eds.) 1994. Guidelines for the Encoding and Interchange of Electronic Texts. Chicago and Oxford: ACH, ACL, ALLC.