
Granada University
ADA

This software belongs to the ADELEX Research Project

FORMULAIC PROFILE

The vocabulary of a language is made up of items which may be individual words (in the traditional sense of ‘word’) or multiword units (MWUs), i.e. pre-fabricated chunks which represent single choices for the speaker, such as run out of, on the one hand, up to date or once again. To measure the size and sophistication of the vocabulary in a text accurately, we need to count the frequency of both individual words and MWUs in a Formulaic Frequency Profile. Like the Lexical Frequency Profile, the Formulaic Profile can be defined as the percentage of words and MWUs in a text that belong to different frequency levels. It is obtained by comparing the given text with a number of frequency lists (or bands), which yields the percentage of lexical units (individual and multiword) included in each list.
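The definition above can be illustrated with a minimal Python sketch. It is not the ADELEX ANALYSER implementation: the function name, the longest-match-first strategy and the tiny word and MWU lists (including their band numbers) are all assumptions made for illustration only.

```python
from collections import Counter

# Hypothetical toy lists with made-up band numbers (illustration only).
WORD_BAND = {"we": 1, "time": 1, "of": 1, "out": 1, "run": 1, "again": 1, "once": 1}
MWU_BAND = {("once", "again"): 1, ("run", "out", "of"): 2}


def formulaic_profile(tokens, word_band, mwu_band):
    """Return {band: percentage of lexical units} for a list of tokens."""
    max_len = max((len(m) for m in mwu_band), default=1)
    counts = Counter()
    i = 0
    while i < len(tokens):
        # Try the longest MWU first so that a chunk counts as ONE unit.
        for n in range(max_len, 1, -1):
            chunk = tuple(tokens[i:i + n])
            if len(chunk) == n and chunk in mwu_band:
                counts[mwu_band[chunk]] += 1
                i += n
                break
        else:
            # No MWU starts here: count the single word (or mark it off-list).
            counts[word_band.get(tokens[i], "off-list")] += 1
            i += 1
    total = sum(counts.values())
    return {band: 100 * c / total for band, c in counts.items()}


tokens = "once again we run out of time".split()
profile = formulaic_profile(tokens, WORD_BAND, MWU_BAND)
print(profile)  # {1: 75.0, 2: 25.0}
```

In this sketch the seven tokens are reduced to four lexical units (once again, we, run out of, time), so the percentages are computed over units rather than over orthographic words.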

ADELEX ANALYSER establishes the Formulaic Profile of written texts in English on the basis of two frequency lists:

1) a 7,000-word count of individual items, drawn from the British National Corpus, the Bank of English and the Longman Corpus Network databases (López-Mezquita, 2007)

2) a list of 456 of the MWUs included in the Phrasal List compiled by Ron Martínez & Norbert Schmitt (2012). The Phrasal List is a count of the 505 pre-fabricated chunks which occur at least 787 times in the BNC. According to the authors' identification criteria, the list consists of semantically transparent units which language users retrieve from memory as single items. Because compositionality (i.e. transparency in meaning) was one of the criteria used, the list contains all types of MWUs (mainly phrasal verbs, collocations and multiword adjectives, determiners, prepositions and adverbs) except for idioms.

 

Considering the raw frequency of both individual words and MWUs in the BNC, ADELEX ANALYSER creates a formulaic profile based on 7 bands:

  • Band 1: from word no. 1 to word no. 1,000 + 36 MWUs
  • Band 2: from word no. 1,001 to word no. 2,000 + 64 MWUs
  • Band 3: from word no. 2,001 to word no. 3,000 + 65 MWUs
  • Band 4: from word no. 3,001 to word no. 4,000 + 96 MWUs
  • Band 5: from word no. 4,001 to word no. 5,000 + 90 MWUs
  • Band 6: from word no. 5,001 to word no. 6,000 + 58 MWUs
  • Band 7: from word no. 6,001 to word no. 7,000 + 47 MWUs
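The band layout above can be sketched in Python as follows. The function name and parameters are hypothetical; only the band boundaries and the MWU counts per band are taken from the list above.

```python
# MWUs per band, as listed above.
MWUS_PER_BAND = {1: 36, 2: 64, 3: 65, 4: 96, 5: 90, 6: 58, 7: 47}


def band_for_rank(rank, band_size=1000, n_bands=7):
    """Map a word's frequency rank (1 = most frequent) to its band.

    Returns None for ranks beyond the last band (off-list words).
    """
    if rank < 1:
        raise ValueError("rank must be a positive integer")
    band = (rank - 1) // band_size + 1
    return band if band <= n_bands else None


print(band_for_rank(1000))   # 1
print(band_for_rank(1001))   # 2
print(band_for_rank(7000))   # 7
print(band_for_rank(7001))   # None
print(sum(MWUS_PER_BAND.values()))  # 456
```

The last line confirms that the per-band MWU counts add up to the 456 units of the second frequency list.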

 

At the present stage ADELEX ANALYSER cannot distinguish between the different meanings of a string of words. As a result, some of the strings counted as formulaic expressions are genuinely formulaic in some contexts but not in others.

Example: ‘to me’ will be correctly counted as a single unit if it is used as a discourse marker (e.g. ‘To me, this is a mistake’), but it will also be counted as a unit, incorrectly, when it is in fact an indirect object, as in ‘she gave it to me’.

 

Other examples include so far, after all, out of, in part and to be going to. In cases like these, the user is expected to go through the expressions retrieved in order to judge the accuracy of the results obtained.
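To make this manual check more concrete, here is a minimal Python sketch of context-blind matching. It is not part of ADELEX ANALYSER; the function name and the `width` parameter are hypothetical. It shows every occurrence of an expression with some surrounding context, so that a reader can decide which hits are truly formulaic.

```python
import re


def concordance(text, expression, width=20):
    """List each occurrence of `expression` with `width` characters of context."""
    # Plain string matching: the pattern knows nothing about meaning or syntax.
    pattern = re.compile(r"\b" + re.escape(expression) + r"\b", re.IGNORECASE)
    lines = []
    for m in pattern.finditer(text):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        lines.append(f"{left}[{m.group(0)}]{right}")
    return lines


sample = "To me, this is a mistake. She gave it to me."
hits = concordance(sample, "to me")
for line in hits:
    print(line)
```

Both sentences produce a hit, although only the first uses ‘to me’ as a discourse marker; the second would have to be discarded by the user.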

In view of this, and for the sake of accuracy in the lexical profiling, those headwords from the original Phrasal List that were considered more likely to be used as free combinations of words than in their formulaic sense (in texts either written by learners or intended for their reading) were left out of the ADELEX ANALYSER database. The 49 excluded entries include you see, and all that, or something, I mean and can tell. As most of these expressions are common in spoken language but rarely used in writing, their exclusion seems well justified.

INSTRUCTIONS:
Enter the text. You can type it, paste it or upload it. If you upload it, the file must be a plain-text (.txt) file.

   


inped@ugr.es
Copyright©2000-2017