Corpus Linguistics

Corpus linguistics is the empirical study of language as it occurs naturally, rather than as it is prescribed by theoretical rules and structures. Corpus linguistics uses corpora, or empirical collections of written and/or spoken text, to discern naturally occurring patterns and features of language use. Corpus-based research is particularly useful in the study of language acquisition, as corpora derived from the speech of children or students at various points in their development disclose essential details of the language learning process. Corpus linguistics as practiced today, with the aid of automation and with the availability of large, comprehensive corpora, is a booming field that researchers predict will continue to dominate research on language in the decades to come. Language pedagogy has been and will continue to be profoundly affected by developments in corpus linguistics, as empirical observations of language use are critical to formulating theories of language learning and teaching.

Keywords Collocation; Concordance; Corpus; Data driven learning; Discovery learning; Learner corpus; Lexicography

Overview

Introduction to Corpus Linguistics

Corpus linguistics refers to the empirical study of language as it occurs naturally in various contexts and under specific conditions. It uses corpora, or empirical collections of written and/or spoken text, to discern naturally occurring patterns and features of language use. Corpus-based research is particularly useful in the study of language acquisition, as corpora derived from the speech of children or students at various points in their development disclose essential details of the language learning process.

A corpus is a large collection of text representative of a language or of a subset or genre of a language. Corpora are assembled by teams of researchers who select, categorize, and annotate text. These data are then sorted, parsed, and analyzed with the aid of computer programs, typically concordance programs and statistical packages. Concordances are lists of the occurrences of particular words or phrases in the corpus, each shown in its surrounding context. Through concordance analysis, researchers can determine in which contexts a word, concept, or phrase is most prevalent, compare the frequency and use of synonyms or similar ideas, and, with the help of statistical software, characterize patterns of use.
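Concordance programs commonly display their results in key-word-in-context (KWIC) format, with each occurrence of the search word aligned in a central column. The Python sketch below illustrates the idea under simple assumptions: the corpus is a plain string, tokens are lower-cased runs of letters and apostrophes, and the helper names `tokenize` and `kwic` and the three-word context window are hypothetical choices made for illustration, not the behavior of any particular concordance program.

```python
import re


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into word tokens (a deliberately simple rule)."""
    return re.findall(r"[a-z']+", text.lower())


def kwic(tokens: list[str], keyword: str, window: int = 3) -> list[str]:
    """Return one key-word-in-context line per occurrence of `keyword`."""
    lines = []
    for i, token in enumerate(tokens):
        if token == keyword:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            # Right-align the left context so the keywords line up in one column.
            lines.append(f"{left:>30}  [{token}]  {right}")
    return lines


corpus = ("She set a clear goal for the year. They hope to achieve that goal soon. "
          "Reaching the goal took longer than expected.")
for line in kwic(tokenize(corpus), "goal"):
    print(line)
```

Reading down the aligned column makes it easy to see which words tend to precede and follow the search term, which is the starting point for the contextual comparisons described above.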

Text in a corpus may be divided into any number of registers, or categories. Possible registers include texts written by particular groups, texts of a specific genre, texts derived from speech, and so on (Biber et al., 1998). Through the use of registers, researchers can find and describe language patterns under various conditions and constraints. For example, researchers can explore differences in language use among news reporting, novels, and poems, or compare the vocabularies of native speakers with those of second language learners.
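Because registers in a corpus rarely contain the same number of words, register comparisons are usually made on normalized frequencies (for example, occurrences per 1,000 words) rather than on raw counts. The following Python sketch assumes each register is available as a plain string; the register names, the sample sentences, and the function `normalized_frequency` are invented for illustration.

```python
import re
from collections import Counter


def normalized_frequency(text: str, word: str, per: int = 1000) -> float:
    """Occurrences of `word` per `per` tokens of `text`, so unequal samples can be compared."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    return Counter(tokens)[word] * per / len(tokens)


# Hypothetical mini-registers; a real study would use large, balanced samples.
registers = {
    "news": "The minister said the plan would cut costs. Officials said the cut was needed.",
    "conversation": "I said no. She said yes. Then we all said nothing at all, you know.",
}

for name, text in registers.items():
    print(f"{name:>12}: {normalized_frequency(text, 'said'):.1f} per 1,000 words")
```

With samples this small the figures mean little; the point is only that dividing by register size makes the two counts comparable.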

Corpus based analyses have been used to develop dictionaries, to parse out and describe features of language, to derive new theories of grammar, and to forge teaching material that addresses language use, not only linguistic theory.

Corpus Linguistics in Context

Though the term 'corpus linguistics' was coined only recently, in the second half of the twentieth century, all language study before modern, Chomskyan linguistics was corpus based. As far back as the Middle Ages, monks created large tables and indices of phrases and passages from sacred texts to be used for further analysis (McEnery & Wilson, 2001). Lexicography, the study of the meaning and use of words, also took root during this period (Biber et al., 1998). Lexicography relied on measurements of the frequency of words and of the relations between words in various texts; in other words, it relied on early linguistic corpus research.

During the eighteenth century, empirical language studies were used to understand language acquisition and to create language reference and learning materials. For example, in 1775, a corpus was used to provide samples of language use for dictionary words, and in the nineteenth century, a large compendium of texts was used to create the Oxford English Dictionary (Biber et al., 1998).

From about 1876 through 1926, diary studies were the predominant method of gathering corpus data aimed at understanding language acquisition. Parents participating in these studies kept detailed accounts of their children's utterances. These accounts were later analyzed for patterns of normative behavior, and the resulting diary-study corpora are still used at present as "sources of normative data" (McEnery & Wilson, 2001, p. 3).

In the early twentieth century, the empirical study of language took on a more formal shape with the birth of field linguistics and of the structuralist movement (McEnery & Wilson, 2001). Researchers in these traditions collected records of spoken language and later analyzed this corpus material in a 'bottom-up,' procedural manner. The study designs most commonly employed by field and structural linguists were large sample and longitudinal studies. Large sample studies, prevalent from around 1927 through 1957, drew on many students and language samples to determine and describe average language knowledge and usage. In longitudinal studies, popular since the early 1960s, researchers collect corpus data from the same participants over a period of time and use these data to describe changes in language acquisition and learning behaviors (McEnery & Wilson, 2001).

Chomsky & Rationalist Linguistics

Corpus-based language studies were interrupted in the late 1950s by the work of Noam Chomsky (born 1928), a linguist who ushered in a new wave of rationalist linguistics and challenged the validity of using corpora to represent language adequately (Chomsky, 1957). Chomsky argued that all empirical collections of language samples, that is, all corpora, are skewed and incomplete. They are skewed in that they favor particular uses of language at the expense of others; for example, impolite, false, and obvious statements rarely appear in corpus collections (Biber et al., 1998). Further, corpora are incomplete because the number of possible sentences in a language is infinite; no finite collection of text could ever fully represent all possible configurations of words (McEnery & Wilson, 2001).

Corpus analysis thus lost its popularity during the 1950s and 1960s but resurfaced in the 1970s with the advent of powerful computing capabilities. Chomsky's arguments against corpus linguistics were addressed during this period, and by the early 1980s, many universities and academic partnerships had undertaken large-scale corpus-building projects.

After a dramatic struggle with rationalist theories, corpus research overcame Chomsky's challenges and transformed the field of linguistics. Supporters of corpus linguistics argued that natural language corpora provide key insights into language acquisition processes that cannot simply be theorized. They recognized that corpora did not provide complete accounts of language use, but found corpus linguistics invaluable in research on language acquisition and language pedagogy. Further, corpus research began to provide empirical evidence against purely structuralist, rationalist grammars. These grammars conceived of language use as a 'fill-in-the-slot' process in which appropriate words are fitted into preconceived, theoretically 'correct' sentence structures. Research found, on the contrary, that language users rely on schemata and learned collocations, or commonly used phrases, when engaging in authentic natural speech (Sinclair, 1991).

The resurgence of corpus research in the early 1970s was made possible by the introduction of the computer into the laboratory. Automated processing offered storage and analysis capabilities that had previously been unimaginable. Researchers could now analyze the frequency with which words appeared across registers, the associations between words and common phrases, and the multiple meanings of individual words (Biber et al., 1998).

Modern Corpus Research & Compiled Corpora

Researchers undertaking linguistic corpus research are able to investigate any feature of language, such as grammar, semantics, or pragmatics (McEnery & Wilson, 2001). However, the comprehensive study of any of these fields requires large, representative data sets from which empirical laws can be derived (Myles, 2005). Therefore, attempts to compile large, comprehensive corpora have been underway since the inception of corpus linguistics. Such corpora, however, are most often compiled in the service of other research aims, including:

• Understanding descriptive grammars,

• Discourse analysis,

• Pragmatics, and

• Language acquisition.

Descriptive grammar is a corpus-based approach to studying grammar. Traditional, rationalist grammars are prescriptive; that is, they dictate the ways in which words should be used. Descriptive grammars instead examine corpora derived from naturally occurring speech and detect the grammatical rules actually in use. The corpus approach to grammar was pioneered by Charles C. Fries (1887-1967), who compiled the first large corpus of spoken English by transcribing and annotating large numbers of recorded telephone conversations (Fries, 1952). Research in descriptive grammar has since expanded and been enriched by technologies for storing, sorting, and analyzing text. Corpus text is processed using register analysis, that is, analysis that compares the frequency, organization, and form of words and phrases across many registers (Conrad, 2000). Information about language use across large numbers of registers adds nuance to descriptive grammar studies, which acknowledge that each form of language has its own grammar; for example, the ways in which individuals speak on subway trains are not the same as those by which they present themselves at a business meeting.

In 1959, several decades after C. C. Fries had begun his work, Sir Randolph Quirk (1920-2017), a linguist at University College London, initiated the Survey of English Usage (SEU), a project to assemble a one-million-word corpus of spoken and written British English. The spoken portion of the corpus was later annotated and computerized as the London-Lund Corpus by Sidney Greenbaum and Jan Svartvik, two linguists trained through Quirk's SEU (Svartvik & Quirk, 1980).

The Spoken English Corpus

Another corpus to incorporate spoken text was the Spoken English Corpus, assembled between 1984 and 1987 at Lancaster University. The Spoken English Corpus was also machine-readable and was used to analyze speech features such as dialects and intonation (Knowles et al., 1996). Spoken text, though invaluable as a source of information on actual language usage, is difficult to incorporate systematically into a corpus. Corpus compilers must first establish rules for how to encode overlapping speech, whether and how to code intonation and perceived emotion or tone, and how to represent speech orthographically in the transcription (McEnery & Wilson, 2001). If these issues are addressed fully, however, spoken corpora can be invaluable sources of information on how language is used in conversation and in relation to others.

The availability of spoken corpora allows researchers today to empirically observe the pragmatic aspects of language use and the features of discourse and speech acts (Jiang, 2005). In pragmatics, language is understood from the point of view of speakers and language users. These actors choose language based on constraints and limitations encountered, on situations presented, and on what they have previously learned about the way their language affects their relationships with others. From the point of view of pragmatics, language choices significantly affect further decisions, both of how to speak, and of how to act (Jiang, 2005).

Corpus Linguistics & Language Acquisition Studies

Corpus linguistics has been used as a tool in second language acquisition research since the early twentieth century. By the mid-twentieth century, language acquisition research was linked to the field of second language learning, and vocabulary lists used in language courses were increasingly drawn from native-speaker corpora (McEnery & Wilson, 2001).

The first corpus to specifically address second language acquisition concerns was the Collins Birmingham University International Language Database (COBUILD), founded in 1980 at the University of Birmingham and directed by John Sinclair (1933-2007), a British linguist (Sinclair, 1991). COBUILD is a monitor, or open, corpus: it allows new text to be added continuously and does not specify a maximum number of entries. The disadvantage of such a design is that it makes quantitative analyses less feasible, because the size and composition of the corpus keep changing; for purposes of second language pedagogy, however, it is advantageous for the corpus to be flexible, to be kept up to date, and to increase its scope over time (McEnery & Wilson, 2001). Language teachers use COBUILD in creating classroom materials, primarily through concordance analysis. Concordances give the teacher an accurate, up-to-date picture of how particular words are used in different contexts by fluent or native speakers.

More recently, learner corpora have revolutionized second language pedagogical research. Learner corpora are collections of text written or spoken by those not yet fluent in a language. In 1990, Sylviane Granger of the University of Louvain in Belgium initiated the collection of the first learner corpus, the International Corpus of Learner English (ICLE). ICLE consists of essays written by students of English from various countries and at various levels. By analyzing learner corpora such as ICLE, researchers can observe differences in language use between language learners and native speakers, and between learners at different levels (Shirato & Stapleton, 2007). At present, however, the scope of learner corpora is limited by their almost exclusive focus on written text.

The Visual Corpus

One group addressing this concern by collecting videotaped data from language classrooms is led by Steve Reder, a linguist at Portland State University. Between 2001 and 2006, ESL classes at the Portland State University Laboratory School were recorded, yielding more than 5,000 hours of data from over 1,000 adult English learners. These recordings form the basis of the Multimedia Adult ESL Learner Corpus (MAELC), which can be used to find particular instances of explanations or questions and to retrieve immediately the videos that illustrate the pedagogical methods used. MAELC visually captures the classroom experiences of adult students during the early stages of language learning, when gestures and visuals are used generously (Reder et al., 2003). A visual corpus allows researchers to examine classroom dynamics, to observe the speech patterns of different students, and to detect the non-verbal cues of language pedagogy.

Applications

Corpus Linguistics in Practice

Despite strong evidence that language learning occurs best in natural environments and with exposure to authentic language use, most ESL textbooks and other materials used in today's classrooms are based on rationalist, structural views of language (McEnery & Wilson, 2001). Second language learning is still primarily implemented through vocabulary drills or 'slot-filler' sentence completions, and typical textbooks in current use are not context sensitive. Because language is largely situation dependent (the language used at home differs from that used in the workplace, the courthouse, the local bar, or the academic journal), ignoring context in teaching a language prevents students from acquiring fluency (Barbieri & Eckhardt, 2007).

Rationalist vs. Natural Language

The rationalist methodologies of currently available language teaching materials are inadequate for teaching discourse and communication skills, the appropriateness of language in various contexts, and the proper use of idioms and collocations (Shei & Pain, 2000; Basanta & Martin, 2005-6). Therefore, researchers recommend that teachers familiarize themselves with existing corpora and experiment with designing their own corpus-derived lessons instead of relying on textbooks that address prescriptive, hypothetical language use (Jiang, 2005). Teachers may do this by using corpus-driven methodologies that emphasize the active role of the student in exploring language, the authenticity of natural language use, and the organic nature of language learning.

Grammar, for example, can be taught as it emerges from natural language use rather than as a set of rigid, inflexible rules (Barbieri & Eckhardt, 2007). Rules of grammar need not be made explicit; rather, students should be guided to notice for themselves the features of a language and how these shift. The "open-choice principle" of rationalist linguistics, which postulates that grammar is a frame within which slots are filled with words, has been challenged by corpus evidence, while the "idiom principle," which postulates that language is learned and produced within schemas and in the form of phrases and commonly encountered idioms, has gained acceptance in the field of language acquisition (Barnbrook, 2007).

Concordances & Collocations

Concordances and collocations are two corpus-derived resources that have been empirically shown to support fluency and acquisition by familiarizing students with idiomatic language, and that teachers can use in designing classroom materials. Concordance analysis discloses the different ways in which a word might be used; collocation analysis describes the relationships between words commonly found together. Research has found that native speakers have broad knowledge of the patterns that concordances and collocations reveal, and use these patterns systematically and frequently (Shei & Pain, 2000). Because native speakers retrieve these patterns automatically, they have faster access to language constructs, which in turn leads to clearer and more coherent speech.
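Collocation analysis starts from co-occurrence counts: for a chosen node word, it tallies the words that appear within a few positions to its left and right. The Python sketch below shows only that first step, assuming a plain-string corpus, a simple regular-expression tokenizer, and a span of two words on each side; the function name `collocates` and the sample text are hypothetical. Published collocation studies usually go further and apply an association measure such as mutual information or the t-score, so that very frequent words like "the" do not dominate the results.

```python
import re
from collections import Counter


def collocates(text: str, node: str, span: int = 2) -> Counter:
    """Count words occurring within `span` tokens to the left or right of each `node` token."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for i, token in enumerate(tokens):
        if token != node:
            continue
        # Collect the left and right context windows, clipped at the text boundaries.
        window = tokens[max(0, i - span):i] + tokens[i + 1:i + 1 + span]
        counts.update(window)
    return counts


sample = ("Students achieve a goal by planning. Teams achieve the goal together. "
          "To achieve any goal, start early.")
print(collocates(sample, "achieve").most_common(3))
```

In this toy sample, "goal" falls within two words of every occurrence of "achieve" and so ranks first, mirroring the "achieve"/"goal" pairing given as an example of collocation in the Terms & Concepts list.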

Teachers must design curricular materials with particular attention to concordance and collocation, especially because current textbooks and pedagogies emphasize single-word drills or phrases out of context. Such materials handicap learners by preventing them from learning to use words in context and from learning the ways in which words naturally collocate (Shirato & Stapleton, 2007). Second language learners should be exposed frequently to complex collocations and to a variety of concordances reflecting usage natural to fluent speakers, to help them automate the retrieval of common phrases, language constructs, and schemas (Shei & Pain, 2000).

Data Driven & Discovery Learning

Data-driven learning and discovery learning are two language-teaching methodologies, inspired by corpus studies, that can be adapted to any classroom (Barbieri & Eckhardt, 2007). Data-driven learning combines product and process learning in an inductive, active exploration of language (Johns, 1991). The student discovers authentic language by examining concordances from large corpora, that is, by sifting through examples of language use in various natural contexts. As such, the student is also a researcher: the desire to understand the language drives the product, and access to an organic, flexible representation of language use enables the process (Johns, 1991). Discovery learning is primarily process oriented and emphasizes open-ended exploration of corpus text and 'serendipitous' discoveries (Bernardini, 2000).

Speech Acts

Corpus research is also used in teaching speech acts (Jiang, 2005; Basanta & Martin, 2005-6). Speech acts require both sociolinguistic and sociocultural knowledge, that is, the ability both to form appropriate utterances and to use them in appropriate circumstances. These pragmatic features of language exchange help the learner gain "communicative competence" (Jiang, 2005, p. 37), which gives students the confidence to use language in different ways. Like other aspects of language, spoken forms of discourse and conversation exist as schemas that should be incorporated into language classrooms. Dialogues, for example, can be structured around commonly encountered forms within which students can find patterns and derive understandings of language independently (Basanta & Martin, 2005-6).

Viewpoints

Future Directions of Corpus Linguistics & Language Studies

One criticism leveled at the use of corpus linguistics in language teaching is that the selection of corpus material for a classroom does not reflect a neutral view and is in fact a product of the teacher's or researcher's biases and preconceptions (Hammond & Macken-Horarik, 1999). Ultimately, the corpus or the teacher controls what information is available to students, thus blunting opportunities for critical thinking about the presented text. Proposed solutions include allowing students to select the texts they are interested in studying, and encouraging teachers to examine corpus selections critically for personal and societal biases (Hammond & Macken-Horarik, 1999).

Despite such criticisms, corpus linguistics as practiced today, with the aid of automation and with the availability of large, comprehensive corpora, is a booming field that researchers predict will continue to dominate research on language in the decades to come (McEnery & Wilson, 2001; Biber et al., 1998). Furthermore, language pedagogy has been and will continue to be profoundly affected by developments in corpus linguistics, as empirical observations of language use are critical to formulating theories of language learning and teaching.

Terms & Concepts

Collocation: The ways in which some words naturally occur alongside others, for example, "achieve" and "goal."

Concordance: A list of the occurrences of particular words or phrases in the corpus. Through concordance analysis, researchers can determine in which contexts a word, concept, or phrase is most prevalent, can compare the frequency and use of synonyms or similar ideas, and, with the help of statistical software, can characterize patterns of use.

Corpus: An empirical collection of representative written and/or spoken text that is annotated and sorted into categories (registers).

Data Driven Learning: A pedagogy that combines product and process learning in an inductive, active exploration of language. The student is encouraged to discover authentic language through examination of concordances from large corpora.

Discovery Learning: A process oriented pedagogical method that emphasizes open-ended exploration of corpus text and 'serendipitous' discoveries.

Learner Corpus: A collection of text written or spoken by those not yet fluent in a language, which can be compared with native-speaker corpora in the design of curriculum materials.

Lexicography: The study of the meaning and use of words that took root with the first corpus studies of language in pre-modern times. Empirical lexicography relies on measurements of the frequency of words and of the relation between words in various texts.

Bibliography

Barbieri, F. & Eckhardt, S. (2007). Applying corpus-based findings to form-focused instruction: The case of reported speech. Language Teaching Research, 11, 319-346. Retrieved December 20, 2007 from EBSCO online database, Academic Search Premier: http://search.ebscohost.com/login.aspx?direct=true&db=aph&AN=25796265&loginpage=Login.asp&site=ehost-live

Barnbrook, G. (2007). Sinclair on collocation. International Journal of Corpus Linguistics, 12, 183-199.

Basanta, C., & Martin, M. (2005/6). The application of data-driven learning to small-scale corpus of conversational texts from the BNC-British National Corpus. International Journal of Learning, 12, 183-192. Retrieved December 22, 2007 from EBSCO online database, Education Research Complete: http://search.ebscohost.com/login.aspx?direct=true&db=ehh&AN=25169947&loginpage=Login.asp&site=ehost-live

Belz, J. (2004). Learner corpus analysis and the development of foreign language proficiency. System, 32, 577-591.

Bernardini, S. (2000). Systematising serendipity: Proposals for concordancing large corpora with language learners. In L. Burnard & T. McEnery (Eds.), Rethinking Language Pedagogy from a Corpus Perspective (pp. 225-234). New York: Peter Lang.

Biber, D., Conrad, S., & Reppen, R. (1998). Corpus Linguistics: Investigating Language Structure and Use. Cambridge: Cambridge University Press.

Biber, D., Johansson, S., Leech, G., Conrad, S., & Finegan, E. (1999). Longman grammar of spoken and written English. New York: Longman.

Chomsky, N. (1957). Syntactic Structures. The Hague: Mouton.

Conrad, S. (2000). Will Corpus Linguistics Revolutionize Grammar Teaching in the 21st Century? TESOL Quarterly, 34, 548-560.

Dilin, L. (2013). Using Corpora to help teach difficult-to-distinguish English words. English Teaching, 68, 27-50. Retrieved December 15, 2013, from EBSCO Online Database Education Research Complete. http://search.ebscohost.com/login.aspx?direct=true&db=ehh&AN=90566961&site=ehost-live

Espada-Gustilo, L. (2011). Linguistic features that impact essay scores: A Corpus linguistic analysis of ESL writing in three proficiency levels. 3L: Southeast Asian Journal of English Language Studies, 17, 55-64. Retrieved December 15, 2013, from EBSCO Online Database Education Research Complete. http://search.ebscohost.com/login.aspx?direct=true&db=ehh&AN=65915841&site=ehost-live

Fries, C. C. (1952). The Structure of English. New York: Harcourt Brace.

Grant, L. (2007). In a manner of speaking: Assessing frequent spoken figurative idioms to assist ESL/EFL teachers. System, 35, 169-181.

Greenbaum, S. (1991). ICE: the International Corpus of English. English Today, 28, 3-7.

Greenbaum, S. (Ed.). (1996). Comparing English Worldwide: The International Corpus of English. Oxford: Clarendon Press.

Hammond, J., & Macken-Horarik, M. (1999). Critical Literacy: Challenges and Questions for ESL Classrooms. TESOL Quarterly, 33, 528-544.

Hunston, S. (2002). Corpora in Applied Linguistics. Cambridge: Cambridge University Press.

Jiang, X. (2006). Suggestions: What should ESL students know? System, 34, 36-54.

Johns, T. (1991). Should you be persuaded: Two examples of data-driven learning. Classroom Concordancing (ELR Journal, 4), 1-16.

Kayaoglu, M. (2013). The use of Corpus for close synonyms. Journal of Language & Linguistics Studies, 9, 128-144. Retrieved December 15, 2013, from EBSCO Online Database Education Research Complete. http://search.ebscohost.com/login.aspx?direct=true&db=ehh&AN=90596351&site=ehost-live

Kennedy, C. (1987). Expressing Temporal Frequency in Academic English. TESOL Quarterly, 21, 69-86.

Kennedy, G. (1998). An Introduction to Corpus Linguistics. London: Longman.

Knowles, G., Williams, B., & Taylor, L. (1996). A corpus of formal British English speech: the Lancaster/IBM Spoken English Corpus. London: Longman.

Kramsch, C. (2000). Second language acquisition, applied linguistics, and the teaching of foreign languages. The Modern Language Journal, 84, 311-326.

Marckwardt, A. (1968). Charles C. Fries. Language, 44, 205-210.

McEnery, T. & Wilson, A. (2001). Corpus Linguistics. Edinburgh: Edinburgh University Press.

Myles, F. (2005). Interlanguage corpora and second language acquisition research. Second Language Research, 21, 373-391. Retrieved December 22, 2007 from EBSCO online database, Academic Search Premier: http://search.ebscohost.com/login.aspx?direct=true&db=aph&AN=18407682&loginpage=Login.asp&site=ehost-live

Oliver, R., & Mackey, A. (2003). Interactional context and feedback in child ESL classrooms. The Modern Language Journal, 87, 519-533.

Quirk, R. (1968). The use of English. London: Longman.

Quirk, R., Greenbaum, S., Leech, G. & Svartvik, J. (1972). A grammar of contemporary English. London: Longman.

Quirk, R. & Greenbaum, S. (1973). A concise grammar of contemporary English. New York: Harcourt, Brace, Jovanovich.

Reder, S., Harris, K., & Setzler, K. (2003). The multimedia adult learner corpus. TESOL Quarterly, 37, 546-557.

Shei, C.-C., & Pain, H. (2000). An ESL writer's collocational aid. Computer Assisted Language Learning, 13, 167-182.

Shirato, J. & Stapleton, P. (2007). Comparing English vocabulary in a spoken learner corpus with a native speaker corpus: Pedagogical implications arising from an empirical study in Japan. Language Teaching Research, 11, 393-412.

Sinclair, J. (1991). Corpus, concordance, collocation. Oxford: Oxford University Press.

Svartvik, J., & Quirk, R. (Eds.). (1980). A Corpus of English Conversation. Lund, Sweden: C. W. K. Gleerup.

Szmrecsanyi, B. (2005). Language users as creatures of habit: A corpus-based analysis of persistence in spoken English. Corpus Linguistics and Linguistic Theory, 1, 113-150.

Taylor, B. (1975). Adult language learning strategies and their pedagogical implications. TESOL Quarterly, 9, 391-399.

Willig, A. (1985). A meta-analysis of selected studies on the effectiveness of bilingual education. Review of Educational Research, 55, 269-317.

Suggested Reading

Barbieri, F. & Eckhardt, S. (2007). Applying corpus-based findings to form-focused instruction: The case of reported speech. Language Teaching Research, 11, 319-346. Retrieved December 20, 2007 from EBSCO online database, Academic Search Premier: http://search.ebscohost.com/login.aspx?direct=true&db=aph&AN=25796265&loginpage=Login.asp&site=ehost-live

Conrad, S. (2000). Will Corpus Linguistics Revolutionize Grammar Teaching in the 21st Century? TESOL Quarterly, 34, 548-560.

Grant, L. (2007). In a manner of speaking: Assessing frequent spoken figurative idioms to assist ESL/EFL teachers. System, 35, 169-181.

Hammond, J., & Macken-Horarik, M. (1999). Critical Literacy: Challenges and Questions for ESL Classrooms. TESOL Quarterly, 33, 528-544.

Jiang, X. (2006). Suggestions: What should ESL students know? System, 34, 36-54.

Kennedy, G. (1998). An Introduction to Corpus Linguistics. London: Longman.

Nelson, G. (2006). The core and periphery of world Englishes: a corpus-based exploration. World Englishes, 25, 115-129. Retrieved December 19, 2007 from EBSCO online database, Academic Search Premier: http://search.ebscohost.com/login.aspx?direct=true&db=aph&AN=19714390&loginpage=Login.asp&site=ehost-live

Quirk, R. (1968). The use of English. London: Longman.

Shei, C.-C., & Pain, H. (2000). An ESL Writer's Collocational Aid. Computer Assisted Language Learning, 13, 167-182.

Sinclair, J. (1991). Corpus, concordance, collocation. Oxford: Oxford University Press.

Tsui, A. (2005). ESL Teachers' Questions and Corpus Evidence. International Journal of Corpus Linguistics, 10, 335-356. Retrieved December 22, 2007 from EBSCO online database, http://search.ebscohost.com/login.aspx?direct=true&db=ufh&AN=18859439&loginpage=Login.asp&site=ehost-live

Essay by Ioana Stoica

Ioana Stoica is a doctoral candidate in educational philosophy and policy studies at the University of Maryland, College Park, where she serves as the program coordinator for the Center for Undergraduate Research. Ioana also teaches dance and movement in the DC Public Schools and is an Associate Director for the Duncan Dancers of Washington. Before her doctoral work, she received B.S. degrees in mathematics and in electrical engineering, and she conducted and published research in artificial intelligence and quantum physics.