The language learner as language researcher: putting corpus linguistics on the timetable
Winnie Cheng, Martin Warren and Xu Xun-feng
Department of English, The Hong Kong Polytechnic University, Hunghom, Hong Kong SAR
Received 30 October 2000; revised 25 November 2002; accepted 10 February 2003; available online 11 April 2003.
This paper describes an attempt to make room for the subject Corpus Linguistics on the already packed timetable of an English language major undergraduate programme. We describe the rationale for bringing together two existing subjects, Information Technology and Discourse Analysis, for a period of time in order to more systematically and meaningfully introduce students to corpus-based language study in combination with 'data-driven learning' (Johns, T., 1991. Should you be persuaded: two samples of data-driven learning materials. In: Johns, T., King, P., (Eds.), Classroom Concordancing (English Language Research Journal 4). ELR, Birmingham, pp. 1–16.). We outline the contents of the course, describe examples of the students' research studies, and examine the reactions of students and teachers to this learning and teaching development. We have found that it is both possible and worthwhile to add corpus linguistics to the curriculum. Both students and teachers felt that it enhanced the value of the two original subjects and fruitfully cast all of the participants in new roles.
Author Keywords: Information Technology; Discourse Analysis; Corpus linguistics; Data-driven learning; Corpus-driven research
The terms 'corpora' and 'corpus linguistics', which have been widely adopted by researchers in linguistics and applied linguistics, have yet to acquire commonly accepted meanings among English language teachers and students. Corpus-based language study dates back to the 1960s (e.g. Sinclair et al., 1969) and, due to its dependence on computer technology, has come into its own since the 1980s with the growth of multi-million word corpora.
Although a corpus can mean a body or collection of texts in any form, when linguists speak of a corpus today, they usually mean a collection of computer-readable texts compiled using a clearly delineated set of design criteria. Corpus linguistics involves the examination of linguistic phenomena through large collections of machine-readable texts. In other words, corpus linguistics is the study of language through corpus-based or corpus-driven research. A distinction has come to be drawn in corpus linguistics between 'corpus-based' and 'corpus-driven' research. The former is described as expounding or exemplifying existing theories not always based on corpus evidence (Tognini Bonelli, 2002, p. 74). In corpus-driven research, theoretical statements are a product of the evidence from the corpus (Tognini Bonelli, 2002, p. 75). As Altenberg and Granger (2002, p. 15) point out, the difference essentially rests "in the importance attached to the initial assumptions and the role the data play in the analysis". In this paper, we will use the term corpus-driven to describe the studies undertaken.
Probably the first and best known exponent of corpus linguistics, and the author and editor of numerous related publications, is Sinclair (1987, 1991, 1997, 2001a, 2001b). Sinclair initiated the COBUILD project (a joint project between the publisher HarperCollins and Birmingham University), with its many related publications, that led to the compilation of the largest corpus of any language, the 400 million word Bank of English housed at Birmingham University, UK. Corpus linguistics is now an established field with a growing body of researchers and exponents and has been the inspiration for new language learning methodologies. Another example of a more recent large scale corpus-driven language study that is impacting applied linguistics and English language teaching is the CANCODE project. This is a project involving Nottingham University and Cambridge University Press in the compilation of a five million word corpus of spoken English (see McCarthy, 1998, pp. 8–23 for more details). More recently, in the United States of America, corpora, though of a smaller scale, have been compiled and used for linguistic and pragmatic analyses. An example is the 1.7 million word Michigan Corpus of Academic Spoken English (Simpson et al., 2002).
This growth in corpus linguistics has resulted in the development of new language learning and teaching methodologies (e.g. Burnard and McEnery, 2000; Ghadessy et al., 2001; Hunston, 2002 and Granger, 1998). In the past 10 years or so, there has been a growing number of research publications in support of data-driven learning (DDL), showing how data from corpora can be used by students to further their language learning (Tribble, 1997; Kettemann, 1995; Johns, 1991; Tribble and Jones, 1990; Wichmann et al., 1997 and Thurstun and Candlin, 1998). The originator of DDL is Tim Johns, formerly based at Birmingham University, UK, who believes that the language learner is at the same time a language researcher and that, in order to learn the target language more effectively, the learner needs to have authentic linguistic data available (Johns, 1991). Johns coined the term DDL to describe this approach to language learning. Using corpora as the source of spoken and written texts, DDL brings to the class abundant examples of authentic language samples that can be studied and exploited in many ways. Such an approach usurps the traditional roles of the teacher/researcher and student because, as Johns (1991, p. 2) claims, "research is too serious to be left to the researchers". The teacher becomes a facilitator of language study instead of being seen as the language expert responsible for both teaching and research. The students acquire a new role as language investigator in addition to that of language learner.
The use of corpora in the learning and teaching of English as a second language has been reported elsewhere (e.g. Johns, 1989; Johns, 1997; Thurstun and Candlin, 1998; Flowerdew, 1998; Hunston, 2002 and Ghadessy et al., 2001). This paper, however, describes a project that more formally introduced English language corpora, corpus linguistics and data-driven learning through combining the teaching of two subjects, Discourse Analysis and Information Technology, to undergraduate English major students in Hong Kong.
Information Technology and Discourse Analysis are compulsory subjects for all second year students (N=29) of the BA (Hons) in Contemporary English Language (BACEL) offered by the English Department at the Hong Kong Polytechnic University. Information Technology concentrates on the students' mastery and understanding of the DOS and Windows operating systems, and on using various other software packages. It also aims to give students a general introduction to some of the wider applications of IT in English language study and research. During this two-semester subject, the students spent 3 h a week in a computer laboratory, comprising 2 h of lecture and seminar discussion and 1 h of hands-on practice. By the time the students moved into the second semester of the subject, they were computer literate in terms of word processing and Web browsing. In addition, they had acquired the basic concepts and techniques of corpus linguistics, including corpus building and concordancing. With some competence, the students could use WordSmith Tools (Scott, 1999) and other concordancers to do concordancing on a variety of texts, both established language corpora of various sizes and some collections of texts they had compiled based on their own interests.
While studying the second of the two IT-based subjects, the students are also taking the subject Discourse Analysis. This subject runs for one 14-week semester and has three class contact hours per week made up of lectures and seminars. In this subject the students are introduced to the fundamentals of discourse analysis through the examination of spoken and written discourses and the study of extracts taken from various English language corpora, especially corpora of Hong Kong English (see, e.g. Cheng and Warren, 1999; Cheng and Warren, 2000 and Fan et al., 1999). These texts, drawn from a wide variety of genres, serve as examples and illustrations of how linguistic messages are constructed and interpreted and of the interplay between cohesion and coherence.
For some time our ambition had been to end the artificial segregation of the more technical IT-based aspects of corpus-driven studies from their discourse-related investigation and analysis of spoken and written texts by bringing together the two subjects for part of the semester. In doing so, we hoped to more formally and systematically introduce students to corpus linguistics as a means to study language use. This paper describes the ways in which this integration took place, the forms it took, some of the findings made by the students, the differences these studies made to students' learning and the reactions of both the students and the teachers to the differences, if any, that a corpus-driven approach made to their work.
We combined the delivery of our subjects for 2 weeks, enabling an intensive introduction to corpora, corpus linguistics and data-driven learning. The 2-week introduction aimed to examine both the IT-related aspects and their applications in the field of discourse analysis. The introduction included the following:
To help the BACEL students realize the practical benefits of corpus linguistics and data-driven learning, a number of well-established English language corpora were first introduced to them during the lectures and seminars. They included the Bank of English, Brown Corpus, LOB Corpus and the 100-million-word British National Corpus (BNC). The students also learned to explore the British English component of the International Corpus of English (ICE-GB), one million words made up of equal quantities of spoken and written texts that have all been fully tagged and parsed. Throughout this introductory phase, examples were given to the students of how to examine the corpora to better understand how language works and how language forms are used in different contexts.
A key tool in corpus linguistics is the concordancer, which gives the researcher access to many important language patterns in texts. The students were shown the main ways in which the computer programmes known as concordancers allow the user to search through texts stored on computer and display all of the occurrences of a certain word or phrase in context. A number of concordancers and on-line concordancing facilities were formally introduced in class and made available in the computer laboratory where the students later carried out their own independent corpus-driven studies. For the ICE-GB corpus, the students also learned how to use ICE-CUP, the computer software supplied with the corpus, which includes facilities for concordancing. Having developed some familiarity with the software and the tag systems of the corpus, the students were able to use the concordancing facilities of ICE-CUP to explore the texts with the help of the part-of-speech tags and the grammatical tags.
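As a rough illustration of what a concordancer does, the following Python sketch displays every whole-word occurrence of a search item with a fixed window of context on either side (a keyword-in-context display). It is only a minimal sketch and does not reflect how WordSmith Tools or ICE-CUP work internally; the function name `concordance`, the `width` parameter and the sample sentences are assumptions introduced for illustration and are not taken from any corpus.

```python
import re


def concordance(text, keyword, width=30):
    """Return one keyword-in-context line per whole-word match of `keyword`."""
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    lines = []
    for match in pattern.finditer(text):
        # Up to `width` characters of context on each side of the match.
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        # Pad both sides so the search item lines up in a single column.
        lines.append(f"{left:>{width}} [{match.group(0)}] {right:<{width}}")
    return lines


# Invented sample text, used only to demonstrate the display.
sample = (
    "The student who asked the question left early. "
    "The teacher to whom the question was put did not answer. "
    "Nobody knew who had called."
)

for line in concordance(sample, "who"):
    print(line)
```

Because the search uses word boundaries, a query for who does not also return whom, which keeps the two items separate for comparison.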
Such knowledge and familiarity with various corpora and concordancers helped the students gain a good understanding of what corpus linguistics is and what it can and cannot do. However, the acquisition of knowledge and skills in corpus linguistics is not an end in itself. The students need to learn how they can make good use of them in their linguistic studies. As Leech (1992) points out, corpus linguistics is not "a domain of study", but rather "a methodological basis for pursuing linguistic research" (Leech, 1992, p. 105). Only when methods of corpus linguistics are applied in areas of linguistic research can their strengths be demonstrated to the full. Here lies the major attraction of a combined approach to Information Technology and Discourse Analysis. In Information Technology students had acquired computer skills, and in Discourse Analysis they were learning how to analyze texts and compare text types. Through learning this new methodology for language study, the students were able to bring together and build on the knowledge they had gained from studying these two formerly discrete subjects.
However, we did not limit our introduction to corpus-driven language study solely to the field of discourse analysis. Rather, we introduced to the students the three main forms of study conducted by corpus linguists, namely lexical, syntactic and discoursal (Wilson, 1997), as well as collocation (Sinclair, 1991 and Scott, 2001) and colligation (Hoey, 2000). Wilson (1997, p. 120) gives examples of lexical, syntactic and discoursal forms. Lexical studies include word use, idioms and irregular plurals. Sentence level features such as the use of prepositions, verb forms, pronouns and agreement are examples of syntactic studies. Lastly, discourse studies examine how texts are structured so that they are cohesive and coherent and include conjunctions, discourse markers and the 'colligational properties' (Hoey, 2000) of texts. Items that tend to be spoken or written as lexical combinations could be studied through an examination of the collocational properties of lexical items. Such studies yield chunks of language that result from what Sinclair (1991) terms "the idiom principle", whereby much of what is spoken and written comprises ready-made or prefabricated expressions, rather than from the "open choice principle".
Within a short space of time the students themselves were suggesting items to look for in the corpus. The students and teachers together discussed the ways in which the concordance lines might be analyzed. An example of this was a student who was interested in comparing the use and the frequency of occurrence of who and whom in ICE-GB.
The use of a concordancer and a corpus enabled us to look at the frequencies with which these items occurred across the spoken and written sections of ICE-GB and the total number of occurrences. Students were able to observe that who occurs far more frequently than whom (2195 instances as opposed to 87 instances) and that while whom was quite evenly spread across both the spoken and written texts, who was twice as likely to occur in the spoken half of ICE-GB as in the written half. Concordance lines also provide useful evidence of the collocational properties of the item under examination. The students noted that who collocates to the left with a noun group, while whom tends to be preceded by a preposition such as to, of or for that separates it from the noun group to which it refers. Similarly, the students also observed that the two items seem to possess different collocates to the right. For who, the pattern tends to be that it immediately precedes a main or auxiliary verb, while whom is typically immediately followed by a subject pronoun. Apart from collocations, the brief analysis conducted by the students in class also involved discussion of differences in meaning. The students found this an interesting way of looking at English language use, and the almost instantaneous display of requested items made it a responsive and user-friendly way of analyzing large amounts of authentic language.
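The left and right collocate patterns described here can be tallied mechanically once a text has been split into words. The Python sketch below counts the word immediately before and immediately after each occurrence of a search item. It is a simplified illustration under stated assumptions: the function name `neighbours` is hypothetical, the tokenization is a crude regular expression that ignores sentence boundaries, and the sample sentences are invented rather than drawn from ICE-GB.

```python
import re
from collections import Counter


def neighbours(text, node):
    """Count the words immediately left and right of each occurrence of `node`."""
    # Crude tokenization: lowercase runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    left, right = Counter(), Counter()
    for i, token in enumerate(tokens):
        if token == node:
            if i > 0:
                left[tokens[i - 1]] += 1
            if i + 1 < len(tokens):
                right[tokens[i + 1]] += 1
    return left, right


# Invented sample text, used only to demonstrate the counting.
sample = (
    "The student who asked the question left early. "
    "The teacher to whom the question was put did not answer. "
    "The people for whom we work know who is in charge."
)

for item in ("who", "whom"):
    left, right = neighbours(sample, item)
    print(item, "left:", dict(left), "right:", dict(right))
```

In this toy sample whom is preceded by the prepositions to and for, while who is followed by a verb, which mirrors the kind of pattern the students observed; a real study would of course need the full corpus and attention to sentence boundaries.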
After an intensive (12 h over a 2-week period) introduction to corpora, corpus linguistics and its applications, especially in Discourse Analysis, the students were asked to carry out an individual mini-research project (see Appendix for project details). It was this component that introduced students to data-driven language learning by placing them in the role of language researchers finding out for themselves about the English language rather than language learners looking to 'experts' to answer their questions. Students were asked to base their projects on an aspect of English language use which they themselves found of interest and/or problematical. The students were required to look across at least two corpora to help confirm and/or compare their findings, and they had to compare their findings with what the existing literature, in the forms of dictionaries, grammars and so on, had to say about their particular area of study. The projects also provided a good means for us to determine the success or otherwise of combining these two subjects in order to place corpus linguistics firmly on the timetable as students were required to reflect on their experiences as language researchers and whether they felt they had benefited as language learners and English language majors from this form of data-driven learning.
The mini-research projects completed by the students cover a wide variety of topics in language structure and language use (see Table 1). Let us now look at the kinds of findings made by students in their mini-research projects. We need to bear in mind that these represent first attempts at corpus-driven research and, while the findings themselves might be limited, they demonstrate the different ways in which students approached the corpora and the paths of inquiry they undertook when interrogating the data. Some went no further than discussing frequency counts across gender, groups of speakers or genres and came up with interesting findings. Others took different paths through the data and ended up examining collocational patterns and exploring the idiom principle (Sinclair, 1991) at work. All of them, however, learned more about the English language and how it might be studied as a result of conducting their small-scale research projects. In the examples of students' findings given later, all of the frequencies are relative to the particular numbers being compared.
Table 1. Students' research areas
One student's project explored the difference between after and afterwards and used ICE-GB for her study, making use of its part-of-speech tagging. It was found that after was employed 20 times more often and was fairly evenly spread across the written and spoken corpora, whereas afterwards appeared just 55 times and was twice as likely to be used in spoken texts as in written texts. It was observed that while afterwards always functions as an adverb, after can be used as a preposition, noun, adverb, conjunction or part of a phrase. A collocation pattern was also noted in the case of after (i.e. after all) when it was used as a connective.
One project looked at the frequency of use of the lemma great by men and women in the spoken and written sections of ICE-GB. It was found that men use great three times more often (i.e. 375 instances) than women in ICE-GB and that great is four times more likely to occur in spoken than written data. Great was found to usually be preceded by an indefinite article and followed by deal (i.e. a great deal) in 20% of the instances studied. This particular phrasal pattern was used 15 times by women and 35 times by men. The next most frequent pattern was the/a great big x (17% of all occurrences); a less frequent but observable pattern in the spoken corpus was the/a great thing about. The student looked at a number of dictionaries and books on English usage and found no reference to the kinds of findings she reported.
Shall was examined in one study, which found it to be 33% more common in the spoken corpus of ICE-GB than in the written corpus. The patterns shall I and shall we made up 37% of the total instances of collocates with pronouns in the spoken corpus but accounted for only 1% of the occurrences in the written corpus, and these occurred in written dialogues.
A study of however and but in written and spoken data found that but occurred eight times more frequently than however in both the written and spoken corpora of ICE-GB. However was 3.5 times more likely to occur in the written texts of ICE-GB (i.e. 425 times as opposed to 119 times), while for but the opposite pattern was observed with 63% of the occurrences in the spoken data. The student then went on to examine the position of the two lemmas in sentences/utterances and found that in spoken discourses both however and but were used almost equally at the start of a sentence/utterance, yet in written texts however was used 58% of the time at the start of a sentence while only 23% of the occurrences of but were sentence-initial.
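Comparisons of this kind reduce to simple ratios and shares of raw counts. As a worked check, the short Python sketch below uses only the however figures quoted above (425 written and 119 spoken instances); the helper names `ratio` and `share` are hypothetical. Comparing raw counts directly assumes that the two sections are of comparable size, as the description of ICE-GB given earlier in this paper implies.

```python
def ratio(a, b):
    """How many times larger count `a` is than count `b`."""
    return a / b


def share(part, total):
    """Proportion of `total` accounted for by `part`."""
    return part / total


# Counts of however reported for the written and spoken sections.
however_written, however_spoken = 425, 119

print(round(ratio(however_written, however_spoken), 2))
print(round(share(however_written, however_written + however_spoken), 3))
```

The ratio comes to roughly 3.6, in line with the "3.5 times" reported by the student, and about 78% of all instances of however fall in the written texts.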
The usage of little in two very different text types (spoken discourse and academic writing) was the subject of another study. A number of collocations such as a little bit, a little bit like, a tiny little x were uncovered in the spoken data which were not found in the academic writing. In fact, 25% of the instances of little in the spoken corpus collocated with bit. The student then referred to the COBUILD Student's Dictionary (1994, p. 175) and used the information contained there (i.e. little is used as an adjective, quantifier or determiner) to provide a way of categorizing the instances of little. In the academic writing corpus, out of the 39 instances it was observed that little was used 38 times as a quantifier to emphasize a small amount of something. In the spoken corpus, 72% of the 238 occurrences of little were used as an adjective to modify the noun or attribute properties of 'small size'.
The last study examined anyway. It was found that anyway was more common in the spoken corpus of ICE-GB than in the written corpus (249 versus 48 instances). It was also observed that in spoken data anyway often occurred utterance-initially, and this then led the student to explore its use as a discourse marker. Her findings agreed with other researchers' findings that anyway is commonly used to mark off the boundary of a digression and return to a previous topic and can also introduce a new topic (Bublitz, 1988, p. 118). This student also found that anyway often collocates with but when used as a discourse marker and that this combination, but anyway, seems to act to reinforce the fact that a shift in topic is taking place.
An important part of the mini-research project engaged the students in reflecting on their learning experiences in relation to what they found most beneficial, what they found most difficult, and what difficulties they encountered in the process of conducting the mini-project. The assessment criteria clearly state that the students have to reflect on these aspects from both the perspectives of Discourse Analysis and Information Technology. This part of the assignment constituted 20% of the assessment. An analysis of the students' reflective responses shows that 48% of the students found the project very interesting and 33% found it interesting; only 19% found the project not very interesting. 59% of students found doing the project very useful and 28% useful; only 13% considered it not very useful. Finally, 76% of the students said that they had encountered difficulties in doing the mini-research project, and so presumably 24% of them encountered no difficulty.
On the whole, students had positive views about the project, particularly about both the process and the outcome of corpus-driven language research. They agreed that corpora helped to transform the study of language from an environment which has been 'evidence scarce' to one which is 'evidence abundant' (Sinclair, 1997, p. 31). Some of their positive feedback is summarized later:
A few students compared the functions and uses of corpora and dictionaries. They pointed out that dictionaries may not provide as much information as a well-established corpus, and noted that corpora help the student/researcher to understand, check and complement the meanings of words provided by dictionaries. One student reflected, "corpus linguistics has revealed the inadequacies of traditional dictionaries and grammar books". Several students specifically mentioned ICE-GB, saying that the database of ICE-GB reveals the additional functions of words which are not described in a traditional dictionary, for example, after can act as both a noun and a connective.
Other students, however, focused on some of the perceived shortcomings of corpus-driven research. They have observed that existing corpora may not represent all language in use, as there are new uses every day, and some words (e.g. afterwards) could not be found in ICE-GB because the word is mainly used in American English. Many words could not be found in the corpora examined because the corpora were relatively small and the words have a relatively low frequency of occurrence. These points, while valid, do not argue against this approach to language study but rather underline the need for larger and more comprehensive corpora in order to better support corpus linguistics and data-driven learning.
Difficulties that the students encountered during the process of the corpus-driven language study can be grouped into three main categories, namely difficulties that appeared at the various stages of the mini-research project, difficulties that were specific to corpus-driven language research, and difficulties relating to the nature of the corpus data.
First of all, students were asked to break down the mini-research process into identifying research areas, reviewing literature, and formulating research hypotheses/questions for the language study. Some students noted that at the beginning of the project, it was difficult to decide on the words or phrases on which to perform a comparative study (with one of them stating that a preliminary search in the corpora could be useful as it might show that there was not much to explore), to find useful literature, and to set hypotheses. Concerning difficulties in choosing a research area at the start, some students were undecided as to whether to examine a lexical, syntactic, or discoursal feature; others were hesitant about whether to explore word classes, word meanings, or collocations of words. Sometimes, the difficulties arose from the literature, or some "irrelevant references", as a student put it, that students had consulted and on which they based the analyses. A different kind of difficulty was related to interpreting and discussing the data with a view to answering research questions, due to a lack of certain contextual information provided by the students' corpus-generated databases. One student, for instance, when analyzing each other and one another, had particular problems knowing the number of parties involved in the context since the corpus data did not provide this information. One student remarked that the corpus did not show "the context of situations and related contexts" which had made data analysis extremely problematic, and the kinds of contextual information that students found unavailable included relative status of participants, co-text, and degree of formality of the context of interaction.
Also relating to context, a student who examined shall experienced difficulties in data interpretation and discussion because "shall has different meanings in different contexts", although this particular student did not say whether the different contexts were made known. Other students thought that the most difficult task involved the actual analysis, interpretation and discussion of findings. One student specifically stated that it was quite difficult to analyze syntactic features using WordSmith Tools and ICE-CUP. Another student, who studied gender differences in the use of some words in ICE-GB, found it difficult to explain the result that there was no significant contrast between the two genders. This student would, however, need to understand that different reasons may have contributed to the result, including, among others, his/her choice of words for investigation, the hypotheses formulated, and the nature of the data compiled.
The second kind of difficulty was specific to corpus-driven language study and included a lack of knowledge and skills in choosing and using corpora and in using computer software such as concordancers, and insufficient data in ICE-GB to confirm or refute some hypotheses. Other problems were the lack of prosodic features in the corpora studied and the limited query functions of the corpus.
The third type of difficulty concerned working on corpus data. Four main problems have been identified from the students' reflective writing. Obtaining and analyzing data from the corpora could be too laborious to carry out. The actual counting of frequencies of occurrence, classifying data into word classes, and working out percentages were tedious tasks to perform. Corpus-generated word lists were overwhelming, and so students found it hard to make useful sense of the collocations of references in texts from long frequency word lists. Some students were uncertain about how to classify examples obtained from the databases into useful categories. Despite encountering these difficulties in dealing with corpus data, the students were able to overcome them and completed their language research successfully.
As Tim Johns (1991, p. 2) points out, "the language-learner is also, essentially, a research worker whose learning needs to be driven by access to linguistic data". By using corpora of English texts, the BACEL students gained direct access to abundant examples of authentic language samples that could be studied and exploited in many ways. Through such mini-projects, the students carried out their research without knowing in advance what patterns they would discover. It helped them to get a "feel" for the linguistic features of their selected items by personally experiencing a focused study of that specific lexical, grammatical or discoursal unit as it appeared in the corpus data. Such an approach allowed the students to understand discourse analysis through the route of "research-then-theory" (Johns, 1991, p. 30): they first started with a question, and came to conclusions only after analyzing the concordance output of the corpus data. These mini-projects have effectively "cut out the middleman as far as possible" and have given "the learner direct access to the data, the underlying assumption being that effective language learning is a form of linguistic research, and that the concordance printout offers a unique way of stimulating inductive learning strategies – in particular the strategies of perceiving similarities and differences of hypothesis formation and testing" (Johns, 1991, p. 30).
Our experience has shown that it is possible and worthwhile, even in a fairly short time period, to introduce students to corpus linguistics through combining elements of existing, and we would imagine, standard subjects on an undergraduate English language programme. Feedback from both students and teachers suggests that this experience was both meaningful and valuable. The difficulties encountered by the students were largely a result of inexperience. Better training and more information would eliminate most of these in the future. We also found that such an addition to the timetable not only introduced students to an influential approach to language study but also positively impacted the Information Technology and Discourse Analysis subjects by creating a bridge between them, serving to underline their relevance and importance to students engaged in language studies.
The work described in this paper was substantially supported by a learning and teaching development grant from the University Grants Committee of the Hong Kong Special Administrative Region (Project No. LTG98-99/DLTC/ENGL03e). Our thanks go to Ms Alice Lo for helping to collect and collate the data.
Altenberg, B. and Granger, S., 2002. Recent trends in cross-linguistic lexical studies. In: Altenberg, B. and Granger, S., Editors, 2002. Lexis in Contrast: Corpus-based Approaches, John Benjamins Publishing Company, Amsterdam; Philadelphia, pp. 3–48.
COBUILD Student's Dictionary, HarperCollins Publishers, London.
Bublitz, W., 1988. Supportive Fellow-speakers and Cooperative Conversations: Discourse Topics and Topical Actions, Participant Roles and "Recipient Action" in a Particular Type of Everyday Conversation, John Benjamins Publishing Company, Amsterdam; Philadelphia.
Burnard, L. and McEnery, T., Editors, 2000. Rethinking Language Pedagogy from a Corpus Perspective, Peter Lang, Frankfurt.
Cheng, W. and Warren, M., 1999. Facilitating a description of intercultural conversations: the Hong Kong Corpus of Conversational English. ICAME Journal 23, pp. 5–20.
Cheng, W. and Warren, M., 2000. The Hong Kong Corpus of Spoken English: language learning through language description. In: Burnard, L. and McEnery, T., Editors, 2000. Rethinking Language Pedagogy from a Corpus Perspective, Peter Lang, Frankfurt, pp. 107–116.
Fan, M., Greaves, C. and Warren, M., 1999. Identifying characteristic patterns in students' writing using a corpus of learner data. In: Berry, R., Asker, B., Hyland, K. and Lam, M., Editors, 1999. Language Analysis, Description and Pedagogy, Language Centre, HKUST, Hong Kong, pp. 176–188.
Flowerdew, L., 1998. CALL materials derived from integrating `expert' and `interlanguage' corpora findings on causality: discoveries from teachers and students. English for Specific Purposes 17, pp. 329–346.
Ghadessy, M., Henry, A. and Roseberry, R.L., Editors, 2001. Small Corpus Studies and ELT: Theory and Practice, John Benjamins Publishing Company, Amsterdam; Philadelphia.
Granger, S., Editor, 1998. Learner English on Computer, Longman, London; New York.
Hoey, M., 2000. The hidden lexical clues of textual organisation: a preliminary investigation into an unusual text from a corpus perspective. In: Burnard, L. and McEnery, T., Editors, 2000. Rethinking Language Pedagogy from a Corpus Perspective, Peter Lang, Frankfurt, pp. 31–41.
Hunston, S., 2002. Corpora in Applied Linguistics, Cambridge University Press, Cambridge.
Johns, T., 1989. Whence and whither classroom concordancing? In: Bongaerts, T., de Haan, P., Lobbe, S. and Wekker, H., Editors, 1989. Computer Applications in Language Learning, Foris, Dordrecht, pp. 9–33.
Johns, T., 1997. Contexts: the background, development and trialling of a concordance-based CALL program. In: Wichmann, A., Fligelstone, S., McEnery, T. and Knowles, D., Editors, 1997. Teaching and Language Corpora, Longman, London and New York, pp. 100–115.
Johns, T., 1991. Should you be persuaded: two samples of data-driven learning materials. In: Johns, T. and King, P., Editors, 1991. Classroom Concordancing (English Language Research Journal 4), ELR, Birmingham, pp. 1–16.
Kettemann, B., 1995. On the use of concordancing in ELT. TELL & CALL 4, pp. 4–15.
Leech, G., 1992. Corpora and theories of linguistic performance. In: Svartvik, J., (Ed.), Directions in Corpus Linguistics: Proceedings of the Nobel Symposium 82, Stockholm, 4–8 August 1991, pp. 105–122.
McCarthy, M., 1998. Spoken Language and Applied Linguistics, Cambridge University Press, Cambridge.
Scott, M., 1999. WordSmith Tools, Oxford University Press, Oxford.
Scott, M., 2001. Comparing corpora and identifying key words, collocations, and frequency distributions through the WordSmith Tools suite of computer programs. In: Ghadessy, M., Henry, A. and Roseberry, R., Editors, 2001. Small Corpus Studies and ELT: Theory and Practice, John Benjamins Publishing Company, Amsterdam; Philadelphia, pp. 47–67.
Simpson, R., Briggs, S., Ovens, J. and Swales, J., 2002. The Michigan Corpus of Academic Spoken English, University of Michigan Press, Ann Arbor, MI.
Sinclair, J., Editor, 1987. Looking Up: an Account of the COBUILD Project in Lexical Computing, Collins ELT, London and Glasgow.
Sinclair, J., 1991. Corpus, Concordance, Collocation, Oxford University Press, Oxford.
Sinclair, J., 1997. Corpus evidence in language description. In: Wichmann, A., Fligelstone, S., McEnery, T. and Knowles, D., Editors, 1997. Teaching and Language Corpora, Longman, London and New York, pp. 27–39.
Sinclair, J., 2001. The Preface. In: Ghadessy, M., Henry, A. and Roseberry, R., Editors, 2001. Small Corpus Studies and ELT: Theory and Practice, John Benjamins Publishing Company, Amsterdam; Philadelphia, pp. vii–xv.
Sinclair, J., 2001. Review. International Journal of Corpus Linguistics 6/2, pp. 339–359.
Sinclair, J., Jones, S. and Daley, R., 1969. English Lexical Studies, University of Birmingham for the Office of Scientific and Technical Information.
Thurstun, J. and Candlin, C., 1998. Concordancing and the teaching of the vocabulary of academic English. English for Specific Purposes 17, pp. 267–280.
Tognini Bonelli, E., 2002. Functionally complete units of meaning across English and Italian: Towards a corpus-driven approach. In: Altenberg, B. and Granger, S., Editors, 2002. Lexis in Contrast: Corpus-based Approaches, John Benjamins Publishing Company, Amsterdam; Philadelphia, pp. 73–96.
Tribble, C., 1997. Improvising corpora for ELT. Quick and dirty ways of developing corpora for language teaching. In: Lewandowska-Tomaszczyk, B. and Melia, P., Editors, 1997. PALC '97: Practical Applications in Language Corpora, Łódź University Press, Łódź, pp. 106–118.
Tribble, C. and Jones, W., 1990. Concordances in the Classroom: a Resource Book for Teachers, Longman, Harlow.
Wichmann, A., Fligelstone, S., McEnery, T. and Knowles, D., Editors, 1997. Teaching and Language Corpora, Longman, London and New York.
Wilson, E., 1997. The Automatic Generation of CALL Exercises. In: Wichmann, A., Fligelstone, S., McEnery, T. and Knowles, D., Editors, 1997. Teaching and Language Corpora, Longman, London and New York, pp. 116–130.
Conduct a corpus-based comparative study of one linguistic feature (lexical, syntactic, discourse). In not more than 1500 words, write a report on the study.
The report should consist of the following parts:
The report will be read and assessed separately by the teachers of Information Technology and Discourse Analysis.
Volume 31, Issue 2, June 2003, Pages 173–186