Tuesday, June 2, 2009

Corpus Legend

I began collecting student scripts in 1992, and by 1999 the JPU Corpus contained over 300 essays and research papers.

At the time, it was the largest European corpus of English as a Foreign Language.

My PhD dissertation is based on this corpus. You can read the dissertation online at Google Books.

Three of the five subcorpora are available here and may be used for linguistic study.

I would love to hear from you. Please write to me at my email address.

**

The following colleagues have written and talked about the corpus and my dissertation:

Aijmer, K. (2002). Modality in advanced Swedish learners' written interlanguage. In S. Granger, J. Hung, & S. Petch-Tyson (Eds.), Computer learner corpora, second language acquisition and foreign language teaching (pp. 55-76). Amsterdam: John Benjamins.

Bárdos, J. (2002). Könyvszemle [Book review]. Modern Nyelvoktatás, 8 (2-3), 98-99.

Beréndi, M. (2002). Recenzió [Book review]. Nyelvinfó, 10 (3-4), 60-61.

Crawford, W. J. (2008). Place and time adverbials in native and non-native English student writing. In A. Ädel & R. Reppen (Eds.), Corpora and discourse: The challenges of different settings. Amsterdam: John Benjamins.

Fekete, H. (2006). Towards the validation of translation as an intermediate language proficiency exam task. Unpublished PhD dissertation, ELTE, Budapest.

Foster, J., Wagner, J., & van Genabith, J. (2008, May). Using decision trees to detect and classify grammatical errors. Talk presented at the CALICO '08 Workshop on Automatic Analysis of Learner Language: Bridging Foreign Language Teaching Needs and NLP Possibilities, University of San Francisco.

Gál, P. (2002). Az íráskészség fejlesztésének tartalmi vonatkozásai [The principles of developing the writing skill]. In J. Bárdos & I. Garaczi (Eds.), Nyelvpedagógia az ezredfordulón (pp. 317-340). Veszprém: Veszprémi Humán Tudományokért Alapítvány.

Gerdin, G. (2006). The use of the general nouns people and thing by L2 learners of English: A corpus-based study. Växjö universitet, Institutionen för humaniora, Engelska [Växjö University, School of Humanities, English].

Godó, Á. (2008). Cross-cultural aspects of academic writing: A study of Hungarian and North American college students’ L1 argumentative essays. International Journal of English Studies, 8 (2), 65-111.

Grigaliūnienė, J., Bikelienė, L., & Juknevičienė, R. (2008). The Lithuanian component of the International Corpus of Learner English (LICLE): A resource for English language learning, teaching and research at Lithuanian institutions of higher education. Žmogus ir žodis, 10 (3), 62-70.

Guo, X. (2006). Verbs in the written English of Chinese learners: A corpus-based comparison between non-native speakers and native speakers. PhD thesis, University of Birmingham, Department of English.

Károly, K. (2002). Answers about FL writing based on 'real' language data. Novelty, 9 (2), 60-62.

Katz, S. R. (2003, June). Peer response in the EFL academic writing classroom: A critical analysis. Paper presented at the Second EATAW/EWCA Conference, Budapest.

Katz, S. R. (2004, April). Peer response in the EFL academic writing classroom in Eastern Europe: A critical analysis. Paper presented at the American Educational Research Association Conference, San Diego.

Katz, S. R. (2005). Peer response in the EFL academic writing classroom in Hungary: A critical analysis. In B. Jen, R. Sheorey, S. Szilárd, & P. Titchmarsh (Eds.), Contemporary Hungarian perspectives on linguistics, literature and pedagogy. Budapest: Hungarian Academy of Sciences.

Katz, S. R. (2007). Peer response in the EFL academic writing classroom in Hungary: A critical analysis. In R. Sheorey & J. Kiss-Gulyás (Eds.), Studies in applied and theoretical linguistics. Debrecen: Kossuth Egyetemi Kiadó.

Kiszely, Z. (2006). Magyar középiskolások angol fogalmazási stratégiái [English composing strategies of Hungarian high-school students]. Nyelvinfó, 14 (2), 8-12.

Kitao, K. (2006). Developing resources for corpus linguistics. Journal of Culture and Information Science, 1 (1), 17-36.

Kitsnik, M. (2006). Keelekorpused ja võõrkeeleõpe [Language corpora and foreign language teaching]. Estonian Papers in Applied Linguistics, 2, 93-108.

Lehmann, M. (2003). The lexis of writing and vocabulary size: The relationship between receptive knowledge and productive use. In J. Andor, J. Horváth & M. Nikolov (Eds.), Studies in English theoretical and applied linguistics (pp. 172-181). Pécs: Lingua Franca Csoport.

Mark, K. (2001). A parallel learner corpus: Using computers in a humanistic approach to language teaching and research. Japanese Journal of Language and Society, 4 (1), 5-16.

McEnery, T., Xiao, R., & Tono, Y. (2006). Corpus-based language studies: An advanced resource book. New York: Routledge.

Merckle, T. (2008). Unpacking L2 writing responses: A corpus-based study on teacher feedback to student writing. Teaching English with Technology, 8 (4).

Nesselhauf, N. (2004). Learner corpora and their potential for language teaching. In J. Sinclair (Ed.), How to use corpora in language teaching (pp. 125-152). Amsterdam: John Benjamins.

Oh, S. Y. (2007). A corpus-based study of epistemic modality in Korean college students’ writings in English. English Teaching, 62 (2), 147-176.

Ooi, V. (2002). From Shakespeare to Hungarian EFL writing: Using WWW corpora to motivate student learning. In M. Tan (Ed.), Corpus studies in language education (pp. 163-177). Bangkok: IELE Press.

Pravec, N. (2002). Survey of language learner corpora. ICAME Journal, 26, 81-114.

Recski, L. (2005). Utilizando corpora de aprendizes para a investigação de aspectos discursivos, metodologias de ensino e design de materiais pedagógicos [Using learner corpora to investigate discoursal aspects, teaching methodologies, and the design of pedagogical materials]. Linguagem & Ensino, 8 (2), 249-273.

Richter, B. (Ed.). (2006). First steps in theoretical and applied linguistics. Budapest: Bölcsész Konzorcium.

Schiftner, B. (2008). Learner corpora of English and German: What is their status quo and where are they headed? Vienna English Working Papers, 17 (2), 47-78.

Spiri, J. (2008). Online study of frequency list vocabulary with the WordChamp website. Reflections on English Language Teaching, 7 (1), 21-36.

Stritar, M. (2006). Oblikovanje korpusa usvajanja slovenščine kot tujega jezika [Slovene learner corpus design]. In T. Erjavec & J. Žganec Gros (Eds.), Proceedings of the 5th Slovenian and 1st International Language Technologies Conference. Ljubljana: Jožef Stefan Institute.

Szabó, G. (2008). Applying item response theory in language test item bank building. Frankfurt am Main: Peter Lang.

Szirmai, M. (2001). The theory and practice of corpus linguistics. Debrecen: Kossuth Egyetemi Kiadó.

Szirmai, M. (2005). Bevezetés a korpusznyelvészetbe: A korpusznyelvészet alkalmazása az anyanyelv és az idegen nyelv tanulásában és tanításában [Introduction to corpus linguistics: The application of corpus linguistics in the study and teaching of the mother tongue and foreign languages]. Budapest: Tinta.

Tono, Y. (2003). Learner corpora: Design, development and applications. In D. Archer, P. Rayson, A. Wilson, & T. McEnery (Eds.), Proceedings of the Corpus Linguistics 2003 conference (UCREL technical paper number 16, pp. 800-809). Lancaster: UCREL, Lancaster University.

Vermes, A. P. (2003). Idegen nyelvi íráspedagógia és fordítás [Foreign-language writing pedagogy and translation]. Iskolakultúra, 13 (10), 58-60.

Wang, L., & Sun, K. (2005). Current developments in learner English corpus in and outside China. Computer-Assisted Foreign Language Education, 5, 19-24.

Xiao, Z. (2008). Well-known and influential corpora. In A. Lüdeling & M. Kytö (Eds.), Corpus linguistics: An international handbook (Vol. 1, pp. 383-457). Berlin: Mouton de Gruyter.