The JPU Corpus
221 essays and research papers from my collection of Hungarian students' writing in English. Each script appears as a separate entry. W, R and L stand for the subcorpora: Writing, Retraining, and Language practice. F stands for female, M for male authors. Scripts also have labels to allow for advanced search. To run an online concordance search, please visit The Compleat Lexical Tutor site.
Thursday, December 12, 2013
Podcast about an essay from the JPU corpus
Sunday, September 22, 2013
JPU Corpus blog logs 14,000 page views
A run-down of the top ten countries:
Wednesday, June 26, 2013
This Month's JPU Gem
Enjoy!
Corpus Legend
At the time, it was the largest European corpus of English as a Foreign Language.
My PhD study is based on this corpus. You can read it online at Google Books.
Three of the five subcorpora are available here and may be used for linguistic study.
I would love to hear from you. Please write me at my email address.
Besides colleagues in Hungary, scholars from Austria, Brazil, Canada, China, Estonia, Germany, Japan, Lithuania, Singapore, Slovenia, South Korea, Sweden, Taiwan, Thailand, the UK and the US have written and talked about the JPU corpus and my dissertation:
Aijmer, K. (2002). Modality in advanced Swedish learners' written interlanguage. In Granger, S., Hung, J. & Petch-Tyson, S. (Eds.), Computer learner corpora, second language acquisition and foreign language teaching (pp. 55-76). Amsterdam: John Benjamins.
Bárdos, J. (2002). Könyvszemle [Book review]. Modern Nyelvoktatás, 8 (2-3), 98-99.
Beréndi, M. (2002). Recenzió [Book review]. Nyelvinfó, 10 (3-4), 60-61.
Crawford, W. J. (2008). Place and time adverbials in native and non-native English student writing. In A. Ädel & R. Reppen (Eds.), Corpora and discourse: The challenges of different settings (pp. 267-287). Amsterdam: John Benjamins Publishing Company.
Doró, K. (2011). Students' perceptions about their preparedness for undergraduate studies in English. In UPRT 2009: Empirical studies in English applied linguistics (pp. 81-92). Pécs: Lingua Franca Csoport.
Doró, K. (2013). Selling their research: The linguistic realization of rhetoric moves in English thesis abstracts written by Hungarian undergraduates. Romanian Journal of English Studies, 9 (2), 181-191.
Gál, P. (2002). Az íráskészség fejlesztésének tartalmi vonatkozásai [The principles of developing the writing skill]. In J. Bárdos & I. Garaczi (Eds.), Nyelvpedagógia az ezredfordulón (pp. 317-340). Veszprém: Veszprémi Humán Tudományokért Alapítvány.
Gerdin, G. (2006). The use of the general nouns people and thing by L2 learners of English – A corpus-based study. Växjö universitet, Institutionen för humaniora Engelska.
Godó, Á. (2008). Cross-cultural aspects of academic writing: A study of Hungarian and North American college students’ L1 argumentative essays. International Journal of English Studies, 8 (2), 65-111.
Grigaliuniené, J., Bikeliené, L., & Jukneviciené, R. (2008). The Lithuanian component of the International Corpus of Learner English (LICLE): A resource for English language learning, teaching and research at Lithuanian institutions of higher education. Žmogus ir žodis, 10 (3), 62-70.
Guo, X. (2006). Verbs in the written English of Chinese learners: A corpus-based comparison between non-native speakers and native speakers. PhD thesis, University of Birmingham, Department of English.
Károly, K. (2002). Answers about FL writing based on 'real' language data. Novelty, 9 (2), 60-62.
Katz, S. R. (2003, June). Peer response in the EFL academic writing classroom: A critical analysis. Paper presented at the Second EATAW/EWCA Conference, Budapest.
Katz, S. R. (2004, April). Peer response in the EFL academic writing classroom in Eastern Europe: A critical analysis. Paper presented at the American Education Research Association Conference, San Diego.
Katz, S. R. (2005). Peer response in the EFL academic writing classroom in Hungary: A critical analysis. In Jen, B., Sheorey, R., Szilárd, S., & Titchmarsh, P. (Eds.), Contemporary Hungarian perspectives on linguistics, literature and pedagogy. Budapest, Hungary: Hungarian Academy of Sciences.
Katz, S. R. (2007). Peer response in the EFL academic writing classroom in Hungary: A critical analysis. In R. Sheorey & J. Kiss-Gulyás (Eds.), Studies in applied and theoretical linguistics. Debrecen: Kossuth Egyetemi Kiadó.
Kiszely, Z. (2006). Magyar középiskolások angol fogalmazási stratégiái [English composing skills of Hungarian high-school students]. Nyelvinfó, 14 (2), 8-12.
Kitao, K. (2006). Developing resources for corpus linguistics. Journal of Culture and Information Science, 1 (1), 17-36.
Kitsnik, M. (2006). Keelekorpused ja võõrkeeleõpe [Language corpora and foreign language teaching]. Estonian Papers in Applied Linguistics, 2, 93-108.
Lehmann, M. (2003). The lexis of writing and vocabulary size: The relationship between receptive knowledge and productive use. In J. Andor, J. Horváth & M. Nikolov (Eds.), Studies in English theoretical and applied linguistics (pp. 172-181). Pécs: Lingua Franca Csoport.
Lehmann, M. (2013). The use of lexical bundles in EFL academic writing tasks. In J. Mihaljevic-Djigunovic & M. M. Krajnovic (Eds.), UZRT 2012: Empirical studies in English applied linguistics (pp. 131-141). Zagreb: FF Press.
Lukácsi, Z. (2013). Cohesion and writing quality: Exploring the construct of cohesion in Euro examinations. PhD dissertation, University of Pécs.
Magnuczné Godó, Á. (2009). Gondolatok a kontrasztív retorika változó perspektíváiról. Alkalmazott Nyelvészeti Közlemények, 4 (1), 167-182.
McEnery, T., Xiao, R., & Tono, Y. (2006). Corpus-based language studies: An advanced resource book. New York: Routledge.
Merckle, T. (2008). Unpacking L2 writing responses: A corpus-based study on teacher feedback to student writing. Teaching English with Technology, 8 (4).
Nesselhauf, N. (2004). Learner corpora and their potential for language teaching. In J. Sinclair (Ed.), Teaching and language corpora (pp. 125-152). Amsterdam: John Benjamins.
Oh, S. Y. (2007). A corpus-based study of epistemic modality in Korean college students’ writings in English. English Teaching, 62 (2), 147-176.
Ooi, V. (2002). From Shakespeare to Hungarian EFL writing: Using WWW corpora to motivate student learning. In M. Tan (Ed.), Corpus studies in language education (pp. 163-177). Bangkok: IELE Press.
Pravec, N. (2002). Survey of language learner corpora. ICAME Journal, 26, 81-114.
Recski, L. (2005). Utilizando corpora de aprendizes para a investigação de aspectos discursivos, metodologias de ensino e design de materiais pedagógicos [Using learner corpora to investigate discoursal aspects, teaching methodologies, and pedagogical materials]. Linguagem & Ensino, 8 (2), 249-273.
Richter, B. (Ed.). (2006). First steps in theoretical and applied linguistics. Budapest: Bölcsész Konzorcium.
Shaw, P. (2004). A longitudinal corpus of Swedish university students’ written English, some problems and some results. Lund Working Papers in English, 81-92.
Schiftner, B. (2008). Learner corpora of English and German: What is their status quo and where are they headed? Vienna English Working Papers, 17 (2), 47-78.
Spiri, J. (2008). Online study of frequency list vocabulary with the WordChamp website. Reflections on English Language Teaching, 7 (1), 21–36.
Stritar, M. (2006). Oblikovanje korpusa usvajanja slovenščine kot tujega jezika [Slovene learner corpus design]. In T. Erjavec & J. Žganec Gros (Eds.), Proceedings of the 5th Slovenian and 1st International Language Technologies Conference. Ljubljana: Jožef Stefan Institute.
Szabó, G. (2008). Applying item response theory in language test item bank building. Frankfurt am Main: Peter Lang.
Szirmai, M. (2001). The theory and practice of corpus linguistics. Debrecen: Kossuth Egyetemi Kiadó.
Szirmai, M. (2005). Bevezetés a korpusznyelvészetbe: A korpusznyelvészet alkalmazása az anyanyelv és az idegen nyelv tanulásában és tanításában [Introduction to corpus linguistics: The application of corpus linguistics in the study and teaching of the mother tongue and foreign languages]. Budapest: Tinta.
Tono, Y. (2003). Learner corpora: Design, development and applications. In Archer, D., Rayson, P., Wilson, A. & McEnery, T. (Eds.), Proceedings of the Corpus Linguistics 2003 conference. UCREL technical paper number 16 (pp. 800-809). UCREL, Lancaster University.
Vermes, A. P. (2003). Idegen nyelvi íráspedagógia és fordítás [Foreign-language writing pedagogy and translation]. Iskolakultúra, 13 (10), 58-60.
Wang, L., & Sun, K. (2005). Current developments in learner English corpus in and outside China. Computer-Assisted Foreign Language Education, 5, 19-24.
Xiao, Z. (2008). Well-known and influential corpora. In A. Lüdeling & M. Kytö (Eds.), Corpus linguistics: An international handbook, Vol. 1 (pp. 383-457). Berlin: Mouton de Gruyter.
Thursday, June 12, 2008
The Best Corpus Project
I started to develop my corpus by downloading articles on air travel from internet tourism magazines. As tourism is a rather wide issue I had to encompass a narrower scope and focused on travel by air. I assumed that analysing a reliable amount of data with the help of frequency ranges would help me with the selection of the most appropriate vocabulary items to be tested. The amount of data proved to be a crucial point of my investigation. I tried to find evidence to support my hypothesis that with a representative amount of data I would be able to select the most important and most frequent words of a specialised lexis in specialised texts. Thus I have collected twenty-six carefully selected professional articles from different websites comprising approximately thirteen thousand tokens.