Thursday, December 12, 2013

Podcast about an essay from the JPU corpus


Sunday, September 22, 2013

JPU Corpus blog logs 14,000 page views

Since its launch in 2007, this JPU Corpus blog has been viewed over 14,000 times. No earth-shattering stats, those, but I am still impressed by the dedication of the linguists and other colleagues from around the world who visit the site.

A run-down of the top ten countries:

Wednesday, June 26, 2013

This Month's JPU Gem

From time to time, I feature an essay from the corpus. Way back in the 1990s, this is what a student wrote about virtual reality.


Corpus Legend

I began collecting student scripts in 1992 -- and by 1999, the JPU Corpus incorporated over 300 essays and research papers.

At the time, it was the largest European corpus of English as a Foreign Language.

My PhD study is based on this corpus. You can read it online at Google Books.

Three of the five subcorpora are available here and may be used for linguistic study.

I would love to hear from you. Please write me at my email address.


Besides colleagues in Hungary, scholars from Austria, Brazil, Canada, China, Estonia, Germany, Japan, Lithuania, Singapore, Slovenia, South Korea, Sweden, Taiwan, Thailand, the UK and the US have written and talked about the JPU corpus and my dissertation:

Aijmer, K. (2002). Modality in advanced Swedish learners' written interlanguage. In S. Granger, J. Hung, & S. Petch-Tyson (Eds.), Computer learner corpora, second language acquisition and foreign language teaching (pp. 55-76). Amsterdam: John Benjamins.

Bárdos, J. (2002). Könyvszemle [Book review]. Modern Nyelvoktatás, 8 (2-3), 98-99.

Beréndi, M. (2002). Recenzió [Book review]. Nyelvinfó, 10 (3-4), 60-61.

Crawford, W. J. (2008). Place and time adverbials in native and non-native English student writing. In A. Ädel & R. Reppen (Eds.), Corpora and discourse: The challenges of different settings (pp. 267-287). Amsterdam: John Benjamins Publishing Company.

Deák, J. (2009, October). Corpus analysis of verb phrase constructions in academic English writing by NSs and NNSs. Poster presented at the Second Language Research Forum, Michigan State University.

Doró, K. (2010). Meeting the language barrier: The experience of first-year students of English. In I. Hegedűs & S. Martsa (Eds.), CrosSections: Selected papers in linguistics from the 9th HUSSE conference (Vol. 1, pp. 289-297). Pécs: Institute of English Studies, University of Pécs.

Doró, K. (2011). Students' perceptions about their preparedness for undergraduate studies in English. In UPRT 2009: Empirical studies in English applied linguistics (pp. 81-92). Pécs: Lingua Franca Csoport.

Doró, K. (2013). Selling their research: The linguistic realization of rhetoric moves in English thesis abstracts written by Hungarian undergraduates. Romanian Journal of English Studies, 9 (2), 181-191. 

Ene, S. E. (2006). The last stages of second language acquisition: Linguistic evidence from academic writing by advanced non-native English speakers. PhD dissertation, University of Arizona. Microform edition by ProQuest Information and Learning Company.

Fekete, H. (2006). Towards the validation of translation as an intermediate language proficiency exam task. Unpublished PhD dissertation, ELTE, Budapest.

Foss, P. (2009a). Constructing a blog corpus for Japanese learners of English. JALT CALL Journal, 5 (1), 65-76.

Foss, P. (2009b). Written learner corpora. Kwansei Gakuin University Humanities Review, 12, 147-158.

Foster, J., Wagner, J., & van Genabith, J. (2008, May). Using decision trees to detect and classify grammatical errors. Talk presented at the Calico '08 Workshop on Automatic Analysis of Learner Language: Bridging Foreign Language Teaching Needs and NLP Possibilities, University of San Francisco.

Gál, P. (2002). Az íráskészség fejlesztésének tartalmi vonatkozásai [The principles of developing the writing skill]. In J. Bárdos & I. Garaczi (Eds.), Nyelvpedagógia az ezredfordulón (pp. 317-340). Veszprém: Veszprémi Humán Tudományokért Alapítvány.

Gerdin, G. (2006). The use of the general nouns people and thing by L2 learners of English – A corpus-based study. Växjö universitet, Institutionen för humaniora Engelska.

Godó, Á. (2008). Cross-cultural aspects of academic writing: A study of Hungarian and North American college students’ L1 argumentative essays. International Journal of English Studies, 8 (2), 65-111.

Grigaliūnienė, J., Bikelienė, L., & Juknevičienė, R. (2008). The Lithuanian component of the International Corpus of Learner English (LICLE): A resource for English language learning, teaching and research at Lithuanian institutions of higher education. Žmogus ir žodis, 10 (3), 62-70.

Guo, X. (2006). Verbs in the written English of Chinese learners: A corpus-based comparison between non-native speakers and native speakers. PhD thesis, University of Birmingham, Department of English.

Jendryczka-Wierszycka, J. (2009, July). Collecting spoken learner data: Challenges and benefits -- A Polish L1 perspective. In M. Mahlberg, V. González-Díaz, & C. Smith (Eds.), Proceedings of the corpus linguistic conference. University of Liverpool.

Károly, K. (2002). Answers about FL writing based on 'real' language data. Novelty, 9 (2), 60-62.

Katz, S. R. (2003, June). Peer response in the EFL academic writing classroom: A critical analysis. Paper presented at the Second EATAW/EWCA Conference, Budapest.

Katz, S. R. (2004, April). Peer response in the EFL academic writing classroom in Eastern Europe: A critical analysis. Paper presented at the American Education Research Association Conference, San Diego.

Katz, S. R. (2005). Peer response in the EFL academic writing classroom in Hungary: A critical analysis. In B. Jen, R. Sheorey, S. Szilárd, & P. Titchmarsh (Eds.), Contemporary Hungarian perspectives on linguistics, literature and pedagogy. Budapest, Hungary: Hungarian Academy of Sciences.

Katz, S. R. (2007). Peer response in the EFL academic writing classroom in Hungary: A critical analysis. In R. Sheorey & J. Kiss-Gulyás (Eds.), Studies in applied and theoretical linguistics. Debrecen: Kossuth Egyetemi Kiadó.

Kiszely, Z. (2006). Magyar középiskolások angol fogalmazási stratégiái [English composing skills of Hungarian high-school students]. Nyelvinfó, 14 (2), 8-12.

Kitao, K. (2006). Developing resources for corpus linguistics. Journal of Culture and Information Science, 1 (1), 17-36.

Kitsnik, M. (2006). Keelekorpused ja võõrkeeleõpe [Language corpora and foreign language teaching]. Estonian Papers in Applied Linguistics, 2, 93-108.

Lehmann, M. (2003). The lexis of writing and vocabulary size: The relationship between receptive knowledge and productive use. In J. Andor, J. Horváth & M. Nikolov (Eds.), Studies in English theoretical and applied linguistics (pp. 172-181). Pécs: Lingua Franca Csoport.

Lehmann, M. (2013). The use of lexical bundles in EFL academic writing tasks. In J. Mihaljevic-Djigunovic & M. M. Krajnovic (Eds.), UZRT 2012: Empirical studies in English applied linguistics (pp. 131-141). Zagreb: FF Press.

Lukácsi, Z. (2013). Cohesion and writing quality: Exploring the construct of cohesion in Euro examinations. PhD dissertation, University of Pécs.

Lüdeling, A. & Kytö, M. (Eds.). (2009). Corpus linguistics: An international handbook. Berlin: Walter de Gruyter.

Magnuczné Godó, Á. (2009). Gondolatok a kontrasztív retorika változó perspektíváiról [Thoughts on the changing perspectives of contrastive rhetoric]. Alkalmazott Nyelvészeti Közlemények, 4 (1), 167-182.

Mark, K. (2001). A parallel learner corpus: Using computers in a humanistic approach to language teaching and research. Japanese Journal of Language and Society, 4 (1), 5-16.

McEnery, T., Xiao, R., & Tono, Y. (2006). Corpus-based language studies: An advanced resource book. New York: Routledge.

Merckle, T. (2008). Unpacking L2 writing responses: A corpus-based study on teacher feedback to student writing. Teaching English with Technology, 8 (4).

Nesselhauf, N. (2004). Learner corpora and their potential for language teaching. In J. Sinclair (Ed.), Teaching and language corpora (pp. 125-152). Amsterdam: John Benjamins.

Oh, S. Y. (2007). A corpus-based study of epistemic modality in Korean college students’ writings in English. English Teaching, 62 (2), 147-176.

Ooi, V. (2002). From Shakespeare to Hungarian EFL writing: Using WWW corpora to motivate student learning. In M. Tan (Ed.), Corpus studies in language education (pp. 163-177). Bangkok: IELE Press.

Pravec, N. (2002). Survey of language learner corpora. ICAME Journal, 26, 81-114.

Recski, L. (2005). Utilizando corpora de aprendizes para a investigação de aspectos discursivos, metodologias de ensino e design de materiais pedagógicos [Using learner corpora to investigate discoursal aspects, teaching methodologies, and pedagogical materials]. Linguagem & Ensino, 8 (2), 249-273.

Richter, B. (Ed.). (2006). First steps in theoretical and applied linguistics. Budapest: Bölcsész Konzorcium.

Shaw, P. (2004). A longitudinal corpus of Swedish university students’ written English, some problems and some results. Lund Working Papers in English, 81-92.

Schiftner, B. (2008). Learner corpora of English and German: What is their status quo and where are they headed? Vienna English Working Papers, 17 (2), 47-78.

Spiri, J. (2008). Online study of frequency list vocabulary with the WordChamp website. Reflections on English Language Teaching, 7 (1), 21-36.

Stritar, M. (2006). Oblikovanje korpusa usvajanja slovenščine kot tujega jezika [Slovene learner corpus design]. In T. Erjavec & J. Žganec Gros (Eds.), Proceedings of the 5th Slovenian and 1st International Language Technologies Conference. Ljubljana: Jožef Stefan Institute.

Szabó, G. (2008). Applying item response theory in language test item bank building. Frankfurt am Main: Peter Lang.

Szirmai, M. (2001). The theory and practice of corpus linguistics. Debrecen: Kossuth Egyetemi Kiadó.

Szirmai, M. (2005). Bevezetés a korpusznyelvészetbe: A korpusznyelvészet alkalmazása az anyanyelv és az idegen nyelv tanulásában és tanításában [Introduction to corpus linguistics: The application of corpus linguistics in the study and teaching of the mother tongue and foreign languages]. Budapest: Tinta.

Tasanameelarp, A. (2010). Effects of using concordancing on EFL learners' ability to self-correct grammatical errors. MA thesis, Prince of Songkla University, Thailand.

Tono, Y. (2003). Learner corpora: Design, development and applications. In D. Archer, P. Rayson, A. Wilson, & T. McEnery (Eds.), Proceedings of the Corpus Linguistics 2003 conference. UCREL technical paper number 16 (pp. 800-809). UCREL, Lancaster University.

Tono, Y. (2005). Corpus-based SLA research: State of the art of learner corpus studies. In M. Minami, H. Kobayashi, M. Nakayama & H. Sirai (Eds.), Studies in language sciences: Papers from the Fourth Annual Conference of the Japanese Society of Language Sciences (pp. 45-77). Tokyo: Kurosio Publishers.

Vermes, A. P. (2003). Idegen nyelvi íráspedagógia és fordítás [Foreign-language writing pedagogy and translation]. Iskolakultúra, 13 (10), 58-60.

Wang, L., & Sun, K. (2005). Current developments in learner English corpus in and outside China. Computer-Assisted Foreign Language Education, 5, 19-24.

Xiao, Z. (2008). Well-known and influential corpora. In A. Lüdeling & M. Kytö (Eds.), Corpus linguistics: An international handbook (Vol. 1, pp. 383-457). Berlin: Mouton de Gruyter.

Thursday, June 12, 2008

The Best Corpus Project

Students in my Corpus Linguistics course have worked on some exciting projects. Here, I make available the best, by Kiss Ilona. First, an excerpt:

I started to develop my corpus by downloading articles on air travel from internet tourism magazines. As tourism is a rather wide issue I had to encompass a narrower scope and focused on travel by air. I assumed that analysing a reliable amount of data with the help of frequency ranges would help me with the selection of the most appropriate vocabulary items to be tested. The amount of data proved to be a crucial point of my investigation. I tried to find evidence to support my hypothesis that with a representative amount of data I would be able to select the most important and most frequent words of a specialised lexis in specialised texts. Thus I have collected twenty-six carefully selected professional articles from different websites comprising approximately thirteen thousand tokens.

You can get the full text, too.