Thursday, June 12, 2008

The Best Corpus Project

Students in my Corpus Linguistics course have worked on some exciting projects. Here, I make the best of them available: a project by Kiss Ilona. First, an excerpt:

I started to develop my corpus by downloading articles on air travel from internet tourism magazines. As tourism is a rather wide issue I had to encompass a narrower scope and focused on travel by air. I assumed that analysing a reliable amount of data with the help of frequency ranges would help me with the selection of the most appropriate vocabulary items to be tested. The amount of data proved to be a crucial point of my investigation. I tried to find evidence to support my hypothesis that with a representative amount of data I would be able to select the most important and most frequent words of a specialised lexis in specialised texts. Thus I have collected twenty-six carefully selected professional articles from different websites comprising approximately thirteen thousand tokens.

You can get the full text, too.
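The frequency-based selection described in the excerpt can be illustrated with a minimal sketch. It is not the method or tool used in the project: the tokenizer, the function names and the two sample sentences are assumptions made up for illustration. It only shows how tokens in a small collection of texts can be counted and ranked by frequency.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its word tokens.

    A token here is a run of letters, optionally joined by internal
    apostrophes or hyphens (an assumed, deliberately simple definition).
    """
    return re.findall(r"[a-z]+(?:['-][a-z]+)*", text.lower())


def frequency_ranking(texts):
    """Count tokens across all texts and return (word, count) pairs,
    most frequent first."""
    counts = Counter()
    for text in texts:
        counts.update(tokenize(text))
    return counts.most_common()


# Hypothetical two-text mini-corpus, standing in for the downloaded articles.
articles = [
    "The airline cancelled the flight. The passengers waited at the gate.",
    "Passengers boarded the flight after the airline reopened the gate.",
]

ranking = frequency_ranking(articles)
total_tokens = sum(count for _, count in ranking)

for word, count in ranking[:5]:
    print(f"{word}\t{count}")
print("tokens:", total_tokens)
```

In a real study, function words such as "the" would dominate the top of such a list, so the specialised vocabulary would normally be identified by comparing the ranking against a general reference list.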

The Language Practice Subcorpus

L 148 M, L 149 M, L 150 M, L 151 M, L 152 M, L 153 M

L 154 M, L 155 M, L 156 M, L 157 M, L 158 M, L 159 M

L 160 M, L 161 M, L 162 M, L 163 M, L 164 M, L 165 M

L 166 M, L 167 M, L 168 M, L 169 M, L 170 M, L 171 M

L 172 M, L 173 M, L 174 M, L 175 M, L 176 M, L 177 M

L 178 M, L 179 F, L 180 F, L 181 F, L 182 F, L 183 F

L 184 F, L 185 F, L 186 F, L 187 F, L 188 F, L 189 F

L 190 F, L 191 F, L 192 F, L 193 F, L 194 F, L 195 F

L 196 F, L 197 F, L 198 F, L 199 F, L 200 F, L 201 F

L 202 F, L 203 F, L 204 F, L 205 F, L 206 F, L 207 F

L 208 F, L 209 F, L 210 F, L 211 F, L 212 F, L 213 F

L 214 F, L 215 F, L 216 F, L 217 F, L 218 F, L 219 F

L 220 F, L 221 F

The Russian Retraining Subcorpus

R 131 F, R 132 F, R 133 F, R 134 F, R 135 F, R 136 F

R 137 F, R 138 F, R 139 F, R 140 F, R 141 F, R 142 F

R 143 F, R 144 F, R 145 F, R 146 F, R 147 M

The Writing Subcorpus

W 001 F, W 002 F, W 003 F, W 004 F, W 005 F, W 006 F

W 007 F, W 008 F, W 009 F, W 010 F, W 011 F, W 012 F

W 013 F, W 014 F, W 015 F, W 016 F, W 017 F, W 018 F

W 019 F, W 020 F, W 021 F, W 022 F, W 023 F, W 024 F

W 025 F, W 026 F, W 027 F, W 028 F, W 029 F, W 030 F

W 031 F, W 032 F, W 033 F, W 034 F, W 035 F, W 036 F

W 037 F, W 038 F, W 039 F, W 040 F, W 041 F, W 042 F

W 043 F, W 044 F, W 045 F, W 046 F, W 047 F, W 048 F

W 049 F, W 050 F, W 051 F, W 052 F, W 053 F, W 054 F

W 055 F, W 056 F, W 057 F, W 058 F, W 059 F, W 060 F

W 061 F, W 062 F, W 063 F, W 064 F, W 065 F, W 066 F

W 067 F, W 068 F, W 069 F, W 070 F, W 071 F, W 072 F

W 073 F, W 074 F, W 075 F, W 076 F, W 077 F, W 078 F

W 079 F, W 080 F, W 081 F, W 082 F, W 083 F, W 084 F

W 085 F, W 086 F, W 087 F, W 088 F, W 089 F, W 090 F

W 091 F, W 092 F, W 093 F, W 094 F, W 095 F, W 096 F

W 097 F, W 098 F, W 099 F, W 100 F, W 101 F, W 102 F

W 103 F, W 104 F, W 105 F, W 106 F, W 107 M, W 108 M

W 109 M, W 110 M, W 111 M, W 112 M, W 113 M, W 114 M

W 115 M, W 116 M, W 117 M, W 118 M, W 119 M, W 120 M

W 121 M, W 122 M, W 123 M, W 124 M, W 125 M, W 126 M

W 127 M, W 128 M, W 129 M, W 130 M

Tuesday, April 1, 2008

JPU Corpus Gem

Every now and then, I recommend a script from the corpus.

This time, it's L 216 F, an essay about travelling with lobsters.

Thursday, January 17, 2008

JPU Corpus News

January 2008: My paper on the internet version of the JPU Corpus and its application with the Compleat Lexical Tutor was published in When Grammar Minds Language and Literature by the Institute of English and American Studies, University of Debrecen. (A PDF version of the paper is available.) Editors: József Andor, Béla Hollósy, Tibor Laczkó and Péter Pelyvás. The introduction: "At the end of the nineties, the first phase of a corpus-linguistic project involving over three hundred English majors at the university then known as Janus Pannonius came to an end: I had collected one of the largest European learner corpora, the JPU Corpus. It contained almost half a million words, which made it possible to describe and analyze the personal and academic writing of these students and to profile individual and group differences. Among other areas, it provided the raw data for analyzing correct and incorrect uses of the definite article in the essays of colleagues enrolled in the Russian retraining program and the opening and closing sentences in the subcorpus comprising one hundred research papers."

December 2007: The Hungarian applied-linguistics journal Modern Nyelvoktatás published my article about using the JPU Corpus for various language study tasks. One of these tasks links the corpus with the color-coded frequency bands of the BNC, which are part of the Frequency tools of the Compleat Lexical Tutor. Examples are provided in the article, using this excerpt from the JPU script L 220 F: "Graham Greene is the kind of writer whose novels and short stories are influenced by his own experiences in life. He does not write directly about his life but his attitude to the phenomena of the world and the things that happen to him can be felt in the ways he makes his stories. The role of childhood experiences, the unpleasant side of life and escapism are important aspects in Greene's life. This essay will examine how and why he deals with them in his works too."

June 2007: Tom Cobb added the JPU Corpus to his Compleat Lexical Tutor suite of tools and sources. You can get concordance citations online.