Wednesday, May 9, 2007

W 123 M

Introduction

At the English Department of Janus Pannonius University, 22 other English major students and I chose a course in the first semester of 1997/1998. This course, Language Testing (code ANG 5131), was conducted by Szabó Gábor. I chose it because I have been interested in both making and taking English language tests of all kinds ever since I started learning English. The seminar threw light on the fact that test making is far more complicated and difficult than I had thought. Test makers have to take several aspects into account: what the test should look like, at what level it should be designed, what they want to test (which area of the candidates' language knowledge), and even how to avoid ambiguous, too easy, or too difficult test questions.

As part of the requirements for completing the course successfully, we were asked to construct an English language test (see Gertner & Schlögl, 1997) and to design and pilot it with a group of students. Each student in the course had to choose a partner with whom to do this task. Gertner Brigitta, another first-year student, and I decided to work on the project together.

This research paper reports on the preparations for administering the test and on the evaluation and analysis of the results we obtained after receiving the completed test papers back. I will show the whole process of piloting a language test from the point where the first draft is ready. In addition, I will present the analysis and evaluation of the test we administered in a high school, pointing out some facts about these students' knowledge: their strengths and weaknesses in different areas.

Method
In the first part of the research paper I will describe the steps we took before giving the test to the students: why we consulted Horváth József, and how we revised certain parts of the test with the help of his comments. I will also write about how we administered the test: where, when, and how it was given, who the test-takers were, and whom we asked to help us in the process.

The second part is the evaluation and analysis of the test based on the results we received. I will give both an overall analysis and an item-by-item analysis of the test, using standard methods and terms of test analysis. At the end of the paper I will give a concise evaluation of the students' performance according to the scores they achieved. The complete test paper is enclosed with this research paper (see the Appendix).

Results and Discussion

Prepiloting, Revising and Piloting

In this section I will briefly summarise what happened between making the first draft of the test and administering it.

After Gertner Brigitta and I had completed the construction of the test, we gave it to Horváth, an English teacher at the JPU English Department, and asked him to look through it and point out any test items he considered inappropriate.

In Part 1/A the fourth item, "He was ill ……… the flu for almost a month.", was replaced with "The match resulted …….. a goalless draw yesterday." because Horváth found it unsuitable. In Part 1/B we did not have to alter anything; every item seemed correct and appropriate. In Part 2, three of the ten items were replaced and three more were altered.

Instead of "Sorry to …….. you, but would you be so kind as to give me a pen?" (a. tease, b. bother, c. disrupt, d. disturb), a new item #1 was put into the test.

Instead of "The ………. of his death are not clear, at all." (a. conditions, b. circumstances, c. reasons, d. details), a new item #3 was put into the test.

Instead of "According to the police no sign of ………. entry has been found in the house yet." (a. forced, b. compelled, c. broken, d. illegal), a new item #10 was put into the test.

In item #5 we used apostrophes instead of dashes; in item #8 we omitted "to you" at the end of the sentence in order to avoid clutter; and in item #9 we replaced option b, "replace", with "concord" because, according to Horváth, "replace" would also have been a correct answer. Part 3 remained the same; we did not need to change anything.

After we had revised the test this way, I typed it into the computer and, with Gertner's help, edited it. Before printing the document we took a final look at the test paper to make sure it was correct and explicit in every respect.

Meanwhile, we decided to pilot the test, if possible, in class 4.A of Jurisich Miklós Kísérleti Gimnázium in Kőszeg. We chose this class partly because they were supposed to be at the level of English knowledge we wanted to test, and partly because I know their English teacher, Varga Katalin, and could ask her to administer our test to them. The administration of the test took place on 26 November 1997 in this high school with the teacher's help. Thirteen students took the test, from a group that had been studying English in more lessons per week than other classes in the school.

Test Analysis

In this test the maximum score was 45 points: 1 point for each item in Part 1/A and in Part 2, 2 points for each item in Part 1/B, and 0.5 points for each gap in Part 3. The students achieved the following scores: 21, 22, 22.5, 23, 23.5, 24, 24.5, 25, 26.5, 27, 35.5, 36, and 37.
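As a cross-check, the 45-point maximum follows directly from the item counts reported in the item analysis below (10 + 5 + 10 items and 30 gaps). The following minimal Python sketch, added here for illustration only, encodes the scoring scheme:

```python
# Scoring scheme as described above: (number of items, points per item)
parts = {
    "Part 1/A": (10, 1.0),  # preposition gaps
    "Part 1/B": (5, 2.0),   # sentence paraphrases
    "Part 2":   (10, 1.0),  # multiple-choice items
    "Part 3":   (30, 0.5),  # cloze gaps
}

maximum = sum(count * points for count, points in parts.values())
print(maximum)  # 45.0
```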

Overall Analysis

In this part of the paper I will give a general overview of the test results, using some methods and terms known in testing practice.

The mean, the average of the students' scores: 26.8 points.

The mode, the score that most students got: cannot be determined, since every student achieved a different score.

The median, the midpoint of the scores: 24.5 points.

Standard deviation, the average deviation from the mean: 4.42.
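Since the raw scores are listed above, these statistics can be recomputed directly. The following sketch, using only the Python standard library, is added for illustration; note that the "average deviation from the mean" quoted above corresponds to the mean absolute deviation rather than the conventional standard deviation.

```python
from statistics import mean, median, pstdev

scores = [21, 22, 22.5, 23, 23.5, 24, 24.5, 25, 26.5, 27, 35.5, 36, 37]

m = mean(scores)
print(f"mean: {m:.2f}")             # 26.73, reported above rounded to 26.8
print(f"median: {median(scores)}")  # 24.5, the 7th of the 13 ordered scores
# Mode: every score occurs exactly once, so no mode can be determined.

# "Average deviation from the mean" = mean absolute deviation
mad = mean(abs(s - m) for s in scores)
print(f"mean absolute deviation: {mad:.2f}")  # about 4.4

# The conventional (population) standard deviation, for comparison
print(f"standard deviation: {pstdev(scores):.2f}")
```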

Part 1/A
Average score: 5 points out of 10
The highest score: 9 points
The lowest score: 2 points

Part 1/B
Average score: 7 points out of 10
The highest score: 9 points
The lowest score: 5 points

Part 2
Average score: 6.5 points out of 10
The highest score: 9 points
The lowest score: 4 points

Part 3
Average score: 8 points out of 15
The highest score: 12 points
The lowest score: 4 points

Item Analysis

Here I will present and discuss the results item by item and point out how the individual items worked in this test paper. I focus on those items that were too easy or too difficult for this group of students, and on those that produced surprising results.
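The judgements below about items being "too easy", "too difficult", or able to separate stronger from weaker students correspond to two standard item-analysis measures: the facility value and the discrimination index. The sketch below shows how they are computed; apart from the group size of 13 and item #9 of Part 1/A (answered correctly by everyone, as noted below), the figures are hypothetical.

```python
def facility(n_correct: int, n_takers: int = 13) -> float:
    """Facility value: proportion of test-takers answering correctly.
    Near 1.0 the item is too easy; near 0.0 it is too difficult."""
    return n_correct / n_takers

def discrimination(top_correct: int, bottom_correct: int, group_size: int) -> float:
    """Discrimination index: facility in the top-scoring group minus
    facility in the bottom-scoring group (e.g. top and bottom thirds).
    Values near 1.0 mean the item separates strong from weak test-takers."""
    return (top_correct - bottom_correct) / group_size

# Part 1/A, item #9: all 13 students answered correctly, so the item
# was too easy for this group.
print(facility(13))             # 1.0

# Hypothetical item: 4 of the top 4 scorers correct, 1 of the bottom 4.
print(discrimination(4, 1, 4))  # 0.75 -- discriminates well
```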

Part 1/A. In this section of the test the students had to fill in the missing prepositions in the sentences. The ratio of correct to wrong answers varied considerably from item to item (see Figure 1).

Though we had thought that the first item would be quite easy, it turned out that almost none of the students knew it; they usually wrote "on" instead of "in" in the expression "in charge of something". In item #3, to my surprise, no one put "with", which would have been the most obvious choice given the meaning of the verb "boast". In item #5 almost everyone chose "out", though the solution was "in" or "into". Items #9 and #10 show a huge contrast: in question #9 everyone knew the right answer, "out of", which means the item was too easy for this group, whereas in question #10 no one knew the answer; testees usually wrote "for" or "of" instead of "about". Together with question #1, these two items were unknown to these students.

The remaining seven items served the purpose well, as they were genuinely able to separate those who knew them from those who did not.

Figure 1: Number of right and wrong answers in Part 1/A.

Part 1/B. In this part the students were asked to paraphrase five simple sentences by completing the sentences given below them. This was the easiest task for them, since in this school they were supposed to study more grammar than, for instance, expressions or vocabulary. The first item was very easy: all of them answered it properly. In the second sentence, 7 of the 13 students knew the right structure; the rest used the simple past instead of the required past perfect form. In question #3 most of the students got only half the points because they put the pronoun "it" into the sentence, which was incorrect there; four of them knew it fully, and one student wrote a totally incorrect version. Question #4 was another easy paraphrasing task: 11 students produced the right construction, one solved the problem partly, and one did not know the answer at all. The last item was the most problematic in this exercise, as only 6 of the 13 were able to paraphrase the sentence. Quite surprisingly, it was mainly the students with the lowest overall scores who could answer this item.

Part 2 was a multiple-choice test with 10 items; students had to choose one option out of four in each question. For a detailed list of the results in this section, see Figure 2.

In item #1, those who did not know the right answer chose "brend" instead of "brand", which suggests they all had some idea of the word but were not sure of its spelling. Questions #2, #3, and #5 were not suitable for measuring these students' knowledge, because everyone chose the right answer. Item #4 brought a contradictory result: although 10 students knew what to choose, the two best did not find the proper solution. In question #6 only 3 students gave the right answer; almost everyone chose "outview" instead of "lookout". "Outview" is a non-existent word, but it worked perfectly as a distractor. Items #7 and #8 were solved by only 4 students each, but together with questions #9 and #10 they were known to the best three students.

Figure 2: Number of right and wrong answers in Part 2.

Part 3. The task was gap-filling: the test-takers had to fill in the missing words, or use other words that also fit the context. There were 30 gaps, and for each good answer the students got 0.5 points.

There were only 4 gaps that all the testees were able to complete in some acceptable way (for details see Figure 3). Another 6 remained unfilled or were completed incorrectly by everyone. In gaps #13, #21, and #30, only the best three students gave acceptable answers. In gap #10, although 11 of the 13 students knew the solution, the second-best student did not give any correct answer.

There were four gaps in which almost everyone knew one of the possible answers but chose a different word from the one that appeared in the original text. In gap #1 all the students wrote "if" instead of "whether", which is also correct; in gap #6 they wrote "sleeping" instead of "accommodation"; in gap #10 "easily" was replaced by "well"; and in gap #23 "continue" was replaced by "finish", which also fits the text.

Figure 3: Number of right and wrong answers in Part 3.

Evaluation of Students

According to the results the 13 students achieved, the group can be divided into two parts. The first contains the three students who scored 35.5, 36, and 37 points, scores that really stand out from the rest; the next highest score was only 27 points. These three students clearly knew far more than the others. The majority got scores between 22 and 25, which shows that the test was a little too difficult for them.

Conclusion

In this research paper I have shown, first, the preparations for piloting an English language test: how a raw test is prepiloted, revised, and administered. In the second part I have given both an overall analysis of the test and an analysis of every single item from different points of view. I hope this paper helps anyone who reads it get closer to the field of test construction and analysis: what problems can arise, and what kinds of parallels can be drawn between the various phenomena.
