Charles Alderson: Levels of performance and the Hungarian Matura exam in English

In the context of the reform of the Hungarian Matura exam and the Council of Europe Common European Framework of Reference, this study discusses the problem of identifying levels of performance in reading in English as a foreign language. A pool of pilot tasks was examined by a board of nine expert judges, whose responses on five procedures were then analysed to establish the difficulty of each task. Empirical methods of scaling task difficulty and candidate ability were also explored, the latter based on Item Response Theory. Anchor items were used to calibrate the various tasks in the pilot booklet. The questions posed included: (1) How can difficulty levels best be characterised? (2) How can they be predicted? (3) Is it possible to accommodate the evidently wide range of ability of the pilot population, and the wide range of difficulty of the tasks piloted, within a scheme of two or three levels, as currently required by educational policy? (4) How might such levels correspond to internationally recognised levels? Expert judges were found to be better at predicting empirical difficulties than were item writers. Their judgements appeared to be reliable and consistent across different standard-setting procedures, and the order of difficulty they predicted corresponded broadly to the increase in empirical difficulty. However, there was considerable variation in difficulty across items and test tasks, such that it was difficult, if not impossible, to establish with confidence that a given task was at a given level. It was found that tasks vary in difficulty despite the intentions of item writers or the opinions of expert judges. Therefore, the true level of an item, a task or the whole test can only be determined post hoc, by analysing the results.

MAGYAR PEDAGÓGIA 100. Number 4. 423-458. (2000)

