Comparative Judgement: A workshop exploring its use in MFL
7 December 2018
Author: Dr Caroline Creaby - Research School Director
On 26th November 2018 we hosted a fantastic event about comparative judgement, an alternative approach to traditional teacher marking of summative assessments. We were joined by Daisy Christodoulou, former Head of Assessment at Ark and now Director of Education at No More Marking. We were also pleased to hear from David Williams and Eileen Kincaid from the MFL faculty at Sandringham School, who have been using comparative judgement to assess students’ written work.
The session began with a really helpful distinction between absolute and comparative judgements. For example, if a person walked into the room you’re sitting in right now, you might struggle to judge their exact height. However, if two people walked in, it would be much more straightforward to judge which of the two was taller. In other words, making a comparative judgement, rather than an absolute one, is easier for us all. This principle was powerfully illustrated with the ‘colours game’ available on the No More Marking website, which I would encourage you to try!
This was then related to how we approach marking. When marking a student’s work against a set of criteria or a mark scheme, it can be challenging to decide on a specific mark. Even if we do decide, our opinion may change as we mark more work, or if we mark the same piece of work on a different day or at a different time. Expecting all your department colleagues to agree with your mark is more challenging still. However, if you were presented with two pieces of work and asked simply which is better, this would be a considerably easier task, and you would be more likely to agree with your colleagues. Given that we find absolute judgements challenging, and that those judgements may not be accurate or reliable, comparative judgement can be a more effective and reliable approach to marking.
With comparative judgement, a teacher is presented with two pieces of work and asked simply which is better. In the case of No More Marking, students’ work is shown side by side on screen (for example, in the image below). Having judged one pair, the teacher goes on to make a series of judgements about further pairs of students’ work, which appear automatically on screen. On average, each judgement takes about 20 seconds. To increase reliability, a range of colleagues judge the same set of work, and an algorithm then combines all the judgements to produce a rank order of all the students’ work. A scaled score and marks can subsequently be applied.
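For readers curious about how many pairwise judgements can be combined into a single rank order: the post doesn’t describe No More Marking’s actual algorithm, but a common statistical model for this kind of aggregation is the Bradley–Terry model. The sketch below is purely illustrative (the function name, learning rate and toy data are all my own assumptions, not anything from No More Marking):

```python
import math

def bradley_terry(n_items, judgements, iters=200, lr=0.1):
    """Estimate a 'quality' score per script from pairwise judgements.

    judgements: list of (winner, loser) index pairs.
    Returns log-strength scores (higher = judged better), fitted by
    simple gradient ascent on the Bradley-Terry log-likelihood.
    """
    scores = [0.0] * n_items
    for _ in range(iters):
        grad = [0.0] * n_items
        for winner, loser in judgements:
            # Model's current probability that the winner wins this pair.
            p_win = 1.0 / (1.0 + math.exp(scores[loser] - scores[winner]))
            grad[winner] += 1.0 - p_win
            grad[loser] -= 1.0 - p_win
        scores = [s + lr * g for s, g in zip(scores, grad)]
        # Centre the scores: only differences matter in this model.
        mean = sum(scores) / n_items
        scores = [s - mean for s in scores]
    return scores

# Toy data: three scripts; script 0 beats 1 and 2, script 1 beats 2.
judgements = [(0, 1), (0, 2), (1, 2), (0, 1), (1, 2)]
scores = bradley_terry(3, judgements)
ranking = sorted(range(3), key=lambda i: -scores[i])
print(ranking)  # → [0, 1, 2]
```

The key point the model captures is the one made above: no single judge marks every script against every other, yet the combined judgements still yield one rank order to which scaled scores can be applied.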
To illustrate this approach to marking, delegates were given the opportunity to have a go at marking a set of answers from a primary writing task using No More Marking for themselves. Getting to grips with the technology, and experiencing the relative ease and speed of this approach, helped delegates get a sense of how it works. Further, it helped answer delegates’ questions about whether we can trust the judgements being made.
On this issue of reliability, No More Marking has a reliability metric. If all teachers agreed on every judgement they made, with no variation, it would generate a score of 1. If there was no agreement between teachers, the score would be 0. We would always expect high levels of reliability in closed-question assessments, but less reliability in an open task.
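The post doesn’t specify how No More Marking computes this metric. One generic way to estimate agreement between judges, offered here only as an illustration, is a split-halves check: divide the judges into two groups, build a rank order from each group’s judgements alone, and correlate the two rankings, for instance with Spearman’s formula. The numbers below are made up:

```python
def spearman(rank_a, rank_b):
    """Spearman rank correlation between two rankings of the same scripts.

    Returns 1.0 for identical orderings, lower values as they diverge.
    """
    n = len(rank_a)
    d2 = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

# Two halves of a judging team rank five scripts; the orderings
# differ only in scripts 3 and 4, so agreement is high.
half_1 = [1, 2, 3, 4, 5]
half_2 = [1, 2, 4, 3, 5]
print(round(spearman(half_1, half_2), 2))  # → 0.9
```

Whatever the exact formula used in practice, the intuition is the same as in the text: near-identical orderings from independent judges push the metric towards 1, while unrelated orderings push it towards 0.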
Before the delegates were given the reliability score for their judgements of the primary task, we discussed what tolerance, or range of marks, would be acceptable to them. We learned from Daisy that recent Ofqual data revealed that on a typical 40-mark question in GCSE and A level exams, a tolerance of +/- 5 marks was considered acceptable, i.e. a total error of 10 marks. This would translate to a score of 0.6 on No More Marking’s reliability metric. Coming back to the reliability of the delegates’ judgements of the primary writing task, the outcome was an impressive 0.91. On a 40-mark exam question, this would translate to a tolerance of +/- 2 marks.
Having understood the principles of comparative judgement, how No More Marking worked and the reliability that this approach offered, we then heard from David Williams, Director of Learning for MFL at Sandringham School. David offered a really helpful account of how he introduced No More Marking at Sandringham to assess summative writing tasks in MFL.
One of David’s reflections was about how to provide feedback to students once they had completed a summative assessment. In an early trial, he found himself using No More Marking to make comparative judgements about a class set of assessments, but then went on to mark them by hand so that students would receive written comments on their work. This obviously added to his workload and decreased the perceived value of the approach. It was nonetheless an important process, as it helped him to crystallise the purpose of using a comparative judgement tool: ultimately, it is a more accurate and reliable approach to summative assessment, not a means of providing students with formative feedback. Once this became clear, and David was comfortable that students didn’t necessarily need personalised feedback on this work, he felt more empowered to use the software for summative, end of year assessments.
David then worked with his team to use this approach to assess written work in end of year exams. He made clear to colleagues that they weren’t expected to provide students with individual feedback; they were already doing so through other formative tasks. However, having judged a set of assessments, teachers inevitably picked up misconceptions, common errors and good practice, which they were able to feed back at a whole-class level.
David shared his increased confidence in the reliability of assessment (his reliability metric didn’t fall below 0.89, for example). He reflected that within an MFL faculty there were a number of teachers who could make judgements, which helped ensure a good standard of reliability; colleagues in smaller faculties may struggle in this respect. David also commented that he was able to make decisions about students moving sets with more confidence, now that he was more certain that assessments were robust and reliable.
Eileen Kincaid, Assistant Director of Learning for MFL, shared her reflections on using comparative judgement. She noted the significant amount of time saved by using this approach for summative assessments. She also enjoyed the process, as it offered her an insight into colleagues’ practice, where a whole class’s performance emerged as stronger, for example. This led her to engage in conversations with colleagues about their teaching and how they prepared their students for exams.
David and Eileen’s reflections were helpful in clarifying how teachers and schools can use comparative judgement, and the practical steps to consider along the way. In the session, Daisy signposted us to a range of other examples of how comparative judgement has been used in different subjects and schools:
Greenshaw School in Sutton: https://blog.nomoremarking.com/using-comparative-judgement-in-different-subjects-at-ks3-4415195f8947
Trinity School in Lewisham: https://blog.nomoremarking.com/case-study-using-comparative-judgement-to-assess-secondary-english-e0d9f4afbb85