One of the first things I’ve been involved with in my new post has been the development of assessment without levels. It’s been strange for me to move back to a school still using them! I’m teaching Year 7 English and I’ve had to re-learn (temporarily at least!) the levels system to assess their assignments. What struck me particularly was the way learning gets lost when you hand back assignments with levels on them. I’d been so used to handing work back with formative comments only over the last two years that I was quite unprepared for the buzz of “what did you get?”, the fist-pumping triumph when a Level 5.6 was awarded (“I was only 5.3 last time!”) and the disappointment on the flip side. I had to work really hard to focus the students on my carefully crafted formative feedback and DIRT tasks – and I know that some of them paid only lip-service to my requests to engage with the comments, in a “please-the-teacher” exercise whilst their minds were still occupied with the level. All I kept thinking about was Dylan Wiliam’s advice about ego-involving and task-involving feedback:
Levels have to go, then – this is not a surprise. It’s also perhaps unsurprising that Churchill have hung on to them, with a new Headteacher incoming (especially one who has blogged extensively about assessment without levels!). My big advantage is that, having implemented assessment without levels once, I can refine and develop the approach for my second go. I’m still pretty happy with the growth and thresholds model (originally proposed by Shaun Allison here) which was implemented at my previous school, but there are definitely refinements to make. In particular, a couple of posts have stuck with me in terms of reviewing the way we assess. The first is by the always-thought-provoking Daisy Christodoulou, who got my mental cogs whirring in November with Comparative Judgment: 21st Century Assessment. In this post, the notion that you can criteria-reference complex tasks like essays and projects is rightly dismissed:
“[it] ends up stereotyping pupils’ responses to the task. Genuinely brilliant and original responses to the task fail because they don’t meet the rubric, while responses that have been heavily coached achieve top grades because they tick all the boxes…we achieve a higher degree of reliability, but the reliable scores we have do not allow us to make valid inferences about the things we really care about.”
Instead, Daisy argues, comparing assignments, essays and projects to arrive at a rank order allows for accurate and clear marking without referencing reams of criteria. Looking at two essays side-by-side and deciding that this one is better than that one, then doing the same for another pair and so on does seem “a bit like voodoo” and “far too easy”…
“…but it works. Part of the reason why it works is that it offers a way of measuring tacit knowledge. It takes advantage of the fact that amongst most experts in a subject, there is agreement on what quality looks like, even if it is not possible to define such quality in words. It eliminates the rubric and essentially replaces it with an algorithm. The advantage of this is that it also eliminates the problem of teaching to the rubric: to go back to our examples at the start, if a pupil produced a brilliant but completely unexpected response, they wouldn’t be penalised, and if a pupil produced a mediocre essay that ticked all the boxes, they wouldn’t get the top mark. And instead of teaching pupils by sharing the rubric with them, we can teach pupils by sharing other pupils’ essays with them – far more effective, as generally examples define quality more clearly than rubrics.”
The bear-trap of any post-levels system is that you accidentally re-create levels under another name. Michael Tidd has been particularly astute about this in the primary sector: “Have we simply replaced the self-labelling of I’m a Level 3, with I’m Emerging?” This is why systems like the comparative judgment engine on the No More Marking site are useful. Deciding on a rank order allows you to plot the relative attainment of each piece of work against the cohort; “seeding” pre-standardised assignments into the cohort would then allow you to map the performance of the full range.
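To make the mechanics concrete, here is a toy sketch (in Python) of how a rank order can emerge from nothing but pairwise “this one is better” judgments. This is not No More Marking’s actual engine: it is a minimal Bradley–Terry-style fit, the statistical model commonly used behind comparative judgment, and the essay labels and judgments are invented.

```python
from collections import defaultdict

def rank_scripts(judgments, iters=100):
    """Estimate a quality score for each script from pairwise
    (winner, loser) judgments using a simple Bradley-Terry fit,
    then return the scripts in rank order, best first."""
    wins = defaultdict(int)    # total comparisons won by each script
    pairs = defaultdict(int)   # number of comparisons per unordered pair
    scripts = set()
    for winner, loser in judgments:
        wins[winner] += 1
        pairs[frozenset((winner, loser))] += 1
        scripts.update((winner, loser))

    strength = {s: 1.0 for s in scripts}  # start all scripts equal
    for _ in range(iters):
        new = {}
        for s in scripts:
            # Standard minorisation-maximisation update for Bradley-Terry
            denom = sum(
                pairs[frozenset((s, t))] / (strength[s] + strength[t])
                for t in scripts
                if t != s and frozenset((s, t)) in pairs
            )
            new[s] = wins[s] / denom if denom else strength[s]
        total = sum(new.values())
        strength = {s: v * len(scripts) / total for s, v in new.items()}

    return sorted(scripts, key=strength.get, reverse=True)

# Hypothetical judgments: each tuple reads "first essay beat second"
judgments = [("A", "B"), ("A", "C"), ("B", "C"), ("A", "D"),
             ("C", "D"), ("B", "D"), ("A", "C"), ("B", "C")]
print(rank_scripts(judgments))  # rank order, best first
```

No judge ever consults a rubric here: the rank order falls out of many quick side-by-side decisions, which is the point Daisy makes above.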
At this point, Tom Sherrington’s generously shared work on his assessment system using the bell curve comes to the fore. Tom first blogged about assessment, standards and the bell curve in 2013 and has since gone on to use the model in the KS3 assessment system developed at Highbury Grove. “Don’t do ‘can do’ statements”, he urges – echoing Daisy Christodoulou’s call to move away from criteria-referencing – and instead judge progress based on starting points:
Finally, this all makes sense. This is how GCSE grades are awarded – the comparable outcomes approach models the scores of all the students in the country based on the prior attainment of that cohort, and shifts grade boundaries to match each cohort’s bell curve. It feels alien and wrong to teachers like me, trained in a system in which absolute criteria-referenced standards corresponded to grades, but it isn’t – it makes sense. Exams are a competition. Not everyone can get the top grades. It also makes sense pedagogically. We are no longer in a situation where students need to know a specific amount of Maths to get a C grade (after which point they can stop learning Maths); instead they need to keep learning Maths until they know as much Maths as they possibly can – at which point they will take their exams. If they know more Maths than x per cent of the rest of the country, they will get grade x. This is fair.
Within the assessment system, getting a clear and fair baseline assessment (we plan to use KS2 assessments, CATs and standardised reading test scores) will establish a starting profile. At each subsequent assessment point, whether it be in Dance, Maths, Science, History or Art, comparative judgment will be used to create a new rank order, standardised and benchmarked (possibly through “seeded” assignments or moderated judgment). Students’ relative positions at these subsequent assessment points will then allow judgments of progress: if you started low but move up, that’s good progress. If you start high but drop down, we need to look at what’s happening. Linking the assignments to a sufficiently challenging curriculum model is essential; then if one assignment is “easier” or “harder” than others it won’t matter – the standard is relative.
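That baseline-versus-later-position logic might be sketched as below. The pupil names, the percentile conversion and the ten-point tolerance are all invented for illustration; a real system would sit this behind moderated, seeded rank orders.

```python
def percentile_positions(ranked):
    """Map a best-first rank order to 0-100 cohort percentiles."""
    n = len(ranked)
    return {pupil: round(100 * (1 - i / (n - 1)))
            for i, pupil in enumerate(ranked)}

def progress_flags(baseline, current, tolerance=10):
    """Compare each pupil's baseline cohort position with their current
    one: moving up is good progress, dropping by more than `tolerance`
    percentile points flags the pupil for a closer look."""
    base = percentile_positions(baseline)
    now = percentile_positions(current)
    flags = {}
    for pupil in base:
        change = now[pupil] - base[pupil]
        if change > tolerance:
            flags[pupil] = "good progress"
        elif change < -tolerance:
            flags[pupil] = "investigate"
        else:
            flags[pupil] = "on track"
    return flags

baseline = ["Amy", "Ben", "Cai", "Dev", "Eve"]   # starting profile order
current  = ["Amy", "Cai", "Eve", "Ben", "Dev"]   # rank after new assessment
print(progress_flags(baseline, current))
```

Because both snapshots are relative positions within the same cohort, an “easier” or “harder” assignment shifts everyone equally and the progress judgment survives.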
As with all ventures in this field, it’s a tentative step. What we’ve come up with is still in development for a September launch. Moving away from criteria-referencing as the arbiter of standards has been the hardest thing to do, because it’s all many of us have ever known. But that doesn’t make it right.