Refining assessment without levels

One of the first things I’ve been involved with in my new post has been the development of assessment without levels. It’s been strange for me to move back to a school still using them! I’m teaching Year 7 English and I’ve had to re-learn (temporarily at least!) the levels system to assess their assignments. What struck me particularly was the way learning gets lost when you hand back assignments with levels on them. I’d been so used to handing work back with formative comments only over the last two years that I was quite unprepared for the buzz of “what did you get?”, the fist-pumping triumph when a Level 5.6 was awarded (“I was only 5.3 last time!”) and the disappointment on the flip side. I had to work really hard to focus the students on my carefully crafted formative feedback and DIRT tasks – and I know that some of them paid only lip-service to my requests to engage with the comments, treating it as a “please-the-teacher” exercise whilst their minds were still occupied with the level. All I kept thinking about was Dylan Wiliam’s advice about ego-involving and task-involving feedback: when an ego-involving grade arrives alongside a task-involving comment, the grade wins and the comment is ignored.

Levels have to go, then – this is not a surprise. It’s also perhaps unsurprising that Churchill have hung on to them, with a new Headteacher incoming (especially one who has blogged extensively about assessment without levels!). My big advantage is that, having implemented assessment without levels once, I can refine and develop the approach for my second go. I’m still pretty happy with the growth and thresholds model (originally proposed by Shaun Allison here) which was implemented at my previous school, but there are definitely refinements to make. In particular, a couple of posts have stuck with me in terms of reviewing the way we assess. The first is by the always-thought-provoking Daisy Christodoulou, who got my mental cogs whirring in November with Comparative Judgment: 21st Century Assessment. In this post, the notion that you can criteria-reference complex tasks like essays and projects is rightly dismissed:

” [it] ends up stereotyping pupils’ responses to the task. Genuinely brilliant and original responses to the task fail because they don’t meet the rubric, while responses that have been heavily coached achieve top grades because they tick all the boxes…we achieve a higher degree of reliability, but the reliable scores we have do not allow us to make valid inferences about the things we really care about.”

Instead, Daisy argues, comparing assignments, essays and projects to arrive at a rank order allows for accurate and clear marking without referencing reams of criteria. Looking at two essays side-by-side and deciding that this one is better than that one, then doing the same for another pair and so on does seem “a bit like voodoo” and “far too easy”…

“…but it works. Part of the reason why it works is that it offers a way of measuring tacit knowledge. It takes advantage of the fact that amongst most experts in a subject, there is agreement on what quality looks like, even if it is not possible to define such quality in words. It eliminates the rubric and essentially replaces it with an algorithm. The advantage of this is that it also eliminates the problem of teaching to the rubric: to go back to our examples at the start, if a pupil produced a brilliant but completely unexpected response, they wouldn’t be penalised, and if a pupil produced a mediocre essay that ticked all the boxes, they wouldn’t get the top mark. And instead of teaching pupils by sharing the rubric with them, we can teach pupils by sharing other pupils’ essays with them – far more effective, as generally examples define quality more clearly than rubrics.”

The bear-trap of any post-levels system is finding that you’ve accidentally re-created levels. Michael Tidd has been particularly astute about this in the primary sector: “Have we simply replaced the self-labelling of I’m a Level 3, with I’m Emerging?” This is why systems like the comparative judgment engine on the No More Marking site are useful. Deciding on a rank order allows you to plot the relative attainment of each piece of work against the cohort; “seeding” pre-standardised assignments into the cohort would then allow you to map the performance of the full range.
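
To make the mechanics concrete, here’s a rough sketch in Python. This is emphatically not what the No More Marking engine actually does – it fits a proper statistical model (Bradley-Terry or similar) and chooses pairings adaptively – and the script names and judgments below are invented. It simply shows how a pile of “this one is better than that one” decisions becomes a rank order.

```python
from collections import defaultdict
from itertools import chain

# Each judgment records one decision: the first script was judged
# better than the second. Names and judgments are invented.
judgments = [
    ("dee", "amy"), ("dee", "cal"), ("dee", "ben"),
    ("amy", "cal"), ("amy", "ben"), ("amy", "ben"),
    ("cal", "ben"), ("cal", "ben"),
]

scripts = set(chain.from_iterable(judgments))
wins = defaultdict(int)
appearances = defaultdict(int)
for winner, loser in judgments:
    wins[winner] += 1
    appearances[winner] += 1
    appearances[loser] += 1

# Crude score: proportion of comparisons won. A real engine fits a
# Bradley-Terry-style model and picks pairings adaptively; this only
# shows how pairwise decisions become a rank order.
score = {s: wins[s] / appearances[s] for s in scripts}
rank_order = sorted(scripts, key=score.get, reverse=True)
print(rank_order)   # ['dee', 'amy', 'cal', 'ben']
```

A “seeded” script whose standard had already been agreed beforehand would simply be judged alongside the rest, anchoring everything ranked above and below it.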

At this point, Tom Sherrington’s generously shared work on his assessment system using the bell curve comes to the fore. Tom first blogged about assessment, standards and the bell curve in 2013 and has since gone on to use the model in the KS3 assessment system developed at Highbury Grove. “Don’t do can do statements,” he urges – echoing Daisy Christodoulou’s call to move away from criteria-referencing – and instead judge progress based on starting points:

 

Tom Sherrington’s illustration of bell curve progress judgments

 

Finally, this all makes sense. This is how GCSE grades are awarded – comparable outcomes models the scores of all the students in the country based on the prior attainment model of that cohort, and shifts grade boundaries to match the bell curve of each cohort. It feels alien and wrong to teachers like me, trained in a system in which absolute criteria-referenced standards corresponded to grades, but it isn’t wrong – it makes sense. Exams are a competition. Not everyone can get the top grades. It also makes sense pedagogically. We are no longer in a situation where students need to know a specific amount of Maths to get a C grade (after which point they can stop learning Maths); instead they need to keep learning Maths until they know as much Maths as they possibly can – at which point they will take their exams. If they know more Maths than x per cent of the rest of the country, they will get grade x. This is fair.
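
Purely to illustrate the arithmetic – not Ofqual’s actual methodology, and with grade shares I have made up – here is what “grades follow the distribution” looks like: once the prior-attainment model has fixed what proportion of the cohort should land in each grade, the boundaries fall wherever those proportions sit on this year’s marks.

```python
# Illustrative only: the grade shares are invented and Ofqual's real
# comparable-outcomes process is far more involved. The point is just
# that grades follow the cohort's distribution, not fixed raw marks.
scores = [23, 31, 34, 38, 41, 45, 47, 52, 55, 58,
          61, 64, 67, 71, 75, 78, 82, 85, 88, 93]

grade_shares = [("A*", 0.10), ("A", 0.15), ("B", 0.20),
                ("C", 0.25), ("D", 0.15), ("E", 0.15)]

ranked = sorted(scores, reverse=True)
grade_of = {}
position = 0
for grade, share in grade_shares:
    take = round(share * len(ranked))           # how many scripts get this grade
    for mark in ranked[position:position + take]:
        grade_of[mark] = grade
    position += take

print(grade_of[67])   # this year, a mark of 67 happens to fall in the B band
```

The same raw mark can earn a different grade in a different year, because the grade describes a position within the cohort rather than a fixed quantity of content mastered.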

Within the assessment system, getting a clear and fair baseline assessment (we plan to use KS2 assessments, CATs and standardised reading test scores) will establish a starting profile. At each subsequent assessment point, whether it be in Dance, Maths, Science, History or Art, comparative judgment will be used to create a new rank order, standardised and benchmarked (possibly through “seeded” assignments or moderated judgment). Students’ relative positions at these subsequent assessment points will then allow judgments of progress: if you start low but move up, that’s good progress. If you start high but drop down, we need to look at what’s happening. Linking the assignments to a sufficiently challenging curriculum model is essential; then if one assignment is “easier” or “harder” than others it won’t matter – the standard is relative.
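
As a hedged sketch of what those relative-position judgments could look like – the students, percentile figures and tolerance below are all invented, and this is the shape of the idea rather than anything we have built:

```python
# Invented data: each student's baseline and latest positions in the
# cohort expressed as percentile ranks (100 = top). The tolerance is
# arbitrary; this is the shape of the idea, not a finished system.
students = {
    "Asha":   {"baseline": 35, "latest": 62},
    "Billy":  {"baseline": 78, "latest": 80},
    "Carmen": {"baseline": 90, "latest": 55},
}

def progress_flag(baseline: int, latest: int, tolerance: int = 10) -> str:
    """Compare a student's relative position at two assessment points."""
    change = latest - baseline
    if change > tolerance:
        return "good progress: started lower, moved up"
    if change < -tolerance:
        return "needs a look: started higher, dropped down"
    return "holding position relative to the cohort"

for name, positions in students.items():
    print(name, "-", progress_flag(positions["baseline"], positions["latest"]))
```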

As with all ventures in this field, it’s a tentative step. What we’ve come up with is still in development ahead of a September launch. Moving away from criteria-referencing as the arbiter of standards has been the most difficult thing to do, because it’s all many of us have ever known. But that doesn’t make it right.


The nonsense of the grade descriptors

This week I have finalised our new Assessment, Marking and Feedback policy and submitted the draft to the Governors for review. This policy was a complete rewrite, incorporating and committing to our latest thinking on assessment without levels and closing-the-gap marking and feedback. I also spent some time preparing for our assessment without levels network meeting by working on the English assessment framework, which we’re basing on the groundwork from Belmont School and David Didau, shared by Dan Brinton. One of my tasks was to match the assessment criteria we had created as closely as I could to the grade descriptors for GCSEs graded 1-9, published in November by the DfE. Except there was a problem. The grade descriptors are completely useless.

It starts with this gem in the “Detail”:

We have developed ‘grade descriptors’ for the new GCSEs graded 9 to 1 in English language, English literature and mathematics. They are different from ‘grade descriptions’, which apply to GCSEs graded A* to G.

I already feel like I’m reading a bureaucratic satire; this could be straight from a Yes Minister script. Before you even click on this link for the English Language descriptions (sorry, descriptors), there’s this sober warning:

These descriptors are not designed to be used for awarding purposes in 2017. Statistical predictions will be used to set grade outcomes at whole subject level.

So, translated: “here is a descriptor for a grade 8, but it won’t be used to award a grade 8, because that will be decided statistically.” Which begs the question: why publish these at all?

Discouraged, but not deterred, I pressed on to the descriptions (sorry, descriptors) themselves. Here’s a comparison between Grades 5 and 8 for reading in English Language:

Comparing Grade 5 with Grade 8 in new GCSE English Language. Spot the difference?

At this point I realised I was on a fool’s errand. If I was going to start chasing the shadows of whether kids were “substantiating” or “supporting” their understanding and opinions with references which were “apt” or “illuminating”, I would surely run mad. The anchor point for Grade 8 is supposed to be the current A*, whilst Grade 5 is the top of C / bottom of B. There would be no way of delineating Grades 6 and 7 in between these two, surely?

DfE grade descriptors: about this useful (Source)

I sat back, breathed deeply, and remembered this:

I had, for half an hour or so, slipped back into the old “levels” way of thinking. Not being able to tie our English assessment framework to GCSE grades or National Curriculum levels is a blessing. It matters not one jot whether a piece of work is a C, Level 5a, B+, or Grade 6. What matters are the key questions of assessment:

  • What is successful about it?
  • What could be done to improve it?

Identifying the answers to these questions is the key to our assessment policy; communicating those answers is the key to the feedback policy. If we get that right, students will get the grades that they are statistically assigned (sorry, deserve) at the end of the course.

Implementing Assessment Without Levels

I have blogged twice before about assessment in the new national curriculum (here and here) and, looking back at those two posts now, it seems high time for an update. We’ve moved on quite some way and we are now implementing our assessment without levels system (or AWOL, as our Head of Science seems intent on calling it) across the school.

Context

We haven’t been using National Curriculum levels for a year or so now, but instead we have been using a system of “Chew Valley Levels” linked to GCSE grades as follows:

Chew Valley Levels mapped to GCSE grades

This was only ever going to be a stop-gap measure, providing some continuity for students and their families whilst we explored the alternatives. In reality, in the world of comparable cohort outcomes we are not able to say with any certainty what a “C” grade at GCSE is, only what it was last year, and thus tying our levels to this moveable feast rendered them no more reliable than the preceding National Curriculum levels. We have even less idea what students will have to know or be able to do to be awarded the new GCSE 1-9 grades, though Ofqual have published this:

Ofqual’s outline of the new GCSE grading structure

What we know, then, is that roughly the same proportion of students as currently achieve grades A*-C in existing GCSEs will achieve grades 9-4 in the new GCSEs, but that the threshold standard is being raised to grade 5 against international benchmarks. But we still don’t know what students will have to know or be able to do for that grade 5 in any given subject, nor are we likely to, as the boundaries will shift year on year – especially, I suspect, in the infancy of the qualifications.

All of this means two things:

  1. We need to aim higher if we are to get as many students as possible to grade 5 and above – it will be tougher than C and above.
  2. Linking our assessment system to GCSE grades (as was our original plan) is not going to work.

The Threshold Model

Both Shaun Allison and Dan Brinton have been instrumental in clarifying my thinking about the threshold model of assessment. I highly recommend you read Shaun’s Assessment without levels – an opportunity for growth and Dan’s Designing a new post-levels curriculum and assessment model from scratch as they are both superb and you will see that a lot of the ideas in this post are not, in fact, mine, but theirs!

Essentially, in a threshold model, you set up your curriculum with an expectation in terms of content and skills within each unit, year, or key stage – the “threshold”. At each point, you assess students to see the extent to which they have met, exceeded or fallen short of this threshold. The model has the advantage of letting teachers decide the expectations (thus allowing challenge to be built-in) and providing ready-made opportunities for formative assessment and feedback in relation to the threshold expectations. As implied by the title of Shaun’s blogpost, it’s also a system that is compatible with our growth mindset ethos – more of which later.

What to call the threshold? 

This was a tricky point. Initially we considered the Durrington School model using four noun descriptors:

Thresholds from Durrington High School (from http://classteaching.wordpress.com/)

We were then quite taken with the Belmont School model using verbs to describe the thresholds:

Example assessment scheme from Belmont School Science (from http://belmontteach.wordpress.com/)

However, we couldn’t quite agree on language that would fit different subjects appropriately. We toyed with the idea of using the new GCSE grades 1-9, but they sounded too much like the old levels and, in any case, we would have to guess at the standard they represent, which is not a sound basis for a new assessment system.

In the end, we decided to use existing letter grades A*-G on the basis that students and their families understand them, and they have a well-understood threshold built into them at the C grade boundary. Thus students meeting the threshold expectations of our curriculum at each point will be assessed at grade C. Those who exceed it can be graded A*, A or B; those who fall short can be graded D, E, F or G. Each year’s threshold correlates directly to the next year’s, as illustrated in the following table:

AWOL Mapping

In other words, if a student can demonstrate they have met the demands of the Year 8 curriculum by the end of Year 7, they would be graded as B in Year 7. An A* student in Year 7 would be demonstrating the knowledge, skills and processes required of a Key Stage 4 student who had met the C grade threshold.
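
The real mapping is in the table above; the sketch below can only infer its shape from the two examples in this paragraph, so it assumes a simple one-grade-step-per-curriculum-year offset either side of the C threshold. Treat it as an illustration of the principle, not a copy of our table.

```python
# Assumption: one grade step per curriculum year either side of the C
# threshold. Inferred from the two examples in the paragraph above,
# not copied from the actual mapping table.
GRADE_LADDER = ["G", "F", "E", "D", "C", "B", "A", "A*"]   # low -> high
C_THRESHOLD = GRADE_LADDER.index("C")

def awol_grade(student_year: int, curriculum_year_met: int) -> str:
    """Grade awarded when a student in `student_year` meets the
    threshold expectations of `curriculum_year_met`."""
    offset = curriculum_year_met - student_year
    index = min(max(C_THRESHOLD + offset, 0), len(GRADE_LADDER) - 1)
    return GRADE_LADDER[index]

print(awol_grade(7, 7))    # meets the Year 7 threshold -> C
print(awol_grade(7, 8))    # meets the Year 8 threshold in Year 7 -> B
print(awol_grade(7, 10))   # working at the KS4 threshold in Year 7 -> A*
```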

Mapping the threshold standard onto the curriculum

In our system, for students to have achieved the threshold on entry to the school, they will have to demonstrate that they have met the requirements of the Key Stage 2 National Curriculum in each subject. In time, we will receive this information (at least for English and Maths) from the new Key Stage 2 assessments, but this will not happen until 2016. Therefore we will baseline all students (as we do currently) to assess the extent to which they have met those requirements at the start of Year 7. Any gaps or shortfall will need to be addressed early.

The threshold standard in each year will be decided by the teaching teams within school. It will be informed – but not limited – by the relevant national curriculum requirements, of course, but the guiding principle is that if we are going to value what we assess then we must assess what we value. Therefore the new curriculum that is being designed is based on the key ideas, concepts, knowledge and skills within each subject, informed by the national curriculum, but decided by teachers. At Key Stage 4 the curriculum will, of course, incorporate the examined elements of the KS4 programme of study or examination specification but will not be limited by that. Our best students go beyond and around the specifications anyway – and so should we. If we are going to prepare our students to do well at GCSE we should be teaching them beyond GCSE in Year 11, so the terminal exams they sit seem like a walk in the park in comparison to what they have been doing in the classroom.

Tracking progress in the new assessment system

We already use a flight paths model to track progress in all subjects. Within the new system, tracking progress is even easier.

AWOL Progress

In the example above, a student is assessed at C on baseline and maintains that performance through Years 7 and 8 (the green highlighting). We should expect all students to maintain their performance through the curriculum, but challenge them to improve it. In Year 9, the table above illustrates what happens if a student improves their performance (the orange highlighting), leading to better than expected GCSE outcomes, and what happens if they do not make progress through the year (the red highlighting), leading to under-performance at GCSE.

At this stage it’s important to be clear about the expectations of this system. No matter where students are on their baseline assessment, it is the job of the curriculum to ensure that as many of them as possible end up above the threshold by the end of each unit, year, or key stage. If a student is a D or an E at the baseline, that is not an excuse to stay on a D or E for the whole of their school career. Rather, it is a challenge to the teacher and the curriculum to fill in the gaps from the previous year as well as teaching the content, skills and processes of the current year. This is the gap that must be closed.

Terminal Assessment

One important element of this system is the introduction of a terminal assessment in each year of the curriculum. This assessment will be in the same style as the GCSE assessment in the relevant subject, and should assess the knowledge, skills and processes of the entire year’s curriculum. Students will have to revise for it, thus preparing them earlier for the demands of the two-year linear assessments in Year 11. We also intend that this approach will improve retention and recall, as the curriculum will be designed to interleave and regularly revisit the key knowledge, skills and processes.

The end of targets? 

Finally, this model moves away from giving each student a target to aim for based on FFT D, CATs predictors, teacher assessment plus some magic dust and a following wind (as described in this early post on my blog – see, David Didau doesn’t have the monopoly on changing his mind!). When I launched our growth mindset ethos, one of the first responses I had was our Head of Geography asking if this was the end of Challenge Grades (our term for student targets). If potential is unknowable, why are we selecting an arbitrary grade and pretending to know it? She was (and is) right, of course. So we will now be judging progress based on where students start and how far they’ve come from that known point, rather than how far they’ve got to go to a point which we cannot and should not pretend to know.

So what now? 

  1. Faculties are meeting on Monday to begin the process of deciding the threshold standard for each year within the curriculum. Some are already there, having piloted systems through last academic year. Others will need to move this year.
  2. We need to finalise what reports to parents will look like in the new system – we have a draft, but it needs thinking through.
  3. We need to troubleshoot the progress measures – if a student moves up a grade in Year 8, does the higher grade become their new baseline, or do we continue to measure progress from the start-of-Year-7 baseline? What if they drop a grade?
  4. We need to decide when we move over from the legacy system to the new. We only get one chance to get this right – so we need to be sure we have it sorted.

These are the practicalities – but the principles I am certain are right and the system I am sure is workable. I’ll keep you posted!

UPDATE: we are holding an open meeting in January 2015 to share our approach with colleagues. Details here: https://echewcation.wordpress.com/assessment-without-levels/

Assessment in the new national curriculum – next steps

My original post “Assessment in the new national curriculum – what we’re doing” remains one of the most popular on this blog. Here I will outline how we have refined the model proposed in that post and integrated it with progress tracking, as well as our latest thoughts on assessment without levels and growth mindset.

How will we assess in the new national curriculum? 

I was delighted to hear that Durrington High School had been awarded an assessment innovation fund grant by the DfE. I was even more delighted when Durrington DHT Shaun Allison published his thoughts so far in an excellent blogpost! As a school also actively pursuing a growth mindset, we found that the approach to assessment outlined by Shaun struck a chord and seemed closely aligned with what we are trying to achieve at Chew Valley. I presented the key points of the Durrington approach to middle leaders yesterday and we have adopted the principle of the Growth and Thresholds assessment system, explained as follows (paraphrased from Class Teaching):

Teachers identify the key knowledge and skills students need in order to be successful in KS4 and work backwards to decide what this would look like, if students have mastered it in KS3 – the excellence standard. Teachers then produce a curriculum and assessment framework that allows teachers and students to know what they’ve got to do to achieve excellence.  

In the Chew Valley version, we will continue to use GCSE grades as the basis for our assessment model. It makes sense, longer term, to use the new 1-9 GCSE grade scale as a whole-school assessment framework, with rough equivalents as follows:

Rough equivalents: Chew Valley Levels, current GCSE grades and the new 1-9 scale

In other words, students entering in Year 7 would be assessed with grades usually between 1 and 4, and move up a consistent assessment scale throughout their time in secondary school.

We remain wedded to the notion of criteria-referenced assessment, although I enjoyed having my thinking pushed on this by Daisy Christodoulou’s provocative defence of norm-referencing. The problem comes with the assumption that there will be clear criteria attached to the new GCSE grades 1-9; my understanding is that there will be criteria attached to the levels and marks within the new GCSE specifications, but that they will not be clearly linked to specific GCSE grades. This will allow Ofqual to apply comparable outcomes and shift the boundaries year on year. Thus we will need to assign criteria to the new GCSE grades on a “best fit” basis, leading to some insecurity and uncertainty within the assessment framework, especially in the early stages.

We have not yet decided when we will shift over to 1-9 grades. The existing system will hold up until 2016 at least, and then there will be an incremental shift as first English and Maths, then Science, History, Geography and Languages, then arts subjects move over to the new grades. We also haven’t decided whether we’re going to sub-grade them – grade 2c, 2b, 2a, anyone? Sub-levelling was a bastardisation of the national curriculum levels; should we be wary of falling into the same trap again? We’re keeping a watching brief on both these issues!

Tracking progress in the new assessment framework

With the advent of Progress 8 (blogged about here) we have been running an experiment with progress tracking using flight paths (blogged about here). As indicated in that second blog, in the initial experiment we tracked progress in English and Maths from their respective KS2 baselines, and all other subjects from the average points score of English and Maths at KS2. This worked fine for English and Maths, but it didn’t work for other subjects. I know it seems obvious that tracking progress in Drama from a baseline of the average of tests in English and Maths won’t work, but that is the methodology being applied in the Progress 8 measure so I thought we’d better use it. What I’d got wrong, of course (it’s so easy to do!) was that I’d let the accountability framework dictate my practice rather than common sense and what was right for the learners. So, we’ve made a change.

From September, we will continue to use the KS2 baselines for English and Maths – this is a tried and tested approach and it is giving us clear and helpful data both for individual students and for self-evaluation and external accountability purposes. In all other subjects, we will conduct a baseline assessment in the first term of Year 7 to establish a clear, subject-specific starting point for each student. We will then use that baseline assessment to track progress in each subject across KS3. We will treat the baseline assessment as the “baseline” in the same way as KS2 English and Maths data, even though they will be four or five months apart in time, and apply the flight paths model to each subject in exactly the same way:

Progress flight paths tabulated

We still have the existing template to track progress against an English and Maths KS2 average points score, so I will be able to keep an eye on the Progress 8 headlines, but this refined model will provide the ability to track a student’s progress in, for example, Art from their starting point in Art. Which seems obvious, doesn’t it?
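
In data terms, the change is simply about which starting point each subject tracks from. A minimal sketch, with invented numbers: English and Maths keep their KS2 baselines, while every other subject tracks from its own first-term assessment.

```python
# Invented numbers, real principle: English and Maths track from KS2,
# everything else tracks from its own first-term Year 7 baseline.
ks2_baselines = {"English": 4.8, "Maths": 5.0}
year7_subject_baselines = {"Art": 3.9, "Drama": 4.2,
                           "French": 2.1, "History": 4.5}

def progress_baseline(subject: str) -> float:
    """Return the starting point used to track progress in a subject."""
    if subject in ks2_baselines:
        return ks2_baselines[subject]
    return year7_subject_baselines[subject]

print(progress_baseline("Maths"))   # 5.0 - the KS2 result
print(progress_baseline("Art"))     # 3.9 - the subject's own baseline
```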

In time we will convert the “levels” in those flight paths to the “grades” via the equivalences listed in the table above. It may be that in, for example, languages, the baseline will be very low (where students have not studied that particular language in primary) and this may require the model to be refined – watch this space!

Targets and a growth mindset

When I launched the idea of becoming a growth mindset school back in March, several staff discussed the idea of targets (we call them challenge grades or levels) and whether they were compatible with a growth mindset. Potential, according to Dweck, is limitless – it’s not about aiming for a destination but about constantly continuing to improve. John Tomsett made a similar point in a conversation on twitter recently.

I overheard a conversation between two girls revising for a languages exam this week. They were working on tenses. One said to the other: “I don’t need to know that; that’s what you need to do to get a B. I only need a C.” Her companion was aiming for a B, so continued to revise it. This is why Michael Gove was so against early entry – the wasteful settling for a lower level of achievement. This is the danger of target grades – if students work hard and get there, they stop. And, unless that target grade is an A* (and even then), that is a waste.

This is a substantial shift in my thinking (see one of the earliest posts on this blog, Targets, for my starting point!), but actually the flight paths approach provides us with a different way to frame the conversation about progress. In the old model I would use formulae and statistical cohort-analysis tools such as CATs and FFT to predict likely outcomes and “add a bit on for challenge”, then track and discuss progress towards that made-up number. It makes more sense to me now to assess where students are starting from and then feed back whether their progress is below expected, expected, better than expected, outstanding or world class from that starting point (using the flight paths model). Thus reports to parents might say “Matilda is currently working at a Grade 3 in Science, and this represents better than expected progress from her starting point in this subject”. At the moment this is a tentative, half-formed policy shift which will need to be put through the crucible of SLT and Governors – what better way to try it out than to put it to the test on twitter first?
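
To show the shape of the report wording if it were generated from the flight paths data – with the caveat that the band cut-offs below are invented and the real ones live in the flight paths table:

```python
# The real band boundaries live in the flight paths table; these
# cut-offs are invented purely to show the shape of the idea.
def progress_band(grades_gained: int, expected_gain: int) -> str:
    difference = grades_gained - expected_gain
    if difference >= 3:
        return "world class"
    if difference == 2:
        return "outstanding"
    if difference == 1:
        return "better than expected"
    if difference == 0:
        return "expected"
    return "below expected"

def report_line(name: str, subject: str, current_grade: int,
                grades_gained: int, expected_gain: int) -> str:
    band = progress_band(grades_gained, expected_gain)
    return (f"{name} is currently working at a Grade {current_grade} in "
            f"{subject}, and this represents {band} progress from their "
            f"starting point in this subject.")

print(report_line("Matilda", "Science", 3, grades_gained=2, expected_gain=1))
```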

In summary

The abolition of national curriculum levels remains an opportunity to do something different and better with curriculum and assessment across the whole of a student’s school experience. The fact that each individual school is having to come up with its own system remains a fatal flaw in terms of capacity. The new assessment innovation packages may go some way to preventing this – especially if they are of the quality of the work coming out of Durrington. Whilst there is still a lot of work to do, and a lot of uncertainty, it is still my aim that assessment and curriculum in my school will be the better for the reforms.