Refining assessment without levels

One of the first things I’ve been involved with in my new post has been the development of assessment without levels. It’s been strange for me to move back to a school still using them! I’m teaching Year 7 English and I’ve had to re-learn (temporarily at least!) the levels system to assess their assignments. What struck me particularly was the way learning gets lost when you hand back assignments with levels on them. I’d been so used to handing work back with formative comments only over the last two years that I was quite unprepared for the buzz of “what did you get?”, fist-pumping triumph when a Level 5.6 was awarded (“I was only 5.3 last time!”) and disappointment on the flip side. I had to work really hard to focus the students on my carefully crafted formative feedback and DIRT tasks – and I know that some of them only paid lip-service to my requests to engage with the comments in a “please-the-teacher” exercise whilst their minds were still occupied with the level. All I kept thinking about was Dylan Wiliam’s advice about ego-involving and task-involving feedback:

Levels have to go, then – this is not a surprise. It’s also perhaps unsurprising that Churchill have hung on to them, with a new Headteacher incoming (especially one who has blogged extensively about assessment without levels!) My big advantage is that, having implemented assessment without levels once, I can refine and develop the approach for my second go. I’m still pretty happy with the growth and thresholds model (originally proposed by Shaun Allison here) which was implemented at my previous school, but there are definitely refinements to make. In particular, a couple of posts have stuck with me in terms of reviewing the way we assess. The first is by the always-thought-provoking Daisy Christodoulou, who got my mental cogs whirring in November with Comparative Judgment: 21st Century Assessment. In this post, the notion that you can criteria-reference complex tasks like essays and projects is rightly dismissed:

“[it] ends up stereotyping pupils’ responses to the task. Genuinely brilliant and original responses to the task fail because they don’t meet the rubric, while responses that have been heavily coached achieve top grades because they tick all the boxes…we achieve a higher degree of reliability, but the reliable scores we have do not allow us to make valid inferences about the things we really care about.”

Instead, Daisy argues, comparing assignments, essays and projects to arrive at a rank order allows for accurate and clear marking without referencing reams of criteria. Looking at two essays side-by-side and deciding that this one is better than that one, then doing the same for another pair and so on does seem “a bit like voodoo” and “far too easy”…

“…but it works. Part of the reason why it works is that it offers a way of measuring tacit knowledge. It takes advantage of the fact that amongst most experts in a subject, there is agreement on what quality looks like, even if it is not possible to define such quality in words. It eliminates the rubric and essentially replaces it with an algorithm. The advantage of this is that it also eliminates the problem of teaching to the rubric: to go back to our examples at the start, if a pupil produced a brilliant but completely unexpected response, they wouldn’t be penalised, and if a pupil produced a mediocre essay that ticked all the boxes, they wouldn’t get the top mark. And instead of teaching pupils by sharing the rubric with them, we can teach pupils by sharing other pupils’ essays with them – far more effective, as generally examples define quality more clearly than rubrics.”

The bear-trap of any post-levels system is always to find that you’ve accidentally re-created levels. Michael Tidd has been particularly astute about this in the primary sector: “Have we simply replaced the self-labelling of I’m a Level 3, with I’m Emerging?” This is why systems like the comparative judgment engine on the No More Marking site are useful. Deciding on a rank order allows you to plot the relative attainment of each piece of work against the cohort; “seeding” pre-standardized assignments into the cohort would then allow you to map the performance of the full range.
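For anyone who wants to see the mechanics, here is a minimal sketch of how a pile of “this one is better than that one” decisions can turn into a rank order. It is only an illustration: the essay names are made up, and it ranks by the share of comparisons each essay won. I am assuming nothing about how the No More Marking engine works internally; a real comparative judgment tool would be expected to fit a proper statistical model rather than count wins.

```python
from collections import defaultdict

def rank_by_wins(judgments):
    """Rank pieces of work from pairwise judgments.

    judgments: iterable of (winner, loser) pairs, one per comparison.
    Returns the names ordered from strongest to weakest by share of
    comparisons won (ties broken alphabetically).
    """
    wins = defaultdict(int)
    appearances = defaultdict(int)
    for winner, loser in judgments:
        wins[winner] += 1
        appearances[winner] += 1
        appearances[loser] += 1
    return sorted(appearances, key=lambda s: (-wins[s] / appearances[s], s))

# Hypothetical judgments: each pair means "the first essay was judged better".
judgments = [
    ("essay_A", "essay_B"),
    ("essay_A", "essay_C"),
    ("essay_A", "essay_D"),
    ("essay_B", "essay_C"),
    ("essay_B", "essay_D"),
    ("essay_C", "essay_D"),
]
ranking = rank_by_wins(judgments)
print(ranking)
```

No rubric appears anywhere in the sketch: the only input is a series of side-by-side decisions, which is the point of the approach.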

At this point, Tom Sherrington’s generously shared work on his assessment system using the bell curve comes to the fore. Tom first blogged about assessment, standards and the bell curve in 2013 and has since gone on to use the model in the KS3 assessment system developed at Highbury Grove. “Don’t do can do statements” he urges – echoing Daisy Christodoulou’s call to move away from criteria-referencing – and instead judge progress based on starting points:

 

Tom Sherrington’s illustration of bell curve progress judgments

 

Finally, this all makes sense. This is how GCSE grades are awarded – comparable outcomes models the scores of all the students in the country based on the prior attainment model of that cohort, and shifts grade boundaries to match the bell curve of each cohort. It feels alien and wrong to teachers like me, trained in a system in which absolute criteria-referenced standards corresponded to grades, but it isn’t – it makes sense. Exams are a competition. Not everyone can get the top grades. It also makes sense pedagogically. We are no longer in a situation where students need to know specific amounts of Maths to get a C grade (after which point they can stop learning Maths); instead they need to keep learning Maths until they know as much Maths as they possibly can – at which point they will take their exams. If they know more Maths than x percentage of the rest of the country, they will get x grade. This is fair.
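To make the “more Maths than x percentage” idea concrete, here is a rough sketch of grading by position in the cohort instead of by a fixed mark threshold. Everything in it is an assumption for illustration: the candidate names, the marks, the band labels and the 20/30/50 split are invented, and it leaves out the prior attainment modelling that comparable outcomes relies on. It is not the exam boards’ method.

```python
def grades_by_rank(marks, bands):
    """Assign grades by rank within the cohort, not by fixed mark thresholds.

    marks: dict of candidate -> raw mark (ties are split arbitrarily here).
    bands: list of (grade, percent_of_cohort) from the top grade down;
           the percentages are whole numbers that sum to 100.
    """
    ordered = sorted(marks, key=marks.get, reverse=True)
    n = len(ordered)
    grades = {}
    start = 0
    cumulative = 0
    for grade, percent in bands:
        cumulative += percent
        end = cumulative * n // 100  # index where this band stops
        for candidate in ordered[start:end]:
            grades[candidate] = grade
        start = end
    return grades

# Hypothetical cohort of ten candidates and an invented 20/30/50 split.
marks = {"P1": 72, "P2": 65, "P3": 58, "P4": 51, "P5": 44,
         "P6": 40, "P7": 33, "P8": 29, "P9": 21, "P10": 15}
bands = [("high", 20), ("middle", 30), ("low", 50)]
grades = grades_by_rank(marks, bands)
print(grades)
```

In this sketch a mark of 58 earns “middle” only because of where it sits in this particular cohort; the same mark in a stronger or weaker cohort could land in a different band.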

Within the assessment system, getting a clear and fair baseline assessment (we plan to use KS2 assessments, CATs and standardised reading test scores) will establish a starting profile. At each subsequent assessment point, whether it be in Dance, Maths, Science, History or Art, comparative judgment will be used to create a new rank order, standardised and benchmarked (possibly through “seeded” assignments or moderated judgment). Students’ relative positions at these subsequent assessment points will then allow judgments of progress: if you started low but move up, that’s good progress. If you start high but drop down, we need to look at what’s happening. Linking the assignments to a sufficiently challenging curriculum model is essential; then if one assignment is “easier” or “harder” than others it won’t matter – the standard is relative.
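As a sketch of what “relative position” could mean in practice, the snippet below compares a baseline rank order with a later one and reports how many places each student has moved. The student names and orders are invented, and counting places is my own simplification; a real system would need to standardise the judgments first, as described above.

```python
def position_changes(baseline_order, later_order):
    """Return places gained (+) or lost (-) per student between two rank orders.

    Both arguments list the same students from highest to lowest attainment.
    """
    baseline_position = {student: i for i, student in enumerate(baseline_order)}
    return {
        student: baseline_position[student] - i
        for i, student in enumerate(later_order)
    }

# Hypothetical rank orders at baseline and at a later assessment point.
baseline = ["Asha", "Ben", "Cara", "Dev"]
later = ["Asha", "Dev", "Ben", "Cara"]
changes = position_changes(baseline, later)
print(changes)
```

Here Dev started lowest and has moved up two places, which is the “started low but moved up” case; Ben and Cara have each slipped a place and would be worth a closer look.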

As with all ventures in this field, it’s a tentative step. What we’ve come up with is in the developmental stage for a September launch. But moving away from criteria-referencing as the arbiter of standards has been the most difficult thing to do, because it’s all many of us have ever known. But that doesn’t make it right.


Colour Coded Wet Rats – improving analytical writing in English

Back in December I blogged about my use of colour coded self-assessment with my GCSE Media Studies class, and I promised a follow-up as I applied the model to English. Here is the result!

Colour coded self-assessment is a technique I stole from Louise Pope (@philosophypope), our incredible Head of PSHRE and member of my teaching and learning team. The aim is to get students to identify where they have met the success criteria for a piece of work using coloured highlighting or underlining. Making it visual in this way enables them to spot patterns. For example, they might be hitting one aspect all the time, but another only sporadically or not at all. Having highlighted their first draft, students can then make improvements in their redrafts focused on expanding on the areas they didn’t hit so often the first time.

This year I have a wonderful Year 10 GCSE English Language and Literature group, and we have been working on Romeo and Juliet for the past term. Their understanding of the play was strong, they were engaged and focused. When writing about the play, they knew all about PEE paragraphs but their explanations just weren’t full and detailed enough. Luckily, we have appointed a fantastic new second in English this year, who has revolutionised our teaching of analytical writing with WET RATS.

I worry that I’m late to the party here, and that English teachers up and down the country have been using this technique for years and I’ve somehow been missing out. But WET RATS was new to me, and it has transformed the way my students write about literature. Here is what WET RATS are:

Romeo and Juliet

The mnemonic is used purely for the explanation part of a PEE paragraph. Students don’t need to use all of the WET RATS in every paragraph, but it gives them options for things to write about. I taught it by modelling how a paragraph might expand from a single quotation in Romeo and Juliet:

My paragraph was constructed with the students – it’s not intended as an exemplar! Also, it’s important to note that not all of the WETRATS need to be included in a paragraph. I only did that here in order to demonstrate them, and I’m very conscious that my point about “structure” is weak!

Following on from this we have used WETRATS several times to increase familiarity with the mnemonic and the technique itself. This culminated in a full essay on how Shakespeare creates sympathy for Juliet in Act 3 Scene 5 of the play. I’ve used this essay question many times in teaching the play, but the quality of the analysis my students produced was a real step up from their earlier work. We were on our way!

Of course, as a strong proponent of Ron Berger’s Ethic of Excellence approach, the first draft is only the beginning (I’ve seen Austin’s Butterfly!) So after the students had completed their drafts, I got them to colour code each element of the WETRATS across their essays. Here is a gallery of some of their work:

The process of colour coding was invaluable. Firstly, it gave them a specific purpose and focus for critically re-reading their own work – a world away from “check your spellings”! Secondly, it caused them to highlight (literally!) which aspects of the success criteria they were hitting more or less often, identifying clear areas for development as well as strengths. And thirdly, when I came to mark their work I already had a scaffold around which to build my feedback. Interestingly, some of the feedback was along the lines of “you’ve clearly written about structure here, but you haven’t highlighted this section.” This may identify a misconception about what “structure” means as a concept in literature (possibly due to my poor modelling of it in the demonstration), or possibly lazy self-assessment. In either case, something to address!

My second experiment with colour-coded self-assessment has been even more successful than the first, as well as helping the students to engage fully with the WETRATS technique. As with any scaffold, the key will be to take it away gradually so the students can write this well independently – I’m with Tom Sherrington on this one! But at this early stage, performance and the students’ awareness of their own learning and progress are markedly better. And more colourful!

The nonsense of the grade descriptors

This week I have finalised our new Assessment, Marking and Feedback policy and submitted the draft to the Governors for review. This policy was a complete rewrite, incorporating and committing to our latest thinking on assessment without levels and closing the gap marking and feedback. I also spent some time preparing for our assessment without levels network meeting by working on the English assessment framework, which we’re basing on the groundwork from Belmont school and David Didau shared by Dan Brinton. One of the tasks I was trying to do was to match the assessment criteria we had created as closely as I could to the grade descriptors for GCSEs graded 1-9 published in November by the DfE. Except there was a problem. The grade descriptors are completely useless.

It starts with this gem in the “Detail”:

We have developed ‘grade descriptors’ for the new GCSEs graded 9 to 1 in English language, English literature and mathematics. They are different from ‘grade descriptions’, which apply to GCSEs graded A* to G.

I already feel like I’m reading a bureaucratic satire; this could be straight from a Yes Minister script. Before you even click on this link for the English Language descriptors, there’s this sober warning:

These descriptors are not designed to be used for awarding purposes in 2017. Statistical predictions will be used to set grade outcomes at whole subject level.

So, translated, “here is a descriptor for a grade 8, but it won’t be used to award a grade 8 because that will be decided statistically.” Which begs the question…why publish these at all?

Discouraged, but not deterred, I pressed on to the descriptors themselves. Here’s a comparison between Grades 5 and 8 for reading in English Language:

Comparing Grade 5 with Grade 8 in new GCSE English Language. Spot the difference?


At this point I realised I was on a fool’s errand. If I was going to start chasing the shadows of whether kids were “substantiating” or “supporting” their understanding and opinions with references which were “apt” or “illuminating” I would surely run mad. The anchor point for Grade 8 is supposed to be the current A*, whilst Grade 5 is the top of C / bottom of B. There would be no way of delineating Grades 6 and 7 in between these two, surely?


DfE grade descriptors: about this useful (Source)

I sat back, breathed deeply, and remembered this:

I had, for half an hour or so, slipped back into the old “levels” way of thinking. Not being able to tie our English assessment framework to GCSE grades or National Curriculum levels is a blessing. It matters not one jot whether a piece of work is a C, Level 5a, B+, or Grade 6. What matters are the key questions of assessment:

  • What is successful about it?
  • What could be done to improve it?

Identifying the answers to these questions is the key to our assessment policy; communicating those answers the key to the feedback policy. If we get that right, students will get the grades that they deserve at the end of the course.

Colour coded self-assessment

This year every member of our teaching staff belongs to a Teaching and Learning team. These cross curricular groups are working together to improve pedagogy as described in my post Teaching and Learning Leaders. There are six teams: Research, Feedback, Independence, Engagement, Differentiation and Mindsets, and the work of each team is posted on our Echewcation teaching and learning blog.

I belong to the mindset team, and this term I have been working with colleagues from Maths and Languages on using self-assessment to improve redrafting. The concept is based on Ron Berger’s book An Ethic of Excellence, and the principles of improving work over time through specific feedback. This is best encapsulated by his famous “Austin’s Butterfly” example – mandatory viewing for all teachers! Just in case you haven’t seen it:

In Berger’s example, the work is improved through kind, specific and helpful peer feedback. I worked on this principle last year (see my post on Closing the Gap Marking and Feedback), and this year I have been looking for ways to encourage students to be more independently reflective on the quality of their initial drafts so that they can see how to improve. The principle we have been exploring in our teaching and learning triad uses colour codes for students to self assess their drafts.


Students use colours to identify successes and drive progress

The idea came from our Head of RE and PSHE, Lou Pope (@philosophypope on Twitter), who had used the technique with her groups. When she explained it to the Teaching and Learning Team, I knew I had to give it a go! Here’s how it works:

  • Students complete a first draft of a task, with clear success criteria established
  • They go through their drafts, highlighting where they have met each criterion in a different colour
  • They then reflect on the pattern of colours – which criteria have they consistently met? Which have they met the least? Whereabouts in the work have they achieved the most success? And the least?
  • Redraft…and repeat until excellent.
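The reflection step in the list above boils down to a tally, and a short sketch shows the pattern a student is looking for. The criterion names and the highlights are invented for illustration; in the classroom this is done with coloured pencils, not code.

```python
from collections import Counter

def tally_highlights(highlights, criteria):
    """Count how often each success criterion was highlighted in a draft.

    highlights: list with one criterion name per highlighted passage.
    criteria: every criterion set for the task, so omitted ones show as 0.
    """
    counts = Counter(highlights)
    return {criterion: counts[criterion] for criterion in criteria}

# Hypothetical success criteria and one student's highlighted passages.
criteria = ["terminology", "connotations", "representation", "audience"]
draft_highlights = ["terminology", "terminology", "connotations",
                    "terminology", "audience"]

tally = tally_highlights(draft_highlights, criteria)
least_met = min(tally, key=tally.get)
print(tally, least_met)
```

The criterion with the lowest count is the obvious focus for the redraft; a count of zero means the colour never appeared at all.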


I liked this approach on several levels. Firstly, the act of colour coding the draft forces the student to evaluate every aspect. If they’re not highlighting part of their work, what is it doing there? How is it contributing to the success of the piece overall? Secondly, the visual nature of the finished product was very appealing. It would be easy to see the balance within students’ work of one element over another, and for students themselves to recognise what they needed to do more (or less) of.

I decided to run a trial with my Year 10 GCSE Media Studies group, who were working towards a controlled assessment in Advertising and Marketing based on perfume adverts. The students have never studied Media formally before, so they are still getting to grips with the conventions and demands of the subject, but they are making superb progress. As part of the assignment they need to analyse two existing adverts. I got them to complete this through marginal annotation, then unleashed the coloured pencils! Students had to choose four colours and highlight where they had:

  • used media terminology to identify technical features
  • explored the connotations of the technical features
  • commented on representation
  • commented on the impact of the advert on a specific audience

The gallery below shows a selection of the students’ drafts with their highlighting:


After the highlighting process, the students evaluated which success criteria they had covered in detail, which they had only touched on, and which they had omitted completely. They then began a second draft, some using the same adverts as in their first draft and others choosing to apply what they had learned to new texts. The new drafts are barely recognisable – they are light years ahead of the first versions, and the students are really proud of the progress they have made. I will update this post with some of the improved work in the next week!

My next step is to apply this to my GCSE English class as they complete their next assignment, in a bid to help them to move towards becoming the reflective, self-improving learners that our Dweck and Berger-inspired approach is aiming for.

Colour coded self-assessment – highly recommended!

Implementing Assessment Without Levels

I have blogged twice before about assessment in the new national curriculum (here and here) and looking back at those two posts now it seems high time for an update. We’ve moved on quite some way and we are now implementing our assessment without levels system (or AWOL, as our Head of Science seems intent on calling it) across the school.

Context

We haven’t been using National Curriculum levels for a year or so now, but instead we have been using a system of “Chew Valley Levels” linked to GCSE grades as follows:

This was only ever going to be a stop-gap measure, providing some continuity for students and their families whilst we explored the alternatives. In reality, in the world of comparable cohort outcomes we are not able to say with any certainty what a “C” grade at GCSE is, only what it was last year, and thus tying our levels to this moveable feast rendered them no more reliable than the preceding National Curriculum levels. We have even less idea what students will have to know or be able to do to be awarded the new GCSE 1-9 grades, though Ofqual have published this:

What we know, then, is that roughly the same proportion of students as currently achieve grades A*-C in existing GCSEs will achieve grades 9-4 in the new GCSEs, but that the threshold standard is being raised to grade 5 against international benchmarks. But we still don’t know what students will have to know or be able to do for that grade 5 in any given subject, nor are we likely to as the boundaries will shift year on year, especially, I suspect, in the infancy of the qualifications.

All of this means two things:

  1. We need to aim higher if we are to get as many students as possible to grade 5 and above – it will be tougher than C and above.
  2. Linking our assessment system to GCSE grades (as was our original plan) is not going to work.

The Threshold Model

Both Shaun Allison and Dan Brinton have been instrumental in clarifying my thinking about the threshold model of assessment. I highly recommend you read Shaun’s Assessment without levels – an opportunity for growth and Dan’s Designing a new post-levels curriculum and assessment model from scratch as they are both superb and you will see that a lot of the ideas in this post are not, in fact, mine, but theirs!

Essentially, in a threshold model, you set up your curriculum with an expectation in terms of content and skills within each unit, year, or key stage – the “threshold”. At each point, you assess students to see the extent to which they have met, exceeded or fallen short of this threshold. The model has the advantage of letting teachers decide the expectations (thus allowing challenge to be built-in) and providing ready-made opportunities for formative assessment and feedback in relation to the threshold expectations. As implied by the title of Shaun’s blogpost, it’s also a system that is compatible with our growth mindset ethos – more of which later.

What to call the threshold? 

This was a tricky point. Initially we considered the Durrington School model using four noun descriptors:

Thresholds from Durrington High School (from http://classteaching.wordpress.com/)

We were then quite taken with the Belmont School model using verbs to describe the thresholds:

Example assessment scheme from Belmont School Science (from http://belmontteach.wordpress.com/)

However, we couldn’t quite agree on language that would fit different subjects appropriately. We considered the idea of using new GCSE grades 1-9, but they sounded too like the old levels and, in any case, we would have to guess at the standard they represent which is not a sound basis for a new assessment system.

In the end, we decided to use existing letter grades A*-G on the basis that students and their families understand them, and they have a well-understood threshold built in to them at the C grade boundary. Thus students meeting the threshold expectations of our curriculum at each point will be assessed at grade C. Those who exceed it can be graded A*, A or B; those who fall short can be graded D, E, F or G. Each year’s threshold directly correlates to the next year as illustrated in the following table:

AWOL Mapping

In other words, if a student can demonstrate they have met the demands of the Year 8 curriculum by the end of Year 7, they would be graded as B in Year 7. An A* student in Year 7 would be demonstrating the knowledge, skills and processes required of a Key Stage 4 student who had met the C grade threshold.
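The two examples above can be expressed as a simple lookup. This is a sketch under a stated assumption: I am treating each grade above C as one further year’s threshold, which fits both examples (B in Year 7 is the Year 8 threshold; A* in Year 7 lands in Key Stage 4), but the position of grade A is interpolated and the function name is made up. Grades below C are deliberately left out because the text gives no example for them.

```python
# Assumed offsets: only C, B and A* are confirmed by the examples in the text;
# the offset for A is interpolated between them.
YEARS_AHEAD = {"C": 0, "B": 1, "A": 2, "A*": 3}

def equivalent_threshold_year(current_year, grade):
    """Return the school year whose threshold (grade C) this grade matches."""
    if grade not in YEARS_AHEAD:
        raise ValueError(f"No assumed mapping for grade {grade!r}")
    return current_year + YEARS_AHEAD[grade]

print(equivalent_threshold_year(7, "B"))   # Year 8 threshold
print(equivalent_threshold_year(7, "A*"))  # Year 10, i.e. Key Stage 4
```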

Mapping the threshold standard onto the curriculum

In our system, for students to have achieved the threshold on entry to the school, they will have to demonstrate that they have met the requirements of the Key Stage 2 National Curriculum in each subject. In time, we will receive this information (at least for English and Maths) from the new Key Stage 2 assessments, but this will not happen until 2016. Therefore we will baseline all students (as we do currently) to assess the extent to which they have met those requirements at the start of Year 7. Any gaps or shortfall will need to be addressed early.

The threshold standard in each year will be decided by the teaching teams within school. It will be informed – but not limited – by the relevant national curriculum requirements, of course, but the guiding principle is that if we are going to value what we assess then we must assess what we value. Therefore the new curriculum that is being designed is based on the key ideas, concepts, knowledge and skills within each subject, informed by the national curriculum, but decided by teachers. At Key Stage 4 the curriculum will, of course, incorporate the examined elements of the KS4 programme of study or examination specification but will not be limited by that. Our best students go beyond and around the specifications anyway – and so should we. If we are going to prepare our students to do well at GCSE we should be teaching them beyond GCSE in Year 11, so the terminal exams they sit seem like a walk in the park in comparison to what they have been doing in the classroom.

Tracking progress in the new assessment system

We already use a flight paths model to track progress in all subjects. Within the new system, tracking progress is even easier.

AWOL Progress

In the example above, a student is assessed at C on baseline and maintains that performance through Year 7 and 8 (the green highlighting). We should expect all students to maintain their performance through the curriculum, but challenge them to improve it. In Year 9, the table above illustrates what happens if a student improves their performance (the orange highlighting), leading to better than expected GCSE outcomes, and what happens if they do not make progress through the year (the red highlighting), leading to an under-performance at GCSE.

At this stage it’s important to be clear about the expectations of this system. No matter where students are on their baseline assessment, it is the job of the curriculum to ensure that as many of them as possible end up above the threshold by the end of each unit, year, or key stage. If a student is a D or an E at the baseline, that is not an excuse to stay on a D or E for the whole of their school career. Rather, it is a challenge to the teacher and the curriculum to fill in the gaps from the previous year as well as teaching the content, skills and processes of the current year. This is the gap that must be closed.

Terminal Assessment

One important element of this system is the introduction of a terminal assessment in each year of the curriculum. This assessment will be in the same style as the GCSE assessment in the relevant subject, and should assess the knowledge, skills and processes of the entire year’s curriculum. Students will have to revise for it, thus preparing them earlier for the demands of the two-year linear assessments in Year 11. We also intend that this approach will improve retention and recall, as curriculum design will be interleaved to incorporate regular revisiting of the key knowledge, skills and processes.

The end of targets? 

Finally, this model moves away from giving each student a target to aim for based on FFT D, CATs predictors, teacher assessment plus some magic dust and a following wind (as described in this early post on my blog – see, David Didau doesn’t have the monopoly on changing his mind!). When I launched our growth mindset ethos, one of the first responses I had was our Head of Geography asking if this was the end of Challenge Grades (our term for student targets). If potential is unknowable, why are we selecting an arbitrary grade and pretending to know it? She was (and is) right, of course. So we will now be judging progress based on where students start and how far they’ve come from that known point, rather than how far they’ve got to go to a point which we cannot and should not pretend to know.

So what now? 

  1. Faculties are meeting on Monday to begin the process of deciding the threshold standard for each year within the curriculum. Some are already there, having piloted systems through last academic year. Others will need to move this year.
  2. We need to finalise what reports to parents will look like in the new system – we have a draft, but it needs thinking through.
  3. We need to troubleshoot the progress measures – if a student moves up a grade in Year 8, does the higher grade become their new baseline or do we continue to measure progress from the start of Year 7 baseline point? What if they drop a grade?
  4. We need to decide when we move over from the legacy system to the new. We only get one chance to get this right – so we need to be sure we have it sorted.

These are the practicalities – but the principles I am certain are right and the system I am sure is workable. I’ll keep you posted!

UPDATE: we are holding an open meeting in January 2015 to share our approach with colleagues. Details here: https://echewcation.wordpress.com/assessment-without-levels/

Assessment in the new national curriculum – next steps

My original post “Assessment in the new national curriculum – what we’re doing” remains one of the most popular on this blog. Here I will outline how we have refined the model proposed in that post and integrated it with progress tracking, as well as our latest thoughts on assessment without levels and growth mindset.

How will we assess in the new national curriculum? 

I was delighted to hear that Durrington High School had been awarded an assessment innovation fund grant by the DfE. I was even more delighted when Durrington DHT Shaun Allison published his thoughts so far in an excellent blogpost! As a school also actively pursuing a growth mindset, the approach to assessment outlined by Shaun struck a chord and seemed closely aligned to what we are trying to achieve at Chew Valley. I presented the key points of the Durrington approach to middle leaders yesterday and we have adopted the principle of the Growth and Thresholds assessment system, explained as follows (paraphrased from Class Teaching):

Teachers identify the key knowledge and skills students need in order to be successful in KS4 and work backwards to decide what this would look like, if students have mastered it in KS3 – the excellence standard. Teachers then produce a curriculum and assessment framework that allows teachers and students to know what they’ve got to do to achieve excellence.  

In the Chew Valley version, we will continue to use GCSE grades as the basis for our assessment model. It makes sense, longer term, to use the new 1-9 GCSE grade scale as a whole-school assessment framework, with rough equivalents as follows:

levelsgradesnewgcse

In other words, students entering in Year 7 would be assessed with grades usually between 1 and 4, and move up a consistent assessment scale throughout their time in secondary school.

We remain wedded to the notion of criteria referenced assessment, although I enjoyed having my thinking pushed on this by Daisy Christodoulou’s provocative defence of norm-referencing. The problem comes with the assumption that there will be clear criteria attached to the new GCSE grades 1-9; my understanding is that there will be criteria attached to the levels and marks within the new GCSE specifications but that they will not be clearly linked to specific GCSE grades. This will allow Ofqual to apply comparable outcomes and shift the boundaries year on year. Thus we will need to assign criteria to the new GCSE grades on a “best fit” basis, leading to some insecurity and uncertainty within the assessment framework, especially in the early stages.

We have not yet decided when we will shift over to 1-9 grades. The existing system will hold up until 2016 at least, and then there will be an incremental shift as first English and Maths, then Science, History, Geography and Languages, then arts subjects move over to the new grades. We also haven’t decided if we’re going to sub-grade them – grade 2c, 2b, 2a anyone? It was a bastardisation of the national curriculum levels; should we be wary of falling into the same trap again? We’re taking a watching brief on both these issues!

Tracking progress in the new assessment framework

With the advent of Progress 8 (blogged about here) we have been running an experiment with progress tracking using flight paths (blogged about here). As indicated in that second blog, in the initial experiment we tracked progress in English and Maths from their respective KS2 baselines, and all other subjects from the average points score of English and Maths at KS2. This worked fine for English and Maths, but it didn’t work for other subjects. I know it seems obvious that tracking progress in Drama from a baseline of the average of tests in English and Maths won’t work, but that is the methodology being applied in the Progress 8 measure so I thought we’d better use it. What I’d got wrong, of course (it’s so easy to do!) was that I’d let the accountability framework dictate my practice rather than common sense and what was right for the learners. So, we’ve made a change.

From September, we will continue to use the KS2 baselines for English and Maths – this is a tried and tested approach and it is giving us clear and helpful data both for individual students and for self-evaluation and external accountability purposes. In all other subjects, we will conduct a baseline assessment in the first term of Year 7 to establish a clear, subject-specific starting point for each student. We will then use that baseline assessment to track progress in each subject across KS3. We will treat the baseline assessment as the “baseline” in the same way as KS2 English and Maths data, even though they will be four or five months apart in time, and apply the flight paths model to each subject in exactly the same way:

Progress flight paths tabulated

We still have the existing template to track progress against an English and Maths KS2 average points score, so I will be able to keep an eye on the Progress 8 headlines, but this refined model will provide the ability to track progress in, for example, Art from their starting point in Art. Which seems obvious, doesn’t it?

In time we will convert the “levels” in those flight paths to the “grades” via the equivalences listed in the table above. It may be that in, for example, languages, the baseline will be very low (where students have not studied that particular language in primary) and this may require the model to be refined – watch this space!

Targets and a growth mindset

When I launched the idea of becoming a growth mindset school back in March, several staff discussed the idea of targets (we call them challenge grades or levels) and whether they were compatible with a growth mindset. Potential, according to Dweck, is limitless – it’s not about aiming for a destination but about constantly continuing to improve. As John Tomsett said in a conversation on twitter recently:

I overheard a conversation between two girls revising for a languages exam this week. They were working on tenses. One said to the other: “I don’t need to know that; that’s what you need to do to get a B. I only need a C.” Her companion was aiming for a B, so continued to revise it. This is why Michael Gove was so against early entry – the wasteful settling for a lower level of achievement. This is the danger of target grades – if students work hard and get there, they stop. And, unless that target grade is an A* (and even then), that is a waste.

This is a substantial shift in my thinking (see one of the earliest posts on this blog, Targets, for my starting point!), but actually the flight paths approach provides us with a different way to frame the conversation about progress. In the old model I would use formulae and statistical cohort analysis tools like CATs, FFT and the like to predict likely outcomes and “add a bit on for challenge”, then track and discuss progress towards that made-up number. It makes more sense to me now to assess where students are starting from and then feed back whether their progress is below expected, expected, better than expected, outstanding or world class from that starting point (using the flight paths model). Thus reports to parents might say “Matilda is currently working at a Grade 3 in Science, and this represents better than expected progress from her starting point in this subject”. At the moment this is a tentative, half-formed policy shift which will need to be put through the crucible of SLT and Governors – what better way to try it out than to put it to the test on twitter first?

In summary

The abolition of national curriculum levels remains an opportunity to do something different and better with curriculum and assessment across the whole of a student’s school experience. The fact that each individual school is having to come up with its own system remains a fatal flaw in terms of capacity. The new assessment innovation packages may go some way to preventing this – especially if they are of the quality of the work coming out of Durrington. Whilst there is still a lot of work to do, and a lot of uncertainty, it is still my aim that assessment and curriculum in my school will be the better for the reforms.

Tracking progress over time: flight paths and matrices

Everyone should already be familiar with the KS2-4 Transition Matrices. A staple of RAISEonline, they were the first thing our HMI asked me for in our last Ofsted inspection and form the staple diet of inspectors judging the impact of a secondary school on progress in English and Maths.

Framework for KS2-4 Transition Matrices

And quite right too. It’s common for secondary teachers to bemoan the inaccuracy of KS2 levels, but like it or not, somehow those students got those levels in Year 6 and we need to add value during their time with us. Of course, the starting point (KS2 levels) and the end point (GCSE grades) are both in flux for the next few years, which renders the measurements somewhat uncertain (see my blog KS2, KS4, Level 6 and Progress 8 – who do we appreciate?), but the principle of measuring student performance on entry and exit to judge progress makes sense.

Over the past year we have been experimenting with progress flight paths which I found initially on Stephen Tierney’s @LeadingLearner blog. We are now using transition matrices based on our own version of progress flight paths to track progress in each year group and identify students who are at risk of not progressing over time. In this post I will outline the methodology we use; I’m happy to answer any questions in the comments or via my “contact me” page.

But we don’t have National Curriculum levels any more…

No, that’s true – and we don’t use them. As outlined in my post Assessment in the new national curriculum: what we’re doing, we have adapted our assessment criteria at KS3 to reflect GCSE criteria. All our language in reporting to parents and policy statements now refers to “Chew Valley Levels” to clarify our position. This way, we preserve some continuity for students and parents who are used to the levels system, but we create a consistent ladder of knowledge and skills to assess from Year 7 to Year 11. As GCSE grades change to numbers, we may well consider adjusting to a numerical assessment system across the school too, but maintaining the principle of a five-year continuous assessment scheme in each subject.

The flight paths

The flight paths we are using, based on the @LeadingLearner model, are set up as follows:

  • Expected progress: one sub-level of progress in each year
  • Better than expected progress: one and a half sub-levels per year
  • Outstanding progress: two sub-levels per year
  • World class progress: more than two sub-levels per year

Progress flight paths tabulated

The flight paths do not presuppose that progress over time is linear; this was my initial misunderstanding of the model. Rather, they show the trajectory of progress over time within which students need to perform if they are to reach or exceed the end of KS4 destinations outlined in RAISEonline. Creating marker points at the end of each year enables early identification of potential issues with progress. At Chew Valley we collect assessments three times in each academic year, all measured against the flight paths. At the first assessment point, only one short term into the year, a greater proportion of students might be lower on the flight paths, but over the course of the academic year teachers can focus their planning to ensure that those students who are at risk of falling behind have any issues addressed.
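For anyone wanting to reproduce the banding outside SIMS, the four rates listed above can be sketched in a few lines of Python. This is an illustration only, not the actual Chew Valley template: the function names are hypothetical, sub-levels are assumed to run c, b, a within each level (three to a level), and the cut-offs between bands are an assumption about how the per-year rates become thresholds at the end-of-year marker points.

```python
# Position of each sub-level letter within a level (c is lowest, a is highest).
SUBLEVEL_OFFSET = {"c": 0, "b": 1, "a": 2}


def to_sublevels(level: str) -> int:
    """Convert a sub-levelled assessment such as '4b' into a count of sub-levels."""
    whole, letter = int(level[:-1]), level[-1].lower()
    return whole * 3 + SUBLEVEL_OFFSET[letter]


def flight_path(baseline: str, current: str, years: int) -> str:
    """Label progress from baseline to current after a number of completed years.

    Assumed cut-offs: 1, 1.5 and 2 sub-levels per year, with anything
    above 2 per year counted as world class.
    """
    gained = to_sublevels(current) - to_sublevels(baseline)
    if gained > 2 * years:
        return "world class"
    if gained == 2 * years:
        return "outstanding"
    if gained >= 1.5 * years:
        return "better than expected"
    if gained >= years:
        return "expected"
    return "below expected"


# A student who arrived on 4b, assessed at the end of their second year:
for current in ["4a", "5c", "5b", "5a", "6c"]:
    print(current, "->", flight_path("4b", current, years=2))
```

Because the marker points sit at the end of each year, a mid-year assessment run through a sketch like this will tend to look low, which matches the pattern described above for the first assessment point.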

Creating transition matrices from the flight paths

Using SIMS tracking grids, we have created transition matrices for each year within the curriculum. These can be populated with student names at each assessment point, and generated for teaching groups, gender groups, pupil premium cohort, or any other field within the SIMS dataset. Simply put, students are plotted in the grid with the row representing their KS2 prior attainment level and the column representing their current performance assessment. We will be able to adapt the row and column headings as the assessment systems change.

Example tracking grid template in SIMS

Within the template, the fields are colour-coded to represent each of the flight paths:

  • White = below expected
  • Green = expected progress
  • Blue = better than expected
  • Pink = outstanding
  • Yellow = world class

Once populated, the matrices are distributed to curriculum and pastoral leaders and, critically, class teachers. They enable at-a-glance identification of progress issues on an individual, cohort, prior-attainment bracket or group scale.
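The grid itself is simple enough to mock up without SIMS. The following is a rough Python sketch, not the SIMS tracking grid: the record fields (`ks2`, `current`, `group`, `pupil_premium`) and the function name are hypothetical, chosen only to show the idea of plotting names by row (KS2 prior attainment) and column (current assessment) and then regenerating the same grid for a subgroup.

```python
from collections import defaultdict


def build_matrix(students, **filters):
    """Return {ks2_level: {current_level: [names]}} for students matching the filters."""
    grid = defaultdict(lambda: defaultdict(list))
    for student in students:
        # Keep the student only if every requested field matches.
        if all(student.get(field) == value for field, value in filters.items()):
            grid[student["ks2"]][student["current"]].append(student["name"])
    # Convert to plain dicts so the result prints and compares cleanly.
    return {row: dict(cols) for row, cols in grid.items()}


# Made-up, anonymised records for illustration only.
students = [
    {"name": "Student A", "ks2": "4b", "current": "5c", "group": "8X1", "pupil_premium": False},
    {"name": "Student B", "ks2": "4b", "current": "5c", "group": "8X2", "pupil_premium": True},
    {"name": "Student C", "ks2": "5c", "current": "5a", "group": "8X1", "pupil_premium": True},
]

whole_cohort = build_matrix(students)
class_view = build_matrix(students, group="8X1")
pupil_premium_view = build_matrix(students, pupil_premium=True)
```

Filtering on any field in the record mirrors the way the grids can be generated for teaching groups, pupil premium cohorts and so on; the colour-coding would then be a lookup on each cell's row and column against the flight paths.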

Example of a populated Y8 tracking grid with student names anonymised. Note the tabs across the bottom for teaching groups and subgroups.

When I was a Head of English, this is the data I would have wanted my SLT to be providing me with. As with all data work in my leadership role, I am trying to adhere to the principles I outlined in my post The Narrative in the Numbers, and to make the data as useful as possible to enable teachers in the classroom to do their job even better. By clicking on my class tab along the bottom of the spreadsheet I will be able to see at-a-glance which students in my group are progressing well, and which less well; then I will be able to plan what I’m going to do about it over the next few terms.

Transferability 

Currently this method is only applied to English and Maths. We have experimented with using an average KS2 points score to create a generic baseline and applying it to other subjects, but it throws up too many anomalies to be reliable or useful (which poses some interesting questions about the proposed Progress 8 methodology). However, it would be possible to apply this model from a Year 7 baseline assessment in any subject – the tools are there.

 

Closing the Gap Marking – Twilight CPD

As part of our twilight INSET programme this year I am delivering a CPD session on marking. It’s a great opportunity to bring together lots of ideas from lots of superb bloggers, teachers and thinkers – it’s been quite difficult to condense everything down! Here is the Prezi I’m using in the session (click this link if you can’t see the embed):

I have also adapted this session for Pedagoo South West and a 45 minute version of the 90 minute session can be found by clicking this link, along with the video of the session on YouTube.

The aim of the session is to improve the effectiveness of marking without spending more time on it. This will be done by looking at:

  • Public Critique (via Tait Coles here)
  • Triple Impact Marking (via David Didau here)
  • DIRT (via Alex Quigley here)

Why are we looking at marking? Because…well, I’ll let Phil Beadle take this one:

beadle

I chose that photo on purpose.

The key thing to do first is identify the gap that we’re trying to close. Fortunately, Tom Sherrington already has this covered in his Making Feedback Count blog:

Graphic adapted from @headguruteacher

It’s the gap between students receiving the feedback and acting on it that we need to address. There is no better example of this process in action than Austin’s Butterfly, also blogged about by Tom here, and demonstrated by Ron Berger himself here:

Nowhere is the power of feedback on performance better demonstrated than in this example! Our feedback needs to be:

  • Specific
  • Hard on the content
  • Supportive of the person

And by “our”, of course I mean peer and teacher feedback, since Berger’s example is primarily focused on teacher-mediated peer feedback.

To demonstrate this, I ask colleagues to undertake a public critique exercise (inspired in part by the Alan Partridge clip used by Tait Coles at TeachMeet Clevedon). I ask staff to produce something to a set of criteria – a haiku, in the Prezi example – and submit it for public critique using Tait Coles’ critique sheets. I have adapted them so that there is space at the top measured for post-it notes to fit into – because I’m obsessive like that. You can download the Public Critique Sheet here.

Following reflection on public critique and applications in practice, we move on to Triple Impact Marking. This idea comes from David Didau and is captured in this presentation from his blog:

A key component of Triple Impact Marking is DIRT – Dedicated Improvement and Reflection Time. Alex Quigley explains the concept in detail (with links) here, but essentially students need TIME to act on the feedback given. This is where the gap is closed. I have been as guilty as any teacher of handing back meticulously marked books, asking my class to read the comments, and then getting on with the next bit of the course. What. A. Waste. Well no more – we’re getting DIRTy.

To conclude our look at feedback, who better than Dylan Wiliam (via Mark Miller here):


This emphasises the importance of creating a successful feedback culture to enable a growth mindset. No grades. No levels. Specific targeted feedback, hard on the content, soft on the person.

To conclude the session, an exercise looking at managing marking workload. Many of these ideas come from another excellent Mark Miller blog, found here. There are twelve strategies and staff note down the advantages and disadvantages of each strategy in terms of learning and performance gains and workload implications. The idea is to evaluate each strategy in terms of its overall cost benefit to the busy classroom professional.

Twelve Tips and Tricks for marking and feedback

As a takeaway I’ve also adapted the sheet that Tom Sherrington blogged from Saffron Walden High School – you can download the Student engagement with written feedback sheet which can be seen here:

Increase marking impact

What has become clear to me in planning this INSET is how rich my personal learning network is. The blogosphere is teeming with great ideas about marking, feedback and critique – all I had to do was synthesise the great work of others and stitch it into a package that will fit into 90 minutes of a dark, January evening. I hope it will go well!

KS2, KS4, Level 6 and Progress 8 – who do we appreciate?

In 2016, secondary schools will be held accountable to a new set of measures including Progress 8 and Attainment 8. These measures were announced in October and I have been reflecting on the implications for schools. In response to the consultation I was broadly in favour of making schools accountable for the progress of all students rather than how many we can push through the C/D boundary. I think it is a real step forward that schools will be accountable for turning Es to Ds and As to A*s as well as Ds to Cs. However, there are a few issues that worry me.

The English and Maths Key Stage 2 baseline applied to all subjects

I have no particular issues with the Key Stage 2 tests in Maths and English; I don’t really have enough specific knowledge of them to criticise. Clearly it is the only baseline we have to measure progress from KS2 to 4. However, just because it’s the only one doesn’t necessarily make it right. I’m sure statistically there must be some basis to show that progress from this baseline to a GCSE result in, say, Art or PE makes sense with a national dataset. But I found it hard to convince PE teachers that measuring progress in PE against an English and Maths baseline was a fair, right and just evaluation of their performance, even if that GCSE only formed one tenth of the best 8 (since English and Maths count double).
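The “one tenth” figure is just the arithmetic of double weighting, and a small Python sketch makes it explicit. It is a simplification under stated assumptions: the subject list is invented, the function name is hypothetical, and it ignores any rules about which qualifications are allowed to fill which slots; it only shows how eight subjects become ten weighted units when English and Maths count twice.

```python
from fractions import Fraction


def slot_shares(subjects, double_weighted=("English", "Maths")):
    """Return each subject's share of the overall measure as an exact fraction."""
    weights = {s: 2 if s in double_weighted else 1 for s in subjects}
    total = sum(weights.values())  # 8 subjects, two of them doubled, gives 10
    return {s: Fraction(w, total) for s, w in weights.items()}


# An invented set of eight subjects for one student.
best_eight = ["English", "Maths", "Science", "History", "French", "Art", "PE", "Drama"]
shares = slot_shares(best_eight)

print(shares["PE"])       # PE's share of the measure
print(shares["English"])  # English's share of the measure
```

On these assumptions PE carries a tenth of the weight while English and Maths carry a fifth each, yet the baseline for every one of them comes from English and Maths alone.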

The Key Stage 2 baseline itself

I know that primary school colleagues have a hugely detailed and thorough knowledge of their pupils’ abilities across the curriculum, and especially in Maths and English. I’m sure it far outstrips the accuracy of a secondary school teacher’s assessment if only by dint of contact hours and therefore assessment opportunities. However, the Key Stage 2 tests are the primary accountability measure by which primary schools are judged and it is therefore in their interests to ensure that pupils achieve as highly as they can in those tests. I know that this can lead to coaching for the tests in exactly the same way as Key Stage 4 teachers focus the majority of what is taught in Year 11 on what is on the exam – you’d be mad not to. It is therefore likely that a proportion of the results achieved do not reflect secure performance at that National Curriculum Level, and that secondary schools need to ensure they are secure before they can progress. What is supremely ironic is that primary schools have already adopted a new National Curriculum without levels, although their accountability is dependent on tests with levelled outcomes. In turn, secondary accountability is dependent on progress from those levels.

Instability of the assessments on which Progress 8 is based

As shown in the above tweet, those anachronistic levelled tests will be phased out after 2015 to be replaced by an as-yet-unannounced new test based on…something else. At the same time GCSE grades will be replaced by the numerical 1-8 system, meaning that in the first five years of Progress 8 we will be measuring progress from an untested baseline to a new and untested end point.

It’s all very well for the consultation response to state: “pupils with a point score of 29 on their Key stage 2 tests achieve, on average, 8 C grades at GCSE” but, when the measure is introduced, neither the baseline nor the end point will exist in this form.

Level 6 Progress Inflation

Is grade inflation spreading?

At the same time as all this is going on, the ability to award Level 6 has now been introduced at Key Stage 2. You would be a rare Year 6 teacher or primary school leader indeed not to want to get as many students to that new level as possible. However, how many of those level 6 successes will be pushed up before they are fully secure at Level 5? We have had a handful of students arriving in Year 7 over the past few years teacher assessed at level 6. Suddenly this year we are expecting dozens. This is not a significantly brighter cohort but the expectations for the expected three levels of progress KS2-4 will be massively different.

What the accountability measures actually measure

Assessments are a house of cards

Tom Sherrington (@headguruteacher) explains in The Data Delusion how the assessment regimes on which we depend for accountability are a house of cards with very little direct relation to what a student has actually learned. David Didau (@learningspy) explains in The Problem with Progress that what our assessments actually measure is performance, not learning. Invariably accountability will drive curriculum and teaching and learning decisions but I worry that their foundations are so insecure that pedagogy and learning may be lost in the confusion.

Clarity

It may of course be that I’ve completely misunderstood aspects of Progress 8, or missed something completely obvious. I’d welcome any corrections, clarifications and reactions in the comments below, or on twitter.

The end of coursework

or…What’s assessment for anyway? 

When I took my GCSEs in English and English Literature (in 1991) they were 100% coursework. I wasn’t alone; according to the 2006 Review of GCSE Coursework from QCA (found here) about two-thirds of 16 year olds in the early 1990s were taking GCSE English through syllabuses that had no examinations. Much has changed since then, and all 16 year olds who take GCSE English in summer 2017 will do so following syllabuses with 100% terminal examinations (as announced by Ofqual).

A mindset change

Coursework has been part of my Key Stage 4 experience as a student, trainee, teacher, Head of Department and Senior Leader. Its removal requires a complete shift of mindset. Curriculum design, long and medium term planning in English has always been about fitting the coursework (or latterly controlled assessment tasks) into the two years to form a coherent programme of study around the assessment tasks. No longer. At this point in time, this feels like a blessed relief from the millstone of controlled assessments, and an opportunity to open up curriculum time to learning, but it will feel very different.

A change of gear is needed

It will also require a mindset change for students. I have felt uncomfortable for some time about the prevalent attitude of “will it count towards my GCSE?” amongst students I teach. The unfortunate truth at the moment is that if it does, most will really try and put in every effort. If it’s “just practice” or, heaven forbid, an assignment merely to develop or secure understanding, it doesn’t get the full focus of a “proper assessment”. I will be glad to see the back of this distinction as it will allow and require a full focus on the process of learning in every piece of work throughout the course.

Teacher assessment is best

I genuinely believe that teachers are best placed to make accurate and complete assessments of their students’ abilities. It seems almost ridiculous that I have to state that at all. Teachers spend every lesson with their students and know better than anyone the full range of their achievements within the subject, in much more detail than any examination can hope to discover, no matter how long or rigorous. This will be lost in the terminal exam system. Teacher assessment (in English especially) has snapped under the weight of the accountability framework’s focus upon it. This was recognised in the QCA GCSE coursework report:

5.44: The environment for GCSE and A levels has changed. Twenty years ago there were no achievement and attainment tables (formerly performance tables), no national or local targets related to examination grades and no link between teachers’ pay and students’ results. The environment now is far more pressured and in these circumstances, it is likely that internal assessment of GCSE and A levels as presently practised has become a less valid form of assessment.

Teacher assessment + high stakes accountability = a powder keg

This is undoubtedly the case. Teacher assessment is still the best way of assessing student progress and learning (although, as David Didau asserts, measuring learning is a horrifically complex business). It should still be the basis of teaching and learning in the classroom but only if the sole purpose of that teacher assessment is to measure the child’s progress and identify next steps in learning. If the teacher assessment is also serving the purpose of proving progress to senior leaders and external inspectors in order to maintain the school’s standing in performance tables and the teacher’s own salary, then of course there are vested interests at play which will encourage even the most professional professional to err on the side of generosity. And this is how we’ve arrived at our current situation. The accountability and pay systems have rendered the most accurate and helpful form of assessment unreliable and corrupt. Excellent work, policy makers.

Moving forward

I have several tasks as a school leader now to make the most of this new assessment framework.

Jumping through hoops – a necessary evil?

  1. To help subject teams re-think curriculum design away from the coursework/controlled assessment structures that have been in place for so long. We will have a lot to learn from Maths and other 100% examined subjects here; we will need to make the most of the time freed up from controlled assessments to teach curriculum content (which is a combination of knowledge and skills, of course).
  2. To decouple teacher assessment from external accountability and pay progression as far as possible, to allow it to be carried out accurately for the benefit of the student’s learning, parents, and teachers themselves to inform planning.
  3. To work with all teachers and students to jump the hoops of the new terminal exams. I hate this part of the job, but recognise that teaching exam technique is vital to success in exams. I will also make every effort to keep this in proportion to the real business of teaching the actual subjects.
  4. To continue to do my best to construct a Key Stage 4 curriculum in the best interests of the learners at my school.

I’ll let you know how I get on.