The Trouble with NAPLAN1

Nicole Mockler

In May this year, the NSW Minister for Education, Rob Stokes, called for the “urgent dumping” of the National Assessment Program – Literacy and Numeracy (NAPLAN) (Baker, 2018). The surprise call was rejected by the then Federal Minister for Education, Simon Birmingham, who argued that “parents like it” (Karp, 2018). One month later, the Education Council, comprising all Ministers of Education from Australian states and territories, ordered a review of NAPLAN data presentation, and the publication of the 2018 NAPLAN data on the My School website has been delayed until after the presentation of an interim review report to the Council.

A look at social media at the time of the release of results each year suggests that Minister Birmingham was right in his assessment, with many parents claiming that NAPLAN scores are one of the few precise indications they get of their children’s performance at school. And of course, it’s entirely understandable that parents seek a good, clear indication of their children’s progress in their learning. But the question remains as to whether NAPLAN is the best way to achieve this.

So, what’s the trouble with NAPLAN? First, there’s the question of accuracy. How ‘precise’ is the tool, really? While communication of results to parents suggests a very high level of precision, the technical report issued by ACARA each year (ACARA, 2018) suggests something quite different. Margaret Wu, a world-leading expert in educational measurement and statistics, has done excellent sustained work over a number of years on what national testing data can and cannot tell us (see, for example, Wu, 2010, 2016). Her work demonstrates that while parents are provided with an indication of their child’s performance that looks very precise, the real story is quite different. The NAPLAN tests ask a relatively small number of questions in each section, and responses to those questions are then used to estimate a child’s performance across each (very large) assessment area. This leads to a lot of what statisticians call ‘measurement error’. By way of illustration, Figure A is based on performance on the 2016 Year 7 Grammar and Punctuation test: in this case, the student has achieved a score of 615, placing them in the middle of Band 8. We can see that on this basis, we might conclude that they are performing above their school average of about 590 and well above the national average of 540. Furthermore, the student's performance appears to be just in the top 20% (represented by the unshaded area) of students nationally.

Figure A

However, Figure B tells a different story. Here we have the same result, with the ‘error bars’ added (using the figures provided in the 2016 NAPLAN Technical Report, and a 90% Confidence Interval consistent with the My School website). The solid error bars on Figure B indicate that while the student has received a score of 615 on this particular test, we can be 90% confident that their true ability in grammar and punctuation lies somewhere between 558 and 672, about two bands’ worth. If we were to use a 95% confidence interval, which is the standard in educational statistics, the span would be even wider, from 547 to 683 (shown by the dotted error bars). In other words, the student might be very close to the national average, toward the bottom of Band 7, or quite close to the top of Band 9. That’s quite a wide ‘window’ indeed.

Figure B
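The arithmetic behind those widening error bars can be sketched in a few lines of Python. This is a minimal illustration, not ACARA's actual procedure: the standard error of measurement (about 34.7 points) is back-calculated here from the 90% interval quoted above, rather than taken directly from the Technical Report.

```python
# Sketch: how a reported score of 615 becomes a wide 'window' once
# measurement error is acknowledged. The standard error of measurement
# (SEM) below is an illustrative assumption, implied by the 90% interval
# of 558-672 quoted in the text (half-width 57 points, z = 1.645).

def confidence_interval(score, sem, z):
    """Return (lower, upper) bounds around a measured score."""
    half_width = z * sem
    return score - half_width, score + half_width

score = 615          # reported Grammar and Punctuation score
sem = 57 / 1.645     # ~34.7 points, implied by the quoted 90% interval

lo90, hi90 = confidence_interval(score, sem, z=1.645)  # 90% CI
lo95, hi95 = confidence_interval(score, sem, z=1.960)  # 95% CI

print(round(lo90), round(hi90))  # 558 672
print(round(lo95), round(hi95))  # 547 683
```

The same measured score thus supports a 'true ability' anywhere across roughly two bands, which is the point the error bars in Figure B are making.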

So the ‘precision of individual results’ argument doesn’t really hold. Any teacher worth their salt, especially one who hadn’t felt the pressure to engage in weeks of NAPLAN preparation with their students, would be far more precise than this in assessing their students’ ability based on the substantial evidence they collect in class. In the words of the Secretary of the NSW Department of Education, Mark Scott, it is “these smaller tests, these regular ongoing assessments that take place by teachers in classrooms to monitor progress” that make for “good assessment” (Robinson, 2018).

Wu also notes that NAPLAN is not very good at representing student ability at the class or school level because of what statisticians call ‘sampling error’. The sampling error in NAPLAN results goes down as the cohort size goes up – for example, while the margin of error (at the 90% confidence interval reported on the My School website) for a school with a large cohort of approximately 180 students might be only 10 points, for a school with a far smaller cohort of only 40 students, the margin of error on the same test might be around 50 points.2 The problem is that this school-level representation of student performance is exactly what the My School website is built on, and it is through My School that Australian parents are encouraged to choose a school for their child. Research also suggests that the NAPLAN/My School nexus has played a driving role in Australian teachers and students experiencing NAPLAN as ‘high stakes’ (see, for example, Dulfer, Polesel & Rice, 2012; Gannon 2012; Hardy 2017).
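The direction of this effect follows from basic statistics: the uncertainty in a cohort average shrinks roughly with the square root of cohort size. The sketch below illustrates that relationship only; the standard deviation of individual scores (70 points) is an assumed figure, and the actual My School margins quoted above also reflect differences in each cohort's score spread, which is why the published gap (10 versus 50 points) is larger than a pure square-root rule would predict.

```python
# Sketch: margin of error on a cohort mean versus cohort size.
# The student-level standard deviation of 70 points is an illustrative
# assumption, not an official NAPLAN figure.
import math

def margin_of_error(sd, n, z=1.645):
    """Half-width of a 90% confidence interval for a cohort mean."""
    return z * sd / math.sqrt(n)

sd = 70  # assumed spread of individual student scores
for cohort in (180, 40):
    print(cohort, round(margin_of_error(sd, cohort), 1))
```

Under these assumptions, shrinking the cohort from 180 to 40 students roughly doubles the margin of error on the school's average score.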

At the national level, however, the story is different. What NAPLAN is good for, and indeed what it was originally designed for, is providing a national snapshot of student performance, and conducting comparisons between different groups on a national level (for example, students with a language background other than English and students from English-speaking backgrounds). There are, however, other ways to achieve this. Rather than testing every student in every school, a rigorous sampling method would be far more cost-effective, both financially and in terms of other costs to our education system, and much easier on students, parents and teachers.

So, does NAPLAN need to be dumped? To my mind, the answer to that is both yes and no. Our current use of NAPLAN data does need to be urgently dumped. We need to start using NAPLAN results for, and only for, the purpose for which they are fit. At the very least, we need to get NAPLAN results off the My School website, cut out the hype and anxiety about the tests and start being honest with parents about what NAPLAN tells them and does not tell them about their children’s learning.

The Review of NAPLAN Data Presentation agreed to by the Education Council at its 2018 June meeting may go some way toward such action, if it lives up to its own terms of reference, which are presented below.

The review will inform the Education Council about:

  • Current presentation on My School of school, system, sector and jurisdiction performance data, in the context of the initial (2009) principles and protocols for reporting on schooling:
    • Principle 1: Reporting should be in the broad public interest.
    • Principle 2: Reporting on the outcomes of schooling should use data that is valid, reliable and contextualised.
    • Principle 3: Reporting should be sufficiently comprehensive to enable proper interpretation and understanding of the information.
    • Principle 4: Reporting should involve balancing the community’s right to know with the need to avoid the misinterpretation or misuse of the information.
  • The extent to which current presentation of data to schools and their communities supports their understanding of student progress and achievement.
  • Perceptions of NAPLAN reporting and My School data and the extent to which they meet reasonable public accountability and transparency expectations and requirements, including considering any misinterpretation and misuse of information and subsequent consequences.
  • How teachers and school leaders use NAPLAN and its results and My School data to inform teaching practice.
  • How teachers and school leaders communicate NAPLAN results and My School data to students and parents.
  • International best practice for teacher, school and system level transparency and accountability.

(Education Council, 2018, my emphasis)

At present, it could be claimed that the information presented to parents, as highlighted above, does not ‘enable proper interpretation and understanding of the information’. Some of my own current research, conducted with Dr Meghan Stacey of the University of Sydney as part of the Teachers, Educational Data and Evidence-informed Practice3 (TEDEP) project, suggests that teachers too struggle both with the meaning and utility of NAPLAN data and with how best to use it to inform their teaching practice. While it seems that the process for the review is yet to be announced, according to the Education Council (2018), it will involve consultation “with parents, teachers, students, school leaders, peak bodies and independent experts as appropriate, as well as government and non-government education authorities”, so there should be an opportunity for all of us to join in the conversation about what happens to and with NAPLAN data.

At the same time, and this is very much the focus of our current project, we need to open up a broader conversation about what constitutes good evidence of teaching and learning. Such evidence is not necessarily generated ‘out there’ through external testing; nor must it rely on the existence of tools generated by ‘Big 5’ consulting firms, despite the enthusiasm about this in the wake of the publication of the Through Growth to Achievement report (Gonski et al., 2018). Furthermore, valid and reliable evidence in education generated at the local level relies on strong teacher professional judgement. It’s in all our interests that we have a teaching profession with robust and well-honed professional judgement, and that we trust teachers to get on with the job that the vast majority of them do so well for relatively little return. Teachers have a good sense of what such evidence looks like, and as part of the TEDEP project, they’re telling us. For example, from three of the questionnaire participants in our study:

I know I'm teaching well based on how well my students synthesise their knowledge and readily apply it in different contexts. Also by the quality of their questions they ask me and each other in class. They come prepared to debate. Also when they help each other and are not afraid to take risks. When they send me essays and ideas they might be thinking about. Essentially I know I'm teaching well because the relationship is positive and students can articulate what they're doing, why they're doing it and can also show they understand, by teaching their peers. (102)

Pre and post testing informs whether you have made an impact in learning. The data comparison will inform how you modify practice to achieve learning gain. (124)

I am working on formative assessment, especially trying to build skills with repeated use, student self-assessment and application of feedback. I feel I am teaching well when there is genuine thinking and problem solving in the room and students are learning from each other as well as the teacher. It's the vibe. (130)

While these are excerpts of responses from only three participants, it is clear from the hundreds of responses we have received that the kinds of evidence teachers collect, and that they value, are complex, diverse, and gathered continuously. It’s also clear that these forms of evidence are closely tied to the actual work of teachers in classrooms, and need to be: you can’t capture ‘the vibe’ through a national standardised census test.

Recognising, valuing, and working to understand and develop these local-level knowledge production processes inherent in teachers’ everyday work is the important next step in the assessment debate. In the process of taking this step, perhaps we might free up some classroom time for more productive things than test preparation.


Thanks to Meghan Stacey, James Ladwig, Helen Proctor and Elenie Poulos for their helpful comments and suggestions on earlier drafts of this piece.


Australian Curriculum, Assessment and Reporting Authority (ACARA) (2018). National Reports. Available at https://www.nap.edu.au/results-and-reports/national-reports

Baker, J. (2018). NAPLAN is being used, abused and must be urgently dumped: Stokes. Sydney Morning Herald, 3 May, p.1.

Education Council (2018). Communiqué, 22 June 2018.

Gonski, D., Arcus, T., Boston, K., Gould, V., Johnson, W., O’Brien, L., Perry, L. and Roberts, M. (2018). Through Growth to Achievement: Report of the Review to Achieve Educational Excellence in Australian Schools.

Karp, P. (2018). NAPLAN: NSW government call to scrap tests rejected by Simon Birmingham. The Guardian, 4 May. Available at: https://www.theguardian.com/australia-news/2018/may/04/nsw-governments-call-to-scrap-naplan-rejected-by-simon-birmingham

Robinson, N. (2018). NAPLAN 'will look a little dated' when new testing becomes widespread, Mark Scott says. ABC News Online, 29 May. Available at: http://www.abc.net.au/news/2018-05-29/naplan-will-look-a-little-dated-when-new-testing-catches-on/9796860

Wu, M. (2010). The inappropriate use of NAPLAN data. Professional Voice, 8, 21-25.

Wu, M. (2016). What can national testing data tell us? In: Lingard, B., Thompson, G. and Sellar, S. (eds) National Testing in Schools: An Australian Assessment. Abingdon: Routledge, 19-29.


  1. This article is based on a post on the Australian Association for Research in Education (AARE) Blog, published in May 2018. The AARE Blog is designed as a conduit between educational researchers and teachers, system leaders, policy makers and the general public, and is available at www.aare.edu.au/blog/
  2. Using as the example here the performance of two differently sized public primary schools (Russell Lea and Chatswood Primary Schools, said to be statistically similar on the My School Website) on the Year 3 2017 NAPLAN Numeracy test.
  3. More information available here https://www.nicolemockler.com/about 


Nicole Mockler is Associate Professor of Education at the Sydney School of Education and Social Work at the University of Sydney. She is a former teacher and school leader, and her research and writing primarily focus on education policy and politics and teacher professional identity and learning. Her recent co-authored books include Questioning the Language of Improvement and Reform in Education: Reclaiming Meaning (Routledge, 2018) and Education, Change and Society (Oxford University Press, 2017). Nicole is Editor in Chief of The Australian Educational Researcher.

This article appears in Professional Voice 12.3 Personalised learning, inclusion and equity.