Professional Development is widespread in education, perhaps more so than in most other fields. In the United States, it is incorporated into almost all teaching contracts, and teachers may participate in multiple PD programs every year. Yet despite this heavy reliance on Professional Development, Professional Learning is far less common.
So what is the problem? Perhaps the biggest problem is that we lack clear knowledge of how PD programs work. Though hundreds of articles are written every year about professional development, most are advocacy pieces lacking hard evidence. Over the past few years, I have examined this literature and found only 27 articles that provided sound evidence about the programs they studied. By “sound,” I mean they measured student learning after the program was finished and compared these teachers’ classrooms to comparable classrooms elsewhere. In this article, I describe what these studies tell us.
Not surprisingly, the programs themselves were quite various, and so were their outcomes, so I wanted to learn which differences made a difference to teachers. One difference was that they addressed different topics. Some focused on subject matter knowledge, some on classroom management, some on methods of fostering discussion, and so forth. Another difference was that they used quite different pedagogies. Some simply presented their ideas in lectures or handbooks, while others demonstrated specific practices, and still others engaged teachers in analysis of teaching artifacts such as videotapes of lessons or samples of student work.
The first thing I learned was that the various topics these programs taught were roughly equal in their benefits to teachers. So we know that teachers can benefit from learning more about almost any topic. This should not be surprising, because teaching is a multi-faceted activity that draws on knowledge about many different things: how to use software, how to fix electrical outlets, how students learn, what motivates students, how to organize group work, and, of course, more about geography, history, reading, mathematics, and so on.
But the second thing I learned was that the differences in pedagogy made a big difference to teachers. Some pedagogies fostered more learning and more changes in practice than others did. Of course, to figure this out, I had to sort the programs into some sort of categories. Here are the three categories of pedagogy I formed:
- Presentation. This is the oldest and most widely recognized method of sharing new ideas. We write them down in manuals, handbooks, or textbooks, and we present them in lectures with intermittent question-and-answer sessions or small-group discussions. For example, one program focused on research-based methods for teaching reading. The presenters identified five primary findings from research and then provided teachers with five seminars, distributed across the school year. Each seminar was also accompanied by a text, or manual, that laid out all the details about that particular approach to teaching reading.
- Prescription. The second group translated their ideas into a set of practices that they could not only describe but also show. This group of programs often relied on classroom observation instruments to describe the specific practices they felt teachers should use. They also used these instruments to observe teachers and give them feedback on how, or how well, they were implementing the recommended practices.
- Situated Problem Solving. In the third approach, teachers met in small groups to discuss and analyze artifacts of teaching such as videotaped lessons, examples of student work, or teachers’ own lesson plans. The discussion leader raised questions and offered new insights to help teachers make sense of the things they were examining: curriculum units, lesson plans, student behaviors, and so forth.
Don’t forget that I reviewed 27 studies, so I actually have multiple tests of each of these pedagogical approaches. And the studies themselves are also quite various. For instance, within one of these three approaches, I might find one study focusing on 40 secondary science teachers, another focusing on 600 first grade teachers, and yet another aimed at middle school bilingual teachers. So the topics and grade levels can be quite various even though the programs all rely on a common pedagogy to achieve their goals. As I said above, the topics seemed equally useful, but the pedagogies had visible differences in their effects. Specifically, the third pedagogy, situated problem solving, was the only one that actually helped teachers learn and change their practices.
I have a graph here that illustrates the differences among these programs, though it is a bit hard to read. Each “bubble” in the chart represents a unique study of a unique PD program. The actual effectiveness of the program is reflected by how high the bubble is on the graph. Bubbles at the bottom had zero benefit for teachers. Even after participating in these programs, teachers were no more effective than other teachers who did not participate. The higher the bubble, the more different participating teachers were from other comparable teachers who did not participate. Other aspects of the bubbles reflect the program size and the amount of time they spent with teachers, as explained beneath the figure.
Figure 1: Effectiveness of different approaches to Professional Development
- Each bubble represents a particular program effect in a particular year. Bubbles clustered together with labels like Y1 and Y2 represent the same program across different years.
- Size of bubbles represents the number of people in the program. Larger bubbles mean larger groups of teachers participated.
- The darkness of bubbles represents the number of hours teachers spent in the program: the darker the bubble, the more contact hours. White Year 2 bubbles, for example, indicate that the evaluation of the program’s effects on participating teachers continued in the year after the program had stopped, when teachers accumulated no new contact hours.
- The position on the chart represents how effective the programs were. Bubbles located in the upper half of the picture indicate that their programs were more effective than those at the bottom.
- The three divisions of the chart represent the different forms of pedagogy used in the programs, i.e., prescribing procedures, conveying a body of knowledge, and fostering situated analysis.
Here are some patterns that I see in this graph, along with some hypotheses for why these patterns appear.
1. Pedagogy matters
I sorted the full set of program bubbles according to their reliance on the three pedagogies I described above. The group on the left depicts results of programs that focused on procedures, and you can see that nearly every bubble rests on the baseline, which represents zero difference between participating teachers and their comparison teachers. In the middle of the figure are programs that presented new knowledge. Their bubbles are a little higher on the chart. Finally, on the right are bubbles representing programs that engaged teachers in group discussion and analysis of teaching situations.
This third group has the highest bubbles even though their methods and content were quite various. Some analyzed student behavior in videotapes, some analyzed student homework, some analyzed interactions between teachers and students, and so forth.
I have a hypothesis about this. I suspect the two less effective pedagogies fall short because they try to give universal solutions to a practice that necessarily depends on unique situations. It is one thing to say that, as a general rule, students are more motivated when they have choices, but it is another to say that Donald, third seat on the left, will be more motivated or will make constructive choices if he is provided with them.
Everything teachers do depends on the situation. The question you pose to motivate a discussion depends on the topic and depends on who your students are, and sometimes even on whether the lesson is before or after lunch. Many decisions are influenced by one particularly difficult student. Each lesson presents a new and unique combination of content, materials, students, time constraints, school policies and so forth, and the decisions teachers make are based on simultaneous consideration of all of these things. This is why universal solutions don’t work.
Professional developers who fail to acknowledge the “situatedness” of teaching fail to provide teachers with usable knowledge. The knowledge shared in manuals and lectures is essentially context-free. It is not linked to anything and thus easy to forget when teachers return to the unique situations they face in their classrooms. The universal practices that coaches often advocate are embedded in practice, but they are presented as if they were universally applicable, not as practices to be amended for specific circumstances. It may be true that, in general, a particular procedure is valuable, but it does not follow that it will always be effective.
2. Learning takes time
There is another pattern in my graph that is worth thinking about. Notice that there are some bubbles that say “Y2” in them, and that these are always linked to other bubbles that say “Y1.” The Y2 bubble represents the second year of the study. Often these bubbles are white, meaning there was no program activity during the second year, but researchers followed teachers for a second year to see if they retained what they had learned during the first year. In most cases, teachers not only retained what they learned but improved their practice during that second year.
I have a hypothesis for this as well, and it also has to do with the complexity and “situatedness” of teaching. Teaching is a practice that is improved by increments. Teachers are constantly adjusting and adapting, tweaking and tinkering with their classroom systems. Even when they gain useful ideas from a PD program, it still takes time to translate those ideas into their own unique situations. Thus, even if their practice improves during the first year of the program, it is very common for it to improve even further the next year, as teachers continue to discover new ways of incorporating these ideas into their own situations.
3. More contact time does not necessarily mean more learning
When bubbles in my figure are darker, it means that the program spent more time, or more “contact hours,” with teachers. Conversely, lighter bubbles mean fewer hours together. A common assumption among teachers and researchers is that the more time we spend on a unit, the more students will learn from it. But this figure suggests that the relationship between time and learning is not very clear. In fact, there is a slight tendency for the lighter bubbles to rise above the darker bubbles. However, it is not clear that contact hours per se are responsible for the position of the bubbles, because the height of these bubbles may reflect differences in pedagogy more than differences in contact hours.
4. Smaller programs seem to be more beneficial
The size of the bubbles in my figure reflects the size of the programs. Larger bubbles may have served a hundred teachers or more, while smaller ones may have included only 20 or 30. Notice that there is a slight tendency for smaller bubbles to rise higher than larger bubbles, meaning that programs serving smaller groups of teachers might be more effective than very large programs. This pattern might also be tied to the pedagogies that programs are using, for programs providing universal solutions have standardized messages and so are easier to share with larger populations. Programs that aim to engage teachers in analyses of real situations tend to work with smaller, more intimate groups of teachers.
If you are a professional developer, these findings might seem disappointing. They suggest that very few programs actually make a difference, and that those that do make a difference must be of limited size in order to establish intimate learning communities. But if you are a teacher, these findings are optimistic, for they suggest that small, local groups of teachers who meet regularly within their own schools to analyze their own situations might be able to gain new insights and make valuable improvements without great expense and without depending on external supports. In the Appendix to this paper, I include short descriptions of some of these programs, focusing especially on what groups do when they share and compare, in the hope that readers might find useful self-help ideas here.
Appendix: Illustrative Programs in Each Group
Prescribing Procedures
The Science Immersion program was designed to introduce a new science curriculum to teachers in Los Angeles. Though the curriculum was intended to encourage exploration among children, its instructions for teachers were highly prescriptive and detailed. A manual for a single unit could be as long as 200 pages. Here is a sampling of instructions for a single lesson:
- To set the tone for this investigation as an exploration, generate a class discussion and class list about what plants need for growth and development.
- Use the Think Aloud technique to model how to refine a wondering into a good scientific investigation. From the students’ list about what plants need, form the question—What effect does sunlight have on radish plant growth and development?
- Continue the Think Aloud to model assembling the Terraqua Columns using proper experimental procedures, and designing an experiment that has only one factor that is varied.
- Have students record and explain their predictions for each set of columns for later reference.
- … (p. 21)
Borman, G. D., Gamoran, A., & Bowdon, J. (2008). A randomized trial of teacher development in elementary science: First-year achievement effects. Journal of Research on Educational Effectiveness, 1, 237–264. doi:10.1080/19345740802328273
Rot it right: The cycling of matter and the transfer of energy. 4th Grade Science Immersion Unit. (2006, September). System Wide Change for All Learners and Educators (SCALE) Report. Madison: University of Wisconsin–Madison. Retrieved from www.scalemsp.org
This program was designed for novice teachers. Local coaches used an observation rubric to observe and critique novices’ practice, with an eye toward getting them to comply with the specific practices outlined in the observation instrument. The researchers examined both a 1-year version and a 2-year version of the program, and in each case followed teachers through a third year.
Glazerman, S., Isenberg, E., Dolfin, S., Bleeker, M., Johnson, A., Grider, M., & Jacobus, M. (2010, June). Impacts of comprehensive teacher induction: Final results from a randomized controlled study. Washington, DC: National Center for Education Evaluation. Retrieved from https://ies.ed.gov/ncee/pubs/20104027/
Providing Content Knowledge
The acronym LETRS refers to Language Essentials for Teachers of Reading and Spelling. The program is based on research findings, specifically five practices that researchers have found to be especially effective. The program provided five day-long seminars, or institutes, one on each topic (e.g., phonemes, phonemic awareness, etc.), and each institute was accompanied by a textbook on that topic. Seminars were interspersed throughout the school year. Notice that there is also a LETRS program in my first category of prescribed practices. This is because the researchers tested both models, one offering only knowledge and the other offering knowledge as well as coaching.
Garet, M. S., Cronen, S., Eaton, M., Kurki, A., Ludwig, M., Jones, W., & Sztejnberg, L. (2008). The impact of two professional development interventions on early reading instruction and achievement. Washington, DC: National Center for Education Evaluation and Regional Assistance, Institute of Education Sciences. Retrieved from http://ies.ed.gov/ncee/pdf/20084030.pdf
The Rational Numbers program also provided intermittent institutes in a format very similar to the LETRS program. Each institute included lectures and overheads interspersed with opportunities for teachers to solve mathematical problems, explain how they solved problems, discuss student misconceptions about these topics, and plan lessons that they would teach later on. The program also included a modest supplemental component to help teachers apply their new knowledge to their classroom instruction.
Garet, M. S., Wayne, A. J., Stancavage, F., Taylor, J., Eaton, M., Walters, K., … Doolittle, F. (2011). Middle school mathematics professional development impact study: Findings after the second year of implementation. Washington, DC: U.S. Department of Education. Retrieved from http://ies.ed.gov/pubsearch/pubsinfo.asp?pubid=NCEE20114024
Analysis of Specific Teaching Situations
Research Study Group
Study groups were formed among elementary reading teachers. Each group read articles about research findings relevant to teaching reading. Each meeting began with a review of the results of their past efforts, moved to a discussion of the next reading, and closed with new lesson plans that incorporated the findings and that teachers would try next. This program offers a unique opportunity to compare programs with similar content: the content here was very similar to that of the much larger LETRS program, which tested both knowledge alone and knowledge combined with prescribed procedures.
Gersten, R., Dimino, J., Jayanthi, M., Kim, J. S., & Santoro, L. E. (2010). Teacher study group: Impact of the professional development model on reading instruction and student outcomes in first grade classrooms. American Educational Research Journal, 47, 694–739. doi:10.3102/0002831209361208
Teachers received an orientation to the CLASS observation instrument. They then videotaped their own classrooms every two weeks and sent their videos to an online consultant. The consultant then called them for a telephone discussion. In their conversation, the consultant never told the teacher what she should have done, but instead asked questions about why she did something or how she knew when to move on. These questions pushed the teacher to think more about her own strategies and her own use of evidence during teaching.
Allen, J. P., Pianta, R. C., Gregory, A., Mikami, A. Y., & Lun, J. (2011). An interaction-based approach to enhancing secondary school instruction and student achievement. Science, 333, 1034–1037. doi:10.1126/science.1207998
School Math Coaches
Math coaches were assigned to individual schools to help teachers improve their lessons. The coaches did not prescribe any particular practices or directly teach any mathematical content. Instead, they worked beside teachers, joining existing grade-level team meetings, co-planning lessons with teachers, and even co-teaching lessons with teachers. These joint activities enabled coaches to introduce new insights about student learning and also about the content itself, always within the context of specific curriculum units and specific students.
Campbell, P. F., & Malkus, N. N. (2011). The impact of elementary mathematics coaches on student achievement. Elementary School Journal, 111, 430–454. doi:10.1086/657654
Mary M. Kennedy is a professor emeritus of education at Michigan State University. Her scholarship focused on defining teacher quality and identifying the factors that most influence teacher quality. She examined the influences of teacher education, research knowledge, attitudes and beliefs, credentials, and school context. From 1986 to 1994, she directed the National Center for Research on Teacher Learning. Kennedy consulted with four ministries of education, the World Bank, and a host of national organizations. She has also published four books on these issues and has won five awards for her work, most recently the prestigious Margaret B. Lindsey Award for Distinguished Research in Teacher Education.