The Definition Gap
Why Schools Keep Redesigning Evaluation Systems Every Few Years
tl;dr: Schools keep designing evaluation instruments to solve what is actually a shared-language problem. Faculty want feedback and growth. Leaders want accountability and improvement. But no one has defined what those words mean at this school. The system becomes the lightning rod for anxiety that was always about ambiguity. Fix the definitions first — but not in isolation. Define enough to build, build enough to test, test enough to refine. The work is iterative, not sequential.
A school I work with is launching another attempt at evaluation reform. They’ve been here before — multiple times — with different instruments, different leaders, and the same result.
The previous attempts weren’t poorly designed. Leadership wasn’t lacking commitment (at least initially). And faculty didn’t oppose the idea of feedback — survey data shows they actively want it.
Every attempt failed because the school kept building systems on top of words no one had agreed on.
What does “feedback” mean here? Is it a classroom observation followed by a conversation? A written summary? A number on a rubric? Something formative or something that goes in a file? When faculty say they want “more feedback,” do they mean more data about their teaching or more affirmation that they’re doing well? (Often, it’s both. And those require very different systems.)
What does “growth” mean here? Growth toward what? Mastery of new pedagogical techniques? Deeper expertise in a content area? Leadership development? The word carries different weight for a second-year teacher than for a twenty-year veteran, and most evaluation systems treat them identically.
These aren’t semantic quibbles. They’re structural failures. When you build a feedback system on undefined terms, every stakeholder projects their own meaning onto the process — and then feels betrayed when the system delivers something different.
The Pattern Across Schools
I’ve seen this sequence across enough schools to call it a pattern.
A school decides it needs better evaluation and feedback practices. A task force convenes. They research models — Danielson, Marshall, peer observation protocols, growth-based systems. They design an instrument. They pilot it. Faculty comply but don’t engage. Administrators complete the forms but don’t change their behavior. Within two years, the initiative loses momentum. Leadership turns over. A new head or administrator arrives and says, “We need a better evaluation system.” The cycle restarts.
The instruments aren’t the problem. The problem is what’s underneath them — or rather, what isn’t.
Every one of those schools skipped the same step: defining, in their own specific context, what the foundational terms mean. Not adopting someone else's definitions or importing a framework wholesale. Sitting down and agreeing on what feedback, growth, excellence, and accountability look like in their hallways, with their faculty, given their mission.
Everyone Wants Feedback Until It Arrives
There’s a line I keep returning to in this work: everyone wants change, except in the form in which it arrives.
Faculty survey data consistently tells the same story. Teachers want more frequent classroom visits. They want honest, specific feedback about their practice. They want opportunities to grow. They want to be seen.
Then a formal evaluation system launches, and anxiety spikes. Will this be used against me? Is this really about improvement, or is it about documentation for dismissal? Who’s evaluating the evaluators? Will this feel like surveillance?
The gap between wanting feedback in principle and receiving it in practice is where most evaluation systems go to die. And the gap exists because the terms were never defined. When “feedback” could mean anything from a supportive coaching conversation to a written reprimand, of course people get nervous. The ambiguity is the threat, not the system itself.
The Instinct to Skip
So, why do schools keep jumping straight to instrument design?
The definitional work is uncomfortable. Defining what “excellent teaching” means at your school forces choices. It means saying that some practices matter more than others. It means leaders have to articulate expectations they’ve been keeping implicit — which is easier for everyone until it isn’t. Designing a rubric, by contrast, feels productive. You can see it, share it, iterate on it. It’s tangible in a way that philosophical groundwork isn’t.
There’s also a capacity problem. The people leading evaluation reform are often administrators who were promoted for their teaching, not for their ability to facilitate hard conversations about professional identity. The skills required to build an instrument are different from the skills required to lead a faculty through the kind of foundational work that makes the instrument meaningful.
This is worth sitting with. Administrator capacity isn’t just about time management or scheduling classroom visits. It’s about the ability to hold a conversation that says, with care and clarity: Here’s what we believe about teaching at this school. Here’s what the floor looks like. Here’s what the ceiling looks like. And here’s how we’ll support you in closing the gap between where you are and where you could be.
That conversation requires definitions. Without them, administrators default to what’s comfortable — affirming strengths, avoiding tension, completing forms.
Define Enough to Build. Build Enough to Test.
Here’s where I want to push back on the simple version of this argument, including my own instinct to package it neatly for you.
“Define before you design” sounds right, and it’s directionally correct. But taken as a strict sequence, it becomes its own trap. I’ve watched schools spend eighteen months defining terms so precisely that they never build anything. The definitions become an academic exercise — endlessly refined, never tested against practice. That’s analysis paralysis wearing a thoughtful disguise.
The better frame is iterative. Define enough to build a first version. Build something small. Test it. Let the testing sharpen the definitions. Then rebuild.
At one school, we’re working with a multi-year arc. The first phase isn’t “get the definitions perfect.” It’s “establish foundational truisms about teaching here and build administrator capacity at the same time.” Define what feedback means at this school — not exhaustively, but well enough that everyone’s working from the same foundation. Then run a pilot. See where the definitions hold and where they break. Refine.
Some of this work is sequential. You can’t pilot an observation protocol until you’ve agreed on what you’re looking for. You can’t give meaningful feedback until you’ve defined what growth looks like in context.
But some of it runs in parallel. While a task force is doing definitional work, administrators can be building their own capacity for coaching conversations. Faculty can be studying feedback practices from outside education — medicine, aviation, performing arts — to expand their frame of reference. The institution can be collecting data on current practices to establish a baseline.
The sequential and parallel tracks inform each other. That’s the design. Rigid linearity kills momentum just as surely as skipping definitions entirely.
The Truisms Exercise
One approach I’ve found effective: before touching any instrument or process, ask the faculty and leadership to co-create a set of “truisms” about teaching at their school. Truisms are statements so foundational that disagreement with them would signal a fundamental misfit with the community.
Something like: At this school, we believe every teacher deserves specific, actionable feedback about their classroom practice at least twice a year. Or: At this school, we believe that professional growth is a shared responsibility between the teacher and the administration, not a compliance exercise.
These aren’t revolutionary statements, and they’re not meant to be. They’re meant to be true — uncontroversially, foundationally true for this particular community. The act of articulating them does something that importing someone else’s framework never can: it builds shared ownership before the system exists.
Once you have truisms, the system design becomes an expression of beliefs rather than an imposition of structure. That lands differently with faculty. “We’re building a system that reflects what we already said we believe” creates different energy than “Here’s a new evaluation system.”
Design for Your Best
One more piece of this: most evaluation systems are designed with the struggling teacher in mind. The rubric that catches underperformance. The documentation trail that supports a difficult conversation. The progressive steps that lead to a performance improvement plan.
This is backwards. Design for your best teachers — the ones who are already good and want to get better. Build the system around the question: How do we make our strong teachers excellent? If you get that right, you’ll have a system that also identifies teachers who can’t or won’t engage in growth.
But the energy, the culture, the framing — all of it should be aspirational and achievable. A meaningful stretch. When the system is built around catching underperformance, strong teachers disengage because it feels beneath them. When it’s built around making good teachers better, the struggling ones get lifted too. A culture that normalizes growth as a professional expectation changes the meaning of the whole exercise.
The Real Infrastructure
Evaluation reform is a courage problem dressed up as a systems problem. The courage required isn’t in designing the instrument. It’s in doing the harder, slower, less visible work underneath: defining terms, building administrator capacity, and committing to a timeline that lets the work take root.
Schools that skip the definitional work and jump to system design will keep cycling through evaluation reforms every few years — each one generating initial enthusiasm, followed by compliance without engagement, followed by quiet abandonment.
The ones that invest in the foundation first — imperfectly, iteratively, but genuinely — will build something that lasts. Not because the instrument is better, but because the shared understanding underneath it is real.
Three questions worth sitting with:
If you asked five of your faculty what “feedback” means at your school, would you get five versions of the same answer or five different ones?
Where in your institution are you building processes on top of undefined terms — and calling the resulting resistance a people problem instead of a clarity problem?
What truisms about teaching at your school are so foundational that articulating them feels almost unnecessary — and could that false sense of obviousness be exactly why they’ve never been written down?