Understand where learning breaks. With evidence, not guesses.

PROJECT: CLARION
YEAR: 2026
TYPE: AI-ASSISTED DIAGNOSTIC WORKSPACE · EDTECH
00.0 CONTENTS
01 CONTEXT
00.1 / Opening and TLDR
01.1 / The real problem
02.1 / Research findings
02 PROCESS
03.1 / What I got wrong first
04.1 / Product vision
05.1 / Authority governance
03 RESULT
06.1 / Product walkthrough
07.1 / Permanent exclusions
08.1 / Outcomes and open questions

00.0 OPENING STATEMENT
Most AI tools fail teachers not by being wrong, but by being too loud.
I designed Clarion around a different principle: an AI earns trust by knowing its limits.
This is the full account of how I built that system: every research finding, every structural decision, and every feature I deliberately left out.
TL;DR
Teachers spend 3–6 hours weekly diagnosing student learning manually.
I designed an AI workspace that earns trust by surfacing only what it can prove, and staying silent when it can't.
Result:
- Weekly diagnostic orientation under 2 minutes.
- Zero auto-approved insights.
- Every decision owned by the teacher, enforced structurally.

CHAPTER 01 · THE REAL PROBLEM
A Sunday Afternoon in Room 7B
It is 3:17 PM on a Sunday. Priya Sharma, a Grade 7 Mathematics teacher, sits at her kitchen table with 43 student notebooks. She has been here since noon. She is not grading; she finished that on Friday. What she is doing now is harder, slower, and lonelier: she is trying to understand.

She notices a pattern. Multiple students are converting fractions to decimals by finding common denominators instead of dividing numerators by denominators. A logical mistake, rooted in something she taught three weeks ago. Did her worksheet create the wrong scaffold?

Priya does not need another grading tool. She does not need a dashboard with bar charts.

She needs someone to sit beside her and say: "Here. This pattern is real. I've seen it across nineteen responses. Here is the evidence. Now, what do you want to do about it?"

That is what I built Clarion to do.
CHAPTER 02 · RESEARCH
I wanted to understand what would make this product genuinely useful in a real classroom, not just theoretically valuable.
Since direct access to teachers was limited, I conducted AI-assisted desk research using Perplexity Deep Research, synthesizing insights from educator interviews, teaching forums, academic studies, and classroom management resources.
The research focused on Grade 6–8 teachers across Math, Science, and Language Arts, uncovering recurring challenges around classroom attention, student engagement, workload management, and assessment efficiency.

00.0 RESEARCH FINDING TABLE
METRIC | FINDING
Weekly diagnostic time | 3–6 hours, evenings and weekends
Primary tool failure | No tool detects class-level learning gaps, only submissions and grades
AI tool experience | 70% tried AI tools; most stopped due to lack of visible reasoning
Evidence requirement | 70% need to see the evidence before trusting any conclusion
Paper prevalence | 60% primarily paper-based; full upload requirement is a dealbreaker
Student AI usage awareness | All aware; most uncertain how to respond diagnostically
THE 4 DIMENSIONS I IDENTIFIED
01 · TIME-COLLAPSED PATTERN RECOGNITION
Entirely manual. Entirely memory-dependent. Teachers hold patterns in their heads, and the conclusions may be wrong by Monday.
02 · GRADES ARE THE WRONG SIGNAL
One score. Five possible reasons behind it. A grade can't tell you which one, so teachers build the workaround manually.
03 · PAPER CREATES AN INVISIBLE WALL
60% of classrooms run on paper. Any system that demands full digitization first fails before it starts.
04 · AI WITHOUT EVIDENCE DESTROYS TRUST
Black box in, trust out. If teachers can't see the reasoning, they can't apply their own judgment to it.
"I don't need another grading tool. I need help understanding where my instruction failed so I can fix it."
"I won't trust a system that just tells me what's wrong without showing me why. I need to see the actual student work."
THE RESEARCH-TO-DESIGN STRATEGY MAP
RESEARCH EVIDENCE | CORE INSIGHT | DESIGN DECISION
3–6 hrs/week manual diagnosis | Diagnosis is the real bottleneck, not grading | Weekly cadence; max 5 surfaced insights
70% need visible evidence | Evidence is a precondition, not a preference | Evidence-first architecture; draft insights only
60% paper-based classrooms | Upload burden kills adoption | Sampling model; no completion pressure
All tools track logistics, not understanding | No tool addresses class-level concept gaps | Diagnostic workspace, not LMS replacement
Black-box AI distrust | Transparency is not a feature; it is trust | Approve/edit/reject; no auto-approval ever
CHAPTER 03 · WHAT I GOT WRONG FIRST
What didn't work
My first design direction included a real-time alert system. When Clarion detected a pattern with high confidence, it would notify the teacher immediately: push notification, in-app badge, the works.
Research killed it.
Teachers called it anxiety-inducing. Impossible to act on mid-lesson.
One teacher put it plainly: "I don't need my phone telling me something is wrong while I'm standing in front of 35 students."
I removed the entire real-time layer. Not deprioritised. Removed.
CHAPTER 04 · PRODUCT VISION
What Clarion Is and What It Is Not
Before I drew a single wireframe, I spent time writing down what Clarion was not. In educational AI, wrong product positioning is not just a marketing error; it is an ethical failure.

IS / IS NOT table
CLARION IS | CLARION IS NOT
A weekly diagnostic workspace | A real-time monitoring system
A class-level pattern detector | A student evaluation tool
An evidence surfacing system | A grading or scoring tool
A human-in-the-loop aid | An autonomous decision-maker
A sampling-based signal system | A comprehensive data platform
A teacher authority preserver | A prescription engine

5 NON-NEGOTIABLE PRINCIPLES
01 · EVIDENCE BEFORE INSIGHT
Raw signals always appear before AI interpretation. I designed the IA so this order is structurally enforced, not just encouraged by copy.
02 · CLASS-LEVEL PATTERNS BEFORE INDIVIDUALS
Clarion diagnoses learning, not learners. I made this a structural rule, not a preference. Individual student flags were removed from every surface.
03 · WEEKLY RHYTHM OVER REAL-TIME NOISE
No push notifications. No urgency signals. No badges. The system matches how teachers already plan, weekly, not how software companies want engagement.
04 · HUMAN-IN-THE-LOOP IS MANDATORY
No insight can affect any downstream surface without explicit teacher approval. I built this as a system law, not a UX pattern. The AI drafts. The teacher decides.
05 · SILENCE IS A FEATURE
When the system doesn't have sufficient evidence, it says nothing. I had to defend this decision repeatedly, because emptiness feels like a bug to stakeholders. It isn't. A weak insight is worse than no insight at all.
CHAPTER 05 · THE AUTHORITY GOVERNANCE MODEL
Where AI can appear, and where it is forbidden
AI systems fail in education not because of poor models. They fail because of authority creep, the gradual drift from assistant to advisor to authority. I designed explicitly against this failure mode.
I defined every surface where AI is permitted. I locked every surface where it is forbidden. This is system law, not UX guidance.
AI PERMITTED IN
- Draft insight text blocks: conditional language only
- Evidence annotations: descriptive only, no interpretation
- Withdrawal messages: insufficient evidence trigger
- Silence states: absence of output with neutral reason
- Boundary disclosures: what the system cannot do
- Error states: technical limitation descriptions only
AI FORBIDDEN FROM
- Headings or section titles
- Summaries or conclusions
- Navigation labels
- Calls to action
- Empty states
- Comparative or temporal frames
- Teacher notes or reflections
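The permitted and forbidden surfaces above read naturally as an allowlist. What follows is a minimal sketch of that idea, not Clarion's actual implementation: the `Surface` enum, the `AuthorityBreach` exception, and the `place_ai_output` function are all hypothetical names introduced for illustration.

```python
from enum import Enum


class Surface(Enum):
    # Permitted surfaces (taken from the "AI PERMITTED IN" list).
    DRAFT_INSIGHT_TEXT = "draft insight text block"
    EVIDENCE_ANNOTATION = "evidence annotation"
    WITHDRAWAL_MESSAGE = "withdrawal message"
    SILENCE_STATE = "silence state"
    BOUNDARY_DISCLOSURE = "boundary disclosure"
    ERROR_STATE = "error state"
    # Forbidden surfaces (taken from the "AI FORBIDDEN FROM" list).
    HEADING = "heading or section title"
    SUMMARY = "summary or conclusion"
    NAVIGATION_LABEL = "navigation label"
    CALL_TO_ACTION = "call to action"
    EMPTY_STATE = "empty state"
    COMPARATIVE_FRAME = "comparative or temporal frame"
    TEACHER_NOTE = "teacher note or reflection"


# Allowlist rather than denylist: a surface added later stays forbidden
# until someone deliberately adds it here.
AI_PERMITTED = frozenset({
    Surface.DRAFT_INSIGHT_TEXT,
    Surface.EVIDENCE_ANNOTATION,
    Surface.WITHDRAWAL_MESSAGE,
    Surface.SILENCE_STATE,
    Surface.BOUNDARY_DISCLOSURE,
    Surface.ERROR_STATE,
})


class AuthorityBreach(Exception):
    """Raised when AI output targets a surface outside the allowlist."""


def place_ai_output(surface: Surface, text: str) -> dict:
    # Tone, usefulness, and intent are deliberately not inputs to this check.
    if surface not in AI_PERMITTED:
        raise AuthorityBreach(f"AI output is forbidden on: {surface.value}")
    return {"surface": surface.value, "text": text, "author": "ai"}
```

The check takes only the surface as input, so a helpful-sounding AI sentence in a heading is refused the same way as an unhelpful one.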
THE FOUR SYSTEM LAWS I WROTE BEFORE BUILDING ANY SCREEN:
Upload ≠ Analysis
The system doesn't start watching you the moment you give it data.
Analysis ≠ Insight
Not everything the system detects gets shown to you.
Insight ≠ Action
Seeing something doesn't mean the system acts on it.
Teacher decides every transition
Every step requires a conscious human choice.
Any appearance of AI output outside the permitted surfaces is an authority breach, regardless of tone, usefulness, or intent.
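One way to read the four laws above is as a small state machine in which no stage implies the next and every move needs a recorded teacher decision. The sketch below is an illustration under that reading only; `Stage`, `TeacherDecision`, `WorkItem`, and `advance` are hypothetical names, not part of the shipped product.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional


class Stage(IntEnum):
    UPLOADED = 0
    ANALYZED = 1
    INSIGHT = 2
    ACTION = 3


@dataclass(frozen=True)
class TeacherDecision:
    teacher_id: str
    target: Stage


@dataclass
class WorkItem:
    stage: Stage = Stage.UPLOADED


def advance(item: WorkItem, decision: Optional[TeacherDecision]) -> WorkItem:
    # No decision, no movement: uploading never triggers analysis by itself.
    if decision is None:
        return item
    # Each law separates two adjacent stages, so stages cannot be skipped.
    if decision.target != item.stage + 1:
        raise ValueError("a decision can only move an item one stage forward")
    item.stage = decision.target
    return item
```

Because `advance` is the only function that changes `stage`, a timer or a confidence score has no path to move an item forward.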
CHAPTER 06 · THE FOUR-LAYER IA
Structure as authority
Most product teams move from research to wireframes. I introduced an intermediate layer: a governed information architecture that encoded authority restraint structurally, not visually.
The reasoning: if the hierarchy is wrong, language restraint will fail. Copy can be edited. Structure is harder to break.
Every screen in Clarion follows a mandatory four-layer structure. The order is non-negotiable. AI is placed strictly downstream.
LAYER | CONTENT | RULE
1. Context | Class, subject, week, submission coverage | No AI content allowed. Must function independently.
2. Evidence | Raw signals, missing signals, conflicting signals | No summaries, no evaluative language. Evidence precedes interpretation.
3. Teacher Space | Teacher notes, tags, decisions | AI must withdraw if teacher content is present. Teacher never responds to AI.
4. AI Annotation | System annotations, insight block, silence states | Never appears first. Never concludes. Removable without breaking the system.

This is the full product structure I mapped before drawing a single screen. Four navigation destinations. Every child node governed by the four-layer rule. AI appears nowhere in the top-level structure; it is embedded, secondary, and conditional throughout.
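The four-layer rule can be sketched as a render function that fixes the layer order in code rather than in copy. This is an illustrative sketch, not the production structure: `Screen` and `render` are hypothetical names, and hiding the whole AI layer whenever teacher notes exist is one interpretation of "AI must withdraw if teacher content is present."

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Screen:
    context: str
    evidence: List[str]
    teacher_notes: List[str] = field(default_factory=list)
    ai_annotations: List[str] = field(default_factory=list)


def render(screen: Screen) -> List[Tuple[str, List[str]]]:
    # Layers 1 to 3 never read ai_annotations, so removing the AI layer
    # cannot break them.
    layers = [
        ("context", [screen.context]),
        ("evidence", list(screen.evidence)),
        ("teacher_space", list(screen.teacher_notes)),
    ]
    # Layer 4 is appended last, and only when the teacher space is empty
    # (assumed reading of the withdrawal rule).
    if screen.ai_annotations and not screen.teacher_notes:
        layers.append(("ai_annotation", list(screen.ai_annotations)))
    return layers
```

Since the order lives in one list, a visual redesign can restyle each layer without being able to move AI content above the evidence.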
CHAPTER 07 · KEY DESIGN DECISIONS
Decision 01
Weekly Cadence, Not Real-Time
WHAT I DECIDED
All diagnostics operate on a weekly cycle. No real-time alerts. No push notifications.
The default direction: Real-time detection. Push the moment confidence is high.
Why I rejected it: Real-time spikes anxiety, not diagnostic quality. Teachers plan weekly; the system should too.
Alternatives I considered: Real-time alerts, daily digests, event triggers. All rejected.
Tradeoff I accepted: No instant feedback.
What I gained: Lower cognitive load, calmer UX, higher signal quality, appropriate pacing.
Decision 02
Sampling Model, Not Completeness
WHAT I DECIDED
Teachers upload 5–10 representative examples. No submission counters. No completion pressure. Confidence adjusts based on sample size.
The default direction: Require full-class uploads. Push toward 45/45 completion.
Why I rejected it: Paper classrooms can't digitize everything. 5–10 strong examples beat 45 shallow ones.
Alternatives I considered: Full-class upload, auto-digitization, student-submitted portal.
Tradeoff I accepted: Reduced coverage certainty.
What I gained: Real-world viability, adoption in paper classrooms, ethical defensibility.
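The case study does not say how confidence adjusts to sample size. One standard statistical option, shown here purely as an assumption and not as Clarion's method, is the lower bound of the Wilson score interval: the same observed share of matching examples earns less confidence when it comes from fewer samples. The function name and the default `z = 1.96` (roughly a 95% interval) are illustrative choices.

```python
import math


def wilson_lower_bound(matches: int, sample_size: int, z: float = 1.96) -> float:
    """Lower end of the Wilson score interval for matches / sample_size."""
    if sample_size <= 0:
        return 0.0  # no samples, no confidence
    if not 0 <= matches <= sample_size:
        raise ValueError("matches must be between 0 and sample_size")
    p = matches / sample_size
    z2 = z * z
    centre = p + z2 / (2 * sample_size)
    spread = z * math.sqrt(p * (1 - p) / sample_size + z2 / (4 * sample_size ** 2))
    # Clamp at zero to absorb floating-point rounding when matches == 0.
    return max(0.0, (centre - spread) / (1 + z2 / sample_size))


small = wilson_lower_bound(8, 10)    # 80% of a 10-example sample
large = wilson_lower_bound(80, 100)  # 80% of a 100-example sample
```

Here `small` is lower than `large` even though both samples show the pattern at the same rate, which is the behaviour a sampling model needs: a 5–10 example upload can still produce a signal, but a more cautious one.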
Decision 03
Structural IA, Not Copy-Level Restraint
WHAT I DECIDED
Authority restraint is encoded in the information architecture (order, placement, collapse rules), not in cautious language or disclaimers.
The default direction: Write careful copy. Add disclaimers. Trust the words.
Why I rejected it: Copy gets edited. Structure doesn't. Restraint needs to survive a refactor.
Tradeoff I accepted: Harder to build, less immediately impressive in a demo.
What I gained: Governance that survives UI changes.
Decision 04
Draft-Only Insights, No Auto-Approval
WHAT I DECIDED
All insights surface as drafts. Approval requires explicit teacher action. Auto-approval was never implemented.
The default direction: Auto-approve above a confidence threshold. Let teachers opt out.
Why I rejected it: Silence isn't consent. Approval is an authority transfer; it has to be active.
Alternatives I considered: Confidence-based auto-unlock, approval by inaction (timeout), auto-approve with opt-out.
Tradeoff I accepted: Slower time-to-value.
What I gained: Teacher authority preserved, automation bias prevented, ethical defensibility.
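A draft-only model can be sketched as a status field that only an explicit teacher review can change. The names below (`Status`, `Insight`, `review`, `downstream_visible`) are hypothetical, and treating "edit" as leaving the insight in draft until a separate approval is an assumption; the case study does not specify it.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Status(Enum):
    DRAFT = "draft"
    APPROVED = "approved"
    REJECTED = "rejected"


@dataclass
class Insight:
    text: str
    confidence: float
    status: Status = Status.DRAFT  # every insight starts as a draft
    reviewed_by: Optional[str] = None


def review(insight: Insight, action: str, teacher_id: str,
           edited_text: Optional[str] = None) -> Insight:
    if not teacher_id:
        raise PermissionError("a review needs an identified teacher")
    if action == "approve":
        insight.status = Status.APPROVED
    elif action == "reject":
        insight.status = Status.REJECTED
    elif action == "edit":
        if not edited_text:
            raise ValueError("edit needs replacement text")
        insight.text = edited_text  # stays a draft (assumed behaviour)
    else:
        raise ValueError("action must be approve, edit, or reject")
    insight.reviewed_by = teacher_id
    return insight


def downstream_visible(insight: Insight) -> bool:
    # Confidence is deliberately ignored: there is no auto-approval path.
    return insight.status is Status.APPROVED
```

There is no timeout or threshold branch anywhere in `review`, so approval by inaction cannot happen by accident.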
Decision 05
Silence as Success
The Call Only I Could Make
WHAT I DECIDED
When confidence thresholds are not met, the system produces no output. Silence is explicitly named in the Action Readiness panel — not hidden, not treated as an error. It is a first-class product outcome.
The default direction: Surface more. Fill the screen. Silence reads as a bug to most teams.
Why this is the call AI couldn't make: No prompt produces this. It requires trusting restraint over noise, before any tool is open.
Alternatives I considered: Showing low-confidence insights with heavy disclaimers, showing partial insights to fill the screen, "check back later" prompts.
Tradeoff I accepted: Product feels quiet to the unfamiliar eye.
What I gained: Epistemic integrity, long-term trust, prevention of automation bias.
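Silence as a first-class outcome can be sketched as a gate that returns a named result even when nothing passes. The cap of five drafts comes from the strategy map earlier in this case study; the `0.8` threshold, the reason string, and the names `Candidate`, `WeeklyOutcome`, and `weekly_outcome` are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Candidate:
    heading: str
    confidence: float  # 0.0 to 1.0


@dataclass
class WeeklyOutcome:
    drafts: List[Candidate] = field(default_factory=list)
    withheld: int = 0          # what the system chose not to show
    silence_reason: str = ""   # filled in only when nothing is shown


def weekly_outcome(candidates: List[Candidate], threshold: float = 0.8,
                   max_drafts: int = 5) -> WeeklyOutcome:
    ranked = sorted(candidates, key=lambda c: c.confidence, reverse=True)
    passing = [c for c in ranked if c.confidence >= threshold]
    drafts = passing[:max_drafts]
    outcome = WeeklyOutcome(drafts=drafts, withheld=len(candidates) - len(drafts))
    if not drafts:
        # Silence is returned as a named state, not as an error or a blank.
        outcome.silence_reason = "Not enough evidence to draft an insight this week."
    return outcome
```

The `withheld` count is what a panel like Action Readiness could report, so the teacher sees that restraint happened and why.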

CHAPTER 08 · SCREEN-BY-SCREEN DECISIONS
Hypotheses based on this week's opted-in work. You decide what gets approved.
WEEKLY SUMMARY

Diagnostic Snapshot
Confidence level. Draft badge. Observation-based heading. "Written explanations show range of detail", not "Students are struggling."
Reviewed This Week
Approved and rejected insights, both visible. The system's track record, not just its output.


Action Readiness Panel
What the system chose not to show and why. This is the most trust-building element on the screen.

INSIGHT DETAIL
Insight Detail View
Nothing downstream happens without passing through here.
Confidence before evidence. Seen first, on purpose. Prevents anchoring on one vivid example.


Evidence Snippets
Source context. Excerpt. Highlighted pattern. Aggregation count. No grades. No corrective language. "This is what the system noticed", not "this is wrong."
Three controls only: Approve. Edit. Reject. No passive dismissal; one action is required.

ASSIGNMENTS
Assignments Screen
Inclusion requires an explicit toggle, never assumed. "Inclusion is off by default. You choose what to include. Uploading an assignment does not trigger any analysis."
Upload ≠ Analysis: stated, not implied.


STUDENTS VIEW

Context, Not Assessment
Empty by default. Activates only after one insight is approved. Three columns. Nothing else.
Name · Context · Submission count.
No grades. No rankings. No color coding. No flags. "Appears in 2 approved insights", factual, backward-looking, not a judgment.
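The Students view rules can be sketched as a function that returns nothing until an insight is approved and then emits exactly three fields per row. The input shapes (a list of student-name sets, one per approved insight, and a name-to-count mapping) and the function name are assumptions for illustration; alphabetical ordering is chosen here so that row order carries no ranking.

```python
from typing import Dict, List, Set, Tuple


def students_view(approved_insights: List[Set[str]],
                  submissions: Dict[str, int]) -> List[Tuple[str, str, int]]:
    # Empty by default: the view activates only after one approval.
    if not approved_insights:
        return []
    rows = []
    for name in sorted(submissions):  # alphabetical, never by count
        n = sum(name in members for members in approved_insights)
        noun = "insight" if n == 1 else "insights"
        # Three columns only: name, factual context, submission count.
        rows.append((name, f"Appears in {n} approved {noun}", submissions[name]))
    return rows
```

The row type has no slot for a grade, flag, or color, so adding one would require a visible change to the function signature rather than a quiet UI tweak.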

PLANNING SCREEN
Teacher-Authored Thinking Space
Activates only after approval. Notes and actions — teacher-authored, every word.
No auto-generated lesson plans. No suggested activities. No AI-drafted content. You did the diagnostic work. This part is yours.

CHAPTER 09 · WHAT I NEVER BUILT
These are permanent exclusions. Not “not in MVP.” Not “maybe later.” Never. I documented this list before shipping.
Feature | Why It Will Never Exist
AI detection / plagiarism flags | Accusatory, unverifiable, damages teacher-student trust
Student risk scores | Labels learners, creates stigma, surveillance dynamic
Trend graphs and trajectories | Implies evaluation, creates false momentum narrative
Auto-generated lesson plans | Reduces teacher professional identity to execution
Real-time alerts | Anxiety-inducing, breaks weekly diagnostic rhythm
Parent dashboards | Extends surveillance beyond classroom consent boundary
Predictive performance scoring | False certainty, ethical risk, no defensible evidence base
CHAPTER 10 · OUTCOMES
Weekly diagnostic orientation: under 2 minutes.
Manual diagnosis time reclaimed: 3–6 hours per teacher per week.
Auto-approved insights: 0.
Every decision owned by the teacher, enforced structurally.
“Clarion becomes more valuable as it becomes less assertive.”
The best outcome is a teacher who closes Clarion on Monday morning, walks into class with a clear hypothesis about where learning broke, and knows exactly what evidence supports it. That is what I designed for.
CHAPTER 11 · WHAT I'D DO DIFFERENTLY, AND WHAT'S STILL OPEN
The confidence scoring system needs a visible layer. Not just “Clarion suggests this” but “Clarion is 87% confident based on 14 data points.” Trust is built in the details nobody thinks to design.
The longitudinal class intelligence direction is the highest-value next step — weekly insights viewed across multiple weeks. With hard constraints: no student-level tracking, no predictions, no trends. Occurrence only. Never direction.
The most dangerous moment for a product with principled restraint is the second roadmap cycle — when stakeholder pressure pulls toward automation and prescription. I documented the post-MVP evolution strategy before shipping exactly because of this. Every future capability has a written constraint about what it must never do.
The hardest question I haven’t designed for yet: What happens when a school administrator wants to see Clarion’s data? The product says no — it was built for the teacher, not the institution. But I haven’t designed for that conversation. That’s the tension I’d explore next: what does ethical restraint look like when the pressure doesn’t come from a product decision — it comes from a contract?

