Now in beta — Join early access

AI Grading that cites its work.

AutoGrader combines large language models with human oversight to deliver transparent, evidence-anchored assessment at scale — every score backed by a quote from the student's own submission.

app.autograder.ai / assignments / edl-hw2
AutoGrader human-in-the-loop review interface showing student submission, score breakdown and evidence
Total Score
49 / 50
↑ Evidence cited for every point
New policy rule added
IF partial answer → deduct 0.5 pts
Confidence: 0.92 · 14 instances
Used in active courses at Carnegie Mellon University
87%
Reduction in time spent grading per assignment
94%
Agreement rate with expert human graders
100%
Of scores backed by cited evidence from submission
6+
Submission formats supported: PDF, Jupyter, code & more

Built for the way real courses work

AutoGrader handles the full complexity of higher-education assessment — from OCR to policy learning — so instructors can focus on teaching.

🔍

Evidence-Anchored Scoring

Every point awarded or deducted is backed by an exact quoted passage from the student's submission — no black-box decisions, full auditability.

🧠

Policy Learning

When TAs override AI scores, the system learns generalizable IF/THEN grading rules that automatically apply to all future submissions in that course.
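A learned rule of this kind can be pictured as a small record plus a thresholded application step. The sketch below is illustrative only; `PolicyRule`, `apply_rules`, and the 0.85 threshold are hypothetical names, not AutoGrader's actual API.

```python
from dataclasses import dataclass

@dataclass
class PolicyRule:
    condition: str      # e.g. "partial answer"
    action_pts: float   # point adjustment, e.g. -0.5
    confidence: float   # learned confidence in [0, 1]

def apply_rules(score, findings, rules, threshold=0.85):
    """Adjust a score with every rule that matches a finding and clears the threshold."""
    for rule in rules:
        if rule.confidence >= threshold and rule.condition in findings:
            score += rule.action_pts
    return score

# The rule from the screenshot above, expressed in this sketch:
rules = [PolicyRule("partial answer", -0.5, 0.92)]
print(apply_rules(10.0, {"partial answer"}, rules))  # 9.5
```

Because low-confidence rules are filtered out before they ever touch a score, a single noisy override cannot silently change grading behavior course-wide.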

👁️

Human-in-the-Loop Review

A three-panel workstation gives TAs simultaneous access to the reference solution, student submission, and an interactive AI grading console.

📄

Multi-Format Ingestion

Accepts PDFs, Python files, Jupyter notebooks, Canvas quiz exports, HTML, and Markdown — reflecting the reality of today's mixed-format coursework.

AI Rubric Builder

Upload a reference solution and answer a few questions — AutoGrader generates a complete, calibrated rubric through a clarifying dialogue workflow.

📊

Bulk Processing & Analytics

Process entire class sections with configurable concurrency. Export grade reports as Excel, CSV, PDF with histograms, or JSON for your LMS.
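Configurable concurrency of this sort can be sketched with a standard worker pool. The example below is a minimal illustration under assumed names; `grade` and `grade_section` are hypothetical stand-ins, not AutoGrader's real interface.

```python
from concurrent.futures import ThreadPoolExecutor

def grade(submission_id):
    # Stand-in for the real multi-stage grading pipeline.
    return {"id": submission_id, "score": 49}

def grade_section(submission_ids, concurrency=4):
    """Grade a whole section with a configurable worker pool."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        return list(pool.map(grade, submission_ids))

results = grade_section(["hw2-001", "hw2-002"])
```

Tuning the `concurrency` knob trades throughput against API rate limits, which is why a bulk grader would expose it rather than hard-code it.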

Three-stage pipeline, zero opacity

A rigorous multi-stage architecture ensures that every decision is traceable, every score is defensible, and every override makes the system smarter.

1

Extraction & Parsing

Dual OCR (Mistral AI + PyMuPDF) produces rich markdown with block-level source maps — page, block, and line references for every passage.
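A block-level source map can be imagined as a list of records tying each extracted block back to its page, block, and line positions, so any quoted evidence can be traced to its origin. This is a hedged sketch; the field names and `locate` helper are assumptions, not the pipeline's real schema.

```python
# Hypothetical source-map entries: each extracted markdown block keeps
# a pointer back to its location in the original submission.
source_map = [
    {"page": 1, "block": 2, "lines": (4, 7),
     "text": "We apply gradient descent with a fixed learning rate."},
]

def locate(quote, source_map):
    """Return the (page, block) reference for the block containing a quote."""
    for entry in source_map:
        if quote in entry["text"]:
            return entry["page"], entry["block"]
    return None

print(locate("gradient descent", source_map))  # (1, 2)
```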

2

LLM Judge Scoring

An LLM grades each rubric category with cited evidence (max 250 chars per quote), explicit reasoning, and calibrated confidence scores.
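One way to picture the judge's per-category output, including the 250-character evidence cap, is a validated record like the following sketch. The function name and fields are illustrative assumptions, not AutoGrader's schema.

```python
MAX_QUOTE = 250  # evidence quotes are capped at 250 characters

def make_score(category, points, evidence, reasoning, confidence):
    """Build one rubric-category score record, enforcing the quote cap."""
    if len(evidence) > MAX_QUOTE:
        raise ValueError("evidence quote exceeds 250 characters")
    return {
        "category": category,
        "points": points,
        "evidence": evidence,    # exact quote from the submission
        "reasoning": reasoning,  # explicit justification for the score
        "confidence": confidence,
    }
```

Enforcing the cap at record-construction time keeps every citation short enough to display inline next to the score it supports.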

3

Policy Calibration

Grading rules stored from prior TA overrides are retrieved and applied, but only when their confidence exceeds the threshold, to keep output aligned with instructor expectations.

4

Human Review & Memory Update

TAs review, edit, and approve. Any override triggers the Memory Updater, which extracts a generalizable rule for future use.

Live Pipeline
📥
Extractor
Dual OCR · block-level source maps
Done
⚖️
Judge
Rubric adjudication · evidence citation
Done
🎯
Calibrator
Applying 3 policy rules…
Running
👤
TA Review
Human-in-the-loop workstation
Waiting
PDF · Jupyter · Python · Canvas Quiz · HTML · Markdown

Your grading style,
codified automatically

Every TA override becomes a reusable rule. The system accumulates institutional knowledge, progressively aligning with your pedagogical intent — no extra work required.

Override rate over time
↓ 68% fewer overrides
after 3 grading cycles with policy learning enabled
TA overrides per 100 submissions by cycle →
Cycle 1 – Cycle 6
Policy Memory — EDL 201
12
Active grading rules
accumulated this semester
61
Total rule applications
across all assignments
0.91
Average rule confidence
score across active rules
↓68%
Fewer TA overrides after
3 grading cycles
Rule caps: 50 per assignment · 200 per course · Auto-enforced

Everything a TA needs, in one view

The grading workstation puts reference solutions, student work, and AI analysis side-by-side — so reviewers spend time judging, not searching.

📋

Three-Panel Layout

Reference solution, student submission, and grading console shown simultaneously with lazy page rendering for performance.

💬

Conversational AI Console

Ask the grader follow-up questions, request score adjustments in natural language, and explore evidence trails interactively.

🔴

Real-Time Progress via WebSocket

Live pipeline status updates keep TAs informed as each stage completes, with adjacent submission prefetching for zero-lag navigation.

🔍

Plagiarism & Similarity Analysis

Integrated analysis surfaces potential similarity flags alongside grade breakdowns, keeping the review holistic.

AutoGrader — Chat with Grader
AutoGrader conversational AI grading interface
Total Score: 50/50 — AI grader responds with cited reasoning

Grading has never felt this transparent

Hear from instructors and teaching assistants who've used AutoGrader in real courses.

★★★★★

"AutoGrader has completely changed how I think about grading at scale. The evidence-anchored scores mean students never question why they lost a point — it's right there in their own words. We went from three days of TA grading to a single afternoon review session."

SB
Saksham Bhutani
Teaching Assistant · ECE Department · CMU
★★★★★

"I was skeptical that an AI could capture the nuance our rubrics require, but after the first homework the policy learning had already picked up on our notation conventions. By week four I was spending more time on teaching than grading — which is exactly how it should be."

KR
Kiruthika Raja
Teaching Assistant · ECE Department · CMU
🎓  Free pilot for courses with 50+ students

Ready to reclaim
your grading time?

See AutoGrader in action with your own rubric and sample submissions. Our team will walk you through a live demo tailored to your course format.