πŸŽ“ NLP Research Β· Explainable AI Β· Open Source

Short Answer Grading
That Shows Its Work

An automated grading system inspired by ExASAG (BEA 2023) and PMC12171532. Grades student answers using a two-stage pipeline β€” rule-based floor + NLP similarity scoring β€” then explains exactly which sentences and concepts drove the grade.

2 Scoring Stages
5 NLP Metrics
2,200+ Mohler Dataset Rows
100% Runs In Browser
Architecture

Two-Stage Grading Pipeline

Transparent by design β€” every point is traceable to a specific text comparison.

βš–οΈ1

Stage 1: Rule-Based Floor

Calculates a guaranteed minimum score from literal technical-term coverage, so students can never score below the credit earned by the terms they explicitly mention.
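A minimal sketch of such a floor, assuming marks are awarded in proportion to term coverage (the function name, tokenizer, and term list here are illustrative, not the project's exact implementation):

```python
import re

def stage1_floor(student: str, technical_terms: set[str], max_marks: float) -> float:
    """Hypothetical rule-based floor: award marks in proportion to the
    fraction of required technical terms the student literally mentions."""
    tokens = set(re.findall(r"[a-z0-9_+-]+", student.lower()))
    if not technical_terms:
        return 0.0
    covered = technical_terms & tokens
    return max_marks * len(covered) / len(technical_terms)

# A student who names 2 of the 4 required terms is guaranteed half marks.
score = stage1_floor(
    "You push items onto a stack.",
    {"stack", "lifo", "push", "pop"},
    max_marks=5,
)
```

Because this floor only ever adds to the Stage 2 result, a student who names the right terms keeps that credit even if the semantic comparison fails.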

Technical Term Match (Floor)
β†’
πŸ“2

Stage 2: Semantic Grading

Applies the PMC12171532 equations, combining five NLP metrics with semantic similarity to award points for meaning and conceptual understanding.

NLP Metrics + Semantic Match
β†’
πŸ†

Final Score

The system combines the floor and semantic scores, capping the result at the maximum marks while generating explainable feedback.

Final Capped Result
Metrics

NLP Grading Metrics

Exact weights from PMC12171532 β€” every number is from the published paper.

Stf
50%
Semantic Similarity
TF cosine similarity of full answer vectors β€” proxy for Universal Sentence Encoder (USE) from the paper
Sj
15%
Jaccard Similarity
Token-set intersection over union β€” rewards correct vocabulary and key terminology
Sc
15%
Cosine Similarity
Word-frequency vector cosine angle β€” captures phrasing and synonym usage
Sw
15%
Normalized Word Count
ref keywords Γ· student keywords β€” counteracts length inflation from padded or off-topic answers
Se
5%
Edit Similarity
Normalized Levenshtein β€” catches typos and phrasing variations at character level
πŸ“‹ Exact Scoring Equations (PMC12171532)
(a) Base NLP score: Cnlp = min(max(0, 0.15Β·Sj + 0.05Β·Se + 0.15Β·Sc + 0.15Β·Sw), 1)
(b) Confidence score: C = min(max(0, 0.5Β·Stf + 0.5Β·Cnlp), 1)
(c) Final score rule: F = 0 if Stf < 0.2  |  1 if Stf β‰₯ 0.9 and Sw β‰₯ 0.85  |  C otherwise
(d) Stage 2 score: Stage2 = F Γ— MaxMarks    Final = min(MaxMarks, Stage1 + Stage2)
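The four equations map directly onto a short function. This is a sketch with names of our choosing; the five similarity inputs are assumed to be precomputed elsewhere and already normalized to [0, 1]:

```python
def final_score(s_tf, s_j, s_c, s_w, s_e, stage1, max_marks):
    """Sketch of scoring equations (a)-(d) from PMC12171532."""
    # (a) Base NLP score from the four lower-weighted metrics
    c_nlp = min(max(0.0, 0.15 * s_j + 0.05 * s_e + 0.15 * s_c + 0.15 * s_w), 1.0)
    # (b) Confidence score: semantic similarity and base NLP score, equally weighted
    c = min(max(0.0, 0.5 * s_tf + 0.5 * c_nlp), 1.0)
    # (c) Final score rule: hard zero, hard full marks, or the confidence score
    if s_tf < 0.2:
        f = 0.0
    elif s_tf >= 0.9 and s_w >= 0.85:
        f = 1.0
    else:
        f = c
    # (d) Scale by max marks, add the Stage 1 floor, cap at max marks
    return min(max_marks, stage1 + f * max_marks)

# A near-perfect answer (Stf β‰₯ 0.9 and Sw β‰₯ 0.85) triggers rule (c) and earns full marks.
full = final_score(0.95, s_j=0.8, s_c=0.9, s_w=0.9, s_e=0.7, stage1=1.0, max_marks=5)
```

Note how rule (c) makes Stf a gate as well as a weight: below 0.2 it zeroes Stage 2 entirely, so only the rule-based floor survives.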
Live Demo

Grade Any Answer β€” Instantly

All computation runs in your browser. Zero data is sent to any server.

✍️ Enter Answers

β€” / 5
Predicted Score
πŸ—ΊοΈ

Concept Coverage Map

Visual Β· Key Concepts
Each bubble = one key concept extracted from the reference answer. Green = covered in your answer, Red = missed. Size = concept importance (TF score).
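One plausible way to build this map, shown as a sketch: rank reference terms by term frequency (bubble size), then flag each as covered or missed depending on whether the student mentions it. The stopword list and function names here are ours, not the project's:

```python
import re
from collections import Counter

STOPWORDS = {"the", "a", "an", "of", "to", "is", "in", "and", "that", "it", "for"}

def concept_coverage(reference: str, student: str, top_k: int = 8):
    """Illustrative coverage map: top-k reference terms by term frequency,
    each marked covered (green) or missed (red) against the student answer."""
    def tokens(text):
        return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]
    tf = Counter(tokens(reference))
    student_terms = set(tokens(student))
    return [
        {"concept": term, "size": count, "covered": term in student_terms}
        for term, count in tf.most_common(top_k)
    ]

coverage = concept_coverage(
    "Inheritance lets a subclass reuse and override superclass methods.",
    "A subclass can override methods.",
)
```

Each returned dict corresponds to one bubble: `size` drives the radius, `covered` drives the color.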
πŸ’¬

Sentence-Level Attribution

ExASAG Β· BEA 2023
Inspired by ExASAG (Filighera et al., BEA 2023) β€” each sentence in your answer is scored independently. β–  Strong sentences pushed your score up. β–  Weak sentences pulled it down.
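A sketch of the idea, using per-sentence Jaccard overlap with the reference as a stand-in for the system's actual per-sentence scorer; sentences above the answer's mean score read as strong, below it as weak:

```python
import re

def sentence_attributions(student: str, reference: str):
    """Score each student sentence independently against the reference and
    label it relative to the mean (Jaccard is a stand-in scorer here)."""
    def toks(text):
        return set(re.findall(r"[a-z]+", text.lower()))
    ref = toks(reference)
    # Naive sentence split on terminal punctuation
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", student.strip()) if s]
    scores = [len(toks(s) & ref) / len(toks(s) | ref) for s in sentences]
    mean = sum(scores) / len(scores) if scores else 0.0
    return [
        (s, sc, "strong" if sc >= mean else "weak")
        for s, sc in zip(sentences, scores)
    ]

attrs = sentence_attributions(
    "A stack is last in first out. I like pizza.",
    "A stack is a last in, first out (LIFO) structure.",
)
```

The off-topic second sentence scores near zero and is labeled weak, which is exactly the actionable signal sentence-level attribution is meant to surface.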
πŸ“

What This Score Means

Part 1 Β· Plain English
πŸ“Š

Metric Attributions

Part 2 Β· Feature Impact
β–² Green bars = the metric boosted your score above average  Β·  β–Ό Red bars = the metric pulled your score below average. Hover for details.
Batch Mode

Grade an Entire Class at Once

Upload a CSV β€” get scores for every row in seconds, entirely in-browser.

Required CSV columns (flexible name matching): question Β· desired_answer (or reference_answer / ideal_answer) Β· student_answer (or response). This matches the Mohler dataset format directly.
πŸ“‚

Drop your CSV here, or click to browse

Supports the Mohler dataset format out-of-the-box.
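The flexible header matching can be sketched with the standard csv module. The alias sets come from the column names listed above; the function name and error handling are our assumptions:

```python
import csv
import io

COLUMN_ALIASES = {
    "question": {"question"},
    "desired_answer": {"desired_answer", "reference_answer", "ideal_answer"},
    "student_answer": {"student_answer", "response"},
}

def read_answers_csv(text: str):
    """Map whatever header names the CSV uses onto the three canonical
    columns, then return one dict per data row."""
    reader = csv.DictReader(io.StringIO(text))
    mapping = {}
    for field in reader.fieldnames or []:
        key = field.strip().lower()
        for canonical, aliases in COLUMN_ALIASES.items():
            if key in aliases:
                mapping[canonical] = field
    missing = set(COLUMN_ALIASES) - set(mapping)
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    return [{c: row[src] for c, src in mapping.items()} for row in reader]

rows = read_answers_csv(
    "question,ideal_answer,response\n"
    "What is a stack?,A LIFO structure,Last in first out\n"
)
```

Here `ideal_answer` and `response` are recognized as aliases, so downstream code only ever sees the canonical column names.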


Class Evaluation

Evaluate Student Summaries vs Meet Transcript

Upload your Google Meet Transcript (.docx) and Student Summaries (.xlsx). The system will automatically refine the transcript into a reference answer and grade each student.

πŸ“„

1. Upload Teacher Script

Provide the Google Meet transcript (.docx)


πŸ“Š

2. Upload Student Summaries

Must contain columns: emailAddress, name, summary (.xlsx)


Usage Guide

Two Ways to Use ExplainGrade

Use the browser instantly, or run locally for heavier workloads and better semantic accuracy.

🌐

Option 1 β€” Use Directly in Browser

No install required. All computation runs in your browser.

1
Single Answer Grading
Go to Live Demo β†’ paste a reference answer and student answer β†’ click Compute Score.
2
Batch CSV Grading
Go to Batch Grade β†’ upload a .csv with columns question, desired_answer, student_answer β†’ results appear instantly.
3
Class Evaluation (Script + Summaries)
Go to Script Eval β†’ upload a Google Meet transcript .docx and student summaries .xlsx β†’ click Run Evaluation.
βœ… Pros: Zero setup, works on any device with a browser, private (no data sent to server).
⚠ Limitation: uses TF-cosine similarity as an in-browser approximation of the semantic model. For best accuracy, use the local runner.
πŸ–₯️

Option 2 β€” Run Locally (Better Accuracy)

Uses real sentence-transformers (all-MiniLM-L6-v2) for semantic similarity.

1
Prerequisites
Python 3.9+ installed. Download Python β†’
2
Clone the repository
git clone https://github.com/ManikeshK1/Explainable_Summary_Score.git
cd Explainable_Summary_Score
3
Create a virtual environment & install dependencies
# Windows
python -m venv venv
venv\Scripts\activate

# macOS / Linux
python3 -m venv venv
source venv/bin/activate

# Install all dependencies
pip install -r requirements.txt
4
Run the local grader
python local_grader.py \
  --docx "path/to/transcript.docx" \
  --xlsx "path/to/summaries.xlsx" \
  --max-score 5
Results are saved to grading_results.csv in the same folder.
5
XLSX format expected
Your summaries spreadsheet must have these columns (names are flexible):
emailAddress  Β·  name  Β·  summary
βœ… Pros: Real sentence-transformers semantic model, processes hundreds of students fast, exports CSV automatically.
⚠ Note: Requires internet on first run to download the ~90 MB model (cached after that).
Research

The Research Behind ExplainGrade

πŸŽ“ Project Overview

ExplainGrade is an automated short-answer grading (ASAG) system built on two published research works and the Mohler Short Answer Grading Dataset β€” a benchmark of 2,273 student responses to Computer Science questions, graded 0–5 by human annotators.

The system addresses two persistent problems in ASAG: length-bias noise (longer answers receive artificially inflated scores) and black-box scoring (students receive no actionable feedback). Because every grade is anchored to specific, measurable NLP comparisons, every point is traceable.

The explanation layer is directly inspired by ExASAG (Filighera et al., BEA 2023), which introduces sentence-level attributions (SLA) β€” rating individual sentences in the student's response for their contribution to the final grade, giving students precise, actionable feedback.

πŸ“„ Key References

1
Ahmad Ayaan & Kok-Why Ng (2024)
Automated grading using natural language processing and semantic analysis. PMC12171532. β€” Source of the NLP metric weights and scoring equations used in Stage 2.
2
Filighera et al. (2023)
Our System for Short Answer Grading using Generative Models. BEA Workshop, ACL 2023. β€” Source of the sentence-level attribution (SLA) explanation framework.
3
Mohler et al. (2011)
Learning to grade short answer questions using semantic similarity measures and dependency graph alignments. ACL. β€” Dataset used for training and evaluation.

πŸ›  Tech Stack

Python 3.12 Β· scikit-learn RF Β· KeyBERT Β· sentence-transformers (all-MiniLM-L6-v2) Β· Levenshtein Β· SHAP Β· pandas / numpy Β· TF Cosine Sim Β· Jaccard Similarity Β· HTML5 / CSS3 Β· Vanilla JS Β· PapaParse Β· Canvas API Β· GitHub Pages

πŸ“Š NLP Metric Reference

Symbol Metric Weight What It Measures
Stf Semantic Similarity 50% TF cosine of full answer vectors β€” proxy for Universal Sentence Encoder contextual meaning
Sj Jaccard Similarity 15% Token-set intersection/union β€” rewards correct domain vocabulary and key terms
Sc Cosine Similarity 15% Word-frequency vector angle β€” captures synonym use and varied phrasing
Sw Norm. Word Count 15% ref keywords Γ· student keywords β€” prevents length inflation bias
Se Edit Similarity 5% Normalised Levenshtein β€” character-level phrasing similarity for typo tolerance
Chat Assistant

Ask Questions About Student Scores

Get instant answers about student performance, statistics, and system explanations using natural language.

πŸ’‘ Try asking:

Who has the highest score?
How many students scored below 0.3?
What's the average score?
Score for John Doe?
How does the system work?
Show all students above 0.4