πŸŽ“ NLP Research Β· Explainable AI Β· Open Source

Short Answer Grading
That Shows Its Work

An automated grading system inspired by ExASAG (BEA 2023) and PMC12171532. Grades student answers using a two-stage pipeline β€” rule-based floor + NLP similarity scoring β€” then explains exactly which sentences and concepts drove the grade.

2 Scoring Stages
5 NLP Metrics
2,200+ Mohler Dataset Rows
100% Runs In Browser
Architecture

Two-Stage Grading Pipeline

Transparent by design β€” every point is traceable to a specific text comparison.

βš–οΈ1

Stage 1: Rule-Based Floor

Calculates a guaranteed minimum score from literal technical-term coverage, so students can never score below the credit earned by the terms they explicitly mention.
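A minimal sketch of such a floor, assuming marks are awarded in proportion to term coverage (the function name, tokenizer, and term list here are illustrative, not the project's exact implementation):

```python
import re

def stage1_floor(student: str, technical_terms: set[str], max_marks: float) -> float:
    """Hypothetical rule-based floor: award marks in proportion to the
    fraction of required technical terms the student literally mentions."""
    tokens = set(re.findall(r"[a-z0-9_+-]+", student.lower()))
    if not technical_terms:
        return 0.0
    covered = technical_terms & tokens
    return max_marks * len(covered) / len(technical_terms)

# A student who names 2 of the 4 required terms is guaranteed half marks.
score = stage1_floor(
    "You push items onto a stack.",
    {"stack", "lifo", "push", "pop"},
    max_marks=5,
)
```

Because this floor only ever adds to the Stage 2 result, a student who names the right terms keeps that credit even if the semantic comparison fails.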

Technical Term Match (Floor)
β†’
πŸ“2

Stage 2: Semantic Grading

Applies the PMC12171532 equations, combining five NLP metrics with semantic similarity to award points for meaning and conceptual understanding.

NLP Metrics + Semantic Match
β†’
πŸ†

Final Score

The system combines the floor and semantic scores, capping the result at the maximum marks while generating explainable feedback.

Final Capped Result
Metrics

NLP Grading Metrics

Exact weights from PMC12171532 β€” every number is from the published paper.

Stf
50%
Semantic Similarity
TF cosine similarity of full answer vectors β€” proxy for Universal Sentence Encoder (USE) from the paper
Sj
15%
Jaccard Similarity
Token-set intersection over union β€” rewards correct vocabulary and key terminology
Sc
15%
Cosine Similarity
Word-frequency vector cosine angle β€” captures phrasing and synonym usage
Sw
15%
Normalized Word Count
ref keywords Γ· student keywords β€” counteracts length inflation from padded or off-topic answers
Se
5%
Edit Similarity
Normalized Levenshtein β€” catches typos and phrasing variations at character level
πŸ“‹ Exact Scoring Equations (PMC12171532)
(a) Base NLP score: Cnlp = min(max(0, 0.15Β·Sj + 0.05Β·Se + 0.15Β·Sc + 0.15Β·Sw), 1)
(b) Confidence score: C = min(max(0, 0.5Β·Stf + 0.5Β·Cnlp), 1)
(c) Final score rule: F = 0 if Stf < 0.2  |  1 if Stf β‰₯ 0.9 and Sw β‰₯ 0.85  |  C otherwise
(d) Stage 2 score: Stage2 = F Γ— MaxMarks    Final = min(MaxMarks, Stage1 + Stage2)
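The four equations map directly onto a short function. This is a sketch with names of our choosing; the five similarity inputs are assumed to be precomputed elsewhere and already normalized to [0, 1]:

```python
def final_score(s_tf, s_j, s_c, s_w, s_e, stage1, max_marks):
    """Sketch of scoring equations (a)-(d) from PMC12171532."""
    # (a) Base NLP score from the four lower-weighted metrics
    c_nlp = min(max(0.0, 0.15 * s_j + 0.05 * s_e + 0.15 * s_c + 0.15 * s_w), 1.0)
    # (b) Confidence score: semantic similarity and base NLP score, equally weighted
    c = min(max(0.0, 0.5 * s_tf + 0.5 * c_nlp), 1.0)
    # (c) Final score rule: hard zero, hard full marks, or the confidence score
    if s_tf < 0.2:
        f = 0.0
    elif s_tf >= 0.9 and s_w >= 0.85:
        f = 1.0
    else:
        f = c
    # (d) Scale by max marks, add the Stage 1 floor, cap at max marks
    return min(max_marks, stage1 + f * max_marks)

# A near-perfect answer (Stf β‰₯ 0.9 and Sw β‰₯ 0.85) triggers rule (c) and earns full marks.
full = final_score(0.95, s_j=0.8, s_c=0.9, s_w=0.9, s_e=0.7, stage1=1.0, max_marks=5)
```

Note how rule (c) makes Stf a gate as well as a weight: below 0.2 it zeroes Stage 2 entirely, so only the rule-based floor survives.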
Live Demo

Grade Any Answer β€” Instantly

All computation runs in your browser. Zero data is sent to any server.

✍️ Enter Answers

β€” / 5
Predicted Score
πŸ—ΊοΈ

Concept Coverage Map

Visual Β· Key Concepts
Each bubble = one key concept extracted from the reference answer. Green = covered in your answer, Red = missed. Size = concept importance (TF score).
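One plausible way to build this map, shown as a sketch: rank reference terms by term frequency (bubble size), then flag each as covered or missed depending on whether the student mentions it. The stopword list and function names here are ours, not the project's:

```python
import re
from collections import Counter

STOPWORDS = {"the", "a", "an", "of", "to", "is", "in", "and", "that", "it", "for"}

def concept_coverage(reference: str, student: str, top_k: int = 8):
    """Illustrative coverage map: top-k reference terms by term frequency,
    each marked covered (green) or missed (red) against the student answer."""
    def tokens(text):
        return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]
    tf = Counter(tokens(reference))
    student_terms = set(tokens(student))
    return [
        {"concept": term, "size": count, "covered": term in student_terms}
        for term, count in tf.most_common(top_k)
    ]

coverage = concept_coverage(
    "Inheritance lets a subclass reuse and override superclass methods.",
    "A subclass can override methods.",
)
```

Each returned dict corresponds to one bubble: `size` drives the radius, `covered` drives the color.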
πŸ’¬

Sentence-Level Attribution

ExASAG Β· BEA 2023
Inspired by ExASAG (Filighera et al., BEA 2023) β€” each sentence in your answer is scored independently. β–  Strong sentences pushed your score up. β–  Weak sentences pulled it down.
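A sketch of the idea, using per-sentence Jaccard overlap with the reference as a stand-in for the system's actual per-sentence scorer; sentences above the answer's mean score read as strong, below it as weak:

```python
import re

def sentence_attributions(student: str, reference: str):
    """Score each student sentence independently against the reference and
    label it relative to the mean (Jaccard is a stand-in scorer here)."""
    def toks(text):
        return set(re.findall(r"[a-z]+", text.lower()))
    ref = toks(reference)
    # Naive sentence split on terminal punctuation
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", student.strip()) if s]
    scores = [len(toks(s) & ref) / len(toks(s) | ref) for s in sentences]
    mean = sum(scores) / len(scores) if scores else 0.0
    return [
        (s, sc, "strong" if sc >= mean else "weak")
        for s, sc in zip(sentences, scores)
    ]

attrs = sentence_attributions(
    "A stack is last in first out. I like pizza.",
    "A stack is a last in, first out (LIFO) structure.",
)
```

The off-topic second sentence scores near zero and is labeled weak, which is exactly the actionable signal sentence-level attribution is meant to surface.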
πŸ“

What This Score Means

Part 1 Β· Plain English
πŸ“Š

Metric Attributions

Part 2 Β· Feature Impact
β–² Green bars = the metric boosted your score above average  Β·  β–Ό Red bars = the metric pulled your score below average. Hover for details.
Batch Mode

Grade an Entire Class at Once

Upload a CSV β€” get scores for every row in seconds, entirely in-browser.

Required CSV columns (flexible name matching): question Β· desired_answer (or reference_answer / ideal_answer) Β· student_answer (or response). This matches the Mohler dataset format directly.
πŸ“‚

Drop your CSV here, or click to browse

Supports the Mohler dataset format out-of-the-box.
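The flexible header matching can be sketched with the standard csv module. The alias sets come from the column names listed above; the function name and error handling are our assumptions:

```python
import csv
import io

COLUMN_ALIASES = {
    "question": {"question"},
    "desired_answer": {"desired_answer", "reference_answer", "ideal_answer"},
    "student_answer": {"student_answer", "response"},
}

def read_answers_csv(text: str):
    """Map whatever header names the CSV uses onto the three canonical
    columns, then return one dict per data row."""
    reader = csv.DictReader(io.StringIO(text))
    mapping = {}
    for field in reader.fieldnames or []:
        key = field.strip().lower()
        for canonical, aliases in COLUMN_ALIASES.items():
            if key in aliases:
                mapping[canonical] = field
    missing = set(COLUMN_ALIASES) - set(mapping)
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    return [{c: row[src] for c, src in mapping.items()} for row in reader]

rows = read_answers_csv(
    "question,ideal_answer,response\n"
    "What is a stack?,A LIFO structure,Last in first out\n"
)
```

Here `ideal_answer` and `response` are recognized as aliases, so downstream code only ever sees the canonical column names.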


Class Evaluation

Evaluate Student Summaries vs Meet Transcript

Upload your Google Meet Transcript (.docx) and Student Summaries (.xlsx). The system will automatically refine the transcript into a reference answer and grade each student.

πŸ“„

1. Upload Teacher Script

Provide the Google Meet transcript (.docx)


πŸ“Š

2. Upload Student Summaries

Must contain columns: emailAddress, name, summary (.xlsx)


Usage Guide

Two Ways to Use ExplainGrade

Use the browser instantly, or run locally for heavier workloads and better semantic accuracy.

🌐

Option 1 β€” Use Directly in Browser

No install required. All computation runs in your browser.

1
Single Answer Grading
Go to Live Demo β†’ paste a reference answer and student answer β†’ click Compute Score.
2
Batch CSV Grading
Go to Batch Grade β†’ upload a .csv with columns question, desired_answer, student_answer β†’ results appear instantly.
3
Class Evaluation (Script + Summaries)
Go to Script Eval β†’ upload a Google Meet transcript .docx and student summaries .xlsx β†’ click Run Evaluation.
βœ… Pros: Zero setup, works on any device with a browser, private (no data sent to server).
⚠ Limitation: uses TF-cosine similarity as an in-browser approximation of the semantic model. For best accuracy, use the local runner.
πŸ–₯️

Option 2 β€” Run Locally (Better Accuracy)

Uses real sentence-transformers (all-MiniLM-L6-v2) for semantic similarity.

1
Prerequisites
Python 3.9+ installed. Download Python β†’
2
Clone the repository
git clone https://github.com/ManikeshK1/Explainable_Summary_Score.git
cd Explainable_Summary_Score
3
Create a virtual environment & install dependencies
# Windows
python -m venv venv
venv\Scripts\activate

# macOS / Linux
python3 -m venv venv
source venv/bin/activate

# Install all dependencies
pip install -r requirements.txt
4
Run the local grader
python local_grader.py \
  --docx "path/to/transcript.docx" \
  --xlsx "path/to/summaries.xlsx" \
  --max-score 5
Results are saved to grading_results.csv in the same folder.
5
XLSX format expected
Your summaries spreadsheet must have these columns (names are flexible):
emailAddress  Β·  name  Β·  summary
βœ… Pros: Real sentence-transformers semantic model, processes hundreds of students fast, exports CSV automatically.
⚠ Note: Requires internet on first run to download the ~90 MB model (cached after that).
Research

The Research Behind ExplainGrade

πŸŽ“ Project Overview

ExplainGrade is an automated short-answer grading (ASAG) system built on two published research works and the Mohler Short Answer Grading Dataset β€” a benchmark of 2,273 student responses to Computer Science questions, graded 0–5 by human annotators.

The system addresses two persistent problems in ASAG: length-bias noise (longer answers receive artificially inflated scores) and black-box scoring (students receive no actionable feedback). Because every grade is anchored to specific, measurable NLP comparisons, every point is traceable.

The explanation layer is directly inspired by ExASAG (Filighera et al., BEA 2023), which introduces sentence-level attributions (SLA) β€” rating individual sentences in the student's response for their contribution to the final grade, giving students precise, actionable feedback.

πŸ“„ Key References

1
Ahmad Ayaan & Kok-Why Ng (2024)
Automated grading using natural language processing and semantic analysis. PMC12171532. β€” Source of the NLP metric weights and scoring equations used in Stage 2.
2
Filighera et al. (2023)
Our System for Short Answer Grading using Generative Models. BEA Workshop, ACL 2023. β€” Source of the sentence-level attribution (SLA) explanation framework.
3
Mohler et al. (2011)
Learning to grade short answer questions using semantic similarity measures and dependency graph alignments. ACL. β€” Dataset used for training and evaluation.

πŸ›  Tech Stack

Python 3.12 Β· scikit-learn RF Β· KeyBERT Β· sentence-transformers (all-MiniLM-L6-v2) Β· Levenshtein Β· SHAP Β· pandas / numpy Β· TF Cosine Sim Β· Jaccard Similarity Β· HTML5 / CSS3 Β· Vanilla JS Β· PapaParse Β· Canvas API Β· GitHub Pages

πŸ“Š NLP Metric Reference

Symbol Metric Weight What It Measures
Stf Semantic Similarity 50% TF cosine of full answer vectors β€” proxy for Universal Sentence Encoder contextual meaning
Sj Jaccard Similarity 15% Token-set intersection/union β€” rewards correct domain vocabulary and key terms
Sc Cosine Similarity 15% Word-frequency vector angle β€” captures synonym use and varied phrasing
Sw Norm. Word Count 15% ref keywords Γ· student keywords β€” prevents length inflation bias
Se Edit Similarity 5% Normalised Levenshtein β€” character-level phrasing similarity for typo tolerance
Chat Assistant

Ask Questions About Student Scores

Get instant answers about student performance, statistics, and system explanations using natural language.

πŸ’‘ Try asking:

Who has the highest score?
How many students scored below 0.3?
What's the average score?
Score for John Doe?
How does the system work?
Show all students above 0.4