Model Metrics · v5 Production
PERFORMANCE
Calibrated Logistic Regression · 46,360 messages · 8,026 features · threshold = 0.47
Accuracy
Out of all messages, what fraction did we classify correctly?
Precision
Of messages we called SCAM, what fraction really were scams?
Recall
Of all real scams in the dataset, what fraction did we catch?
F1 Score
Harmonic mean of Precision and Recall: the key overall quality score.
97.39% Accuracy
+16.39pp from v1 baseline
97.30% F1 Score
+13.3pp from v1 baseline
99.58% AUC-ROC
+10.88pp from v1 baseline
97.47% Precision
+2.47pp from target
97.12% Recall
+2.12pp from target
17 Scam Types
+4 new types since v4
Signal Analysis · Live Variance
MODEL SIGNALS
Confidence separation, precision–recall tradeoff, and training convergence over time
Confidence Separation
Score deviation from decision threshold (t = 0.47)
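A minimal sketch of how the deviation from the decision threshold could be computed, assuming each message already has a calibrated scam probability between 0 and 1. The function name `separation` and the rule that a score exactly at the threshold counts as SCAM are assumptions for illustration, not details taken from the production pipeline.

```python
THRESHOLD = 0.47  # decision threshold stated on this dashboard


def separation(scores, threshold=THRESHOLD):
    """Return (deviation, label) pairs for a list of calibrated scam scores.

    Assumption: a score at or above the threshold is labelled SCAM.
    """
    results = []
    for score in scores:
        deviation = score - threshold  # positive = scam side, negative = legit side
        label = "SCAM" if score >= threshold else "LEGIT"
        results.append((round(deviation, 4), label))
    return results
```

Large absolute deviations indicate confident predictions; values near zero are the borderline messages most sensitive to the choice of threshold.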
Precision–Recall Balance
P − R gap across threshold sweep (zero = balanced)
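The gap plotted here can be sketched as follows, assuming access to per-message scores and ground-truth labels (1 = scam, 0 = legit). The function name and the convention of reporting precision as 1.0 when nothing is flagged are illustrative assumptions.

```python
def precision_recall_gap(scores, labels, thresholds):
    """Return (threshold, precision, recall, gap) for each threshold.

    gap = precision - recall, so zero means the two are balanced.
    """
    rows = []
    for t in thresholds:
        pairs = list(zip(scores, labels))
        tp = sum(1 for s, y in pairs if s >= t and y == 1)
        fp = sum(1 for s, y in pairs if s >= t and y == 0)
        fn = sum(1 for s, y in pairs if s < t and y == 1)
        # Convention (assumption): precision is 1.0 when no message is flagged.
        precision = tp / (tp + fp) if (tp + fp) else 1.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        rows.append((t, precision, recall, precision - recall))
    return rows
```

A negative gap means the threshold favours recall, a positive gap means it favours precision.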
Training Convergence
Accuracy gain Δ% per gradient step
Prediction Quality · Threshold Analysis
DIAGNOSTIC CURVES
How model behaviour shifts across operating points and confidence levels
Precision vs Recall
As threshold rises — precision climbs, recall drops
Confidence Distribution
% of messages per score bucket — scam vs legit
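One way such a distribution could be produced is sketched below. The bucket count of 10 and the function name are assumptions; the dashboard does not state how many buckets it uses. The function would be called once on the scores of scam messages and once on the scores of legit messages.

```python
def bucket_percentages(scores, n_buckets=10):
    """Return the percentage of scores falling into each equal-width bucket on [0, 1]."""
    counts = [0] * n_buckets
    for s in scores:
        # A score of exactly 1.0 is placed in the last bucket.
        idx = min(int(s * n_buckets), n_buckets - 1)
        counts[idx] += 1
    total = len(scores)
    return [100.0 * c / total if total else 0.0 for c in counts]
```

A well-separated model puts most scam scores in the top buckets and most legit scores in the bottom ones, with little mass near the threshold.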
F1 Score by Channel
Detection quality across email, URL, SMS, Reddit
Dashed line = overall accuracy · All channels ≥ 99%
Classifier Quality · Test Set
ROC & CONFUSION
How well the model separates scam from legitimate messages at every threshold
ROC Curve
AUC = 0.9958 · Near-perfect separation
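As an illustration of what the AUC figure measures, the sketch below computes it as the probability that a randomly chosen scam message receives a higher score than a randomly chosen legit one, with ties counted as half. This pairwise form is quadratic in the number of messages and is meant for explanation only; it is not necessarily how the dashboard value was produced.

```python
def roc_auc(scores, labels):
    """Pairwise AUC: labels are 1 for scam and 0 for legit."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one scam and one legit example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking.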
Confusion Matrix
Predictions on held-out test set (9,272 messages)
| | Predicted: Legit | Predicted: Scam |
|---|---|---|
| Actual: Legit | 4,675 (50.4%) | 112 (1.2%) |
| Actual: Scam | 129 (1.4%) | 4,356 (47.0%) |
Test set · 9,272 messages · threshold = 0.47
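The four cell counts reported for the held-out test set can be turned back into the headline metrics. The sketch below assumes the 112 messages are false positives (legit predicted as scam) and the 129 are false negatives (scam predicted as legit); the function name is illustrative. Results may differ from the headline cards in the last decimal place.

```python
def metrics_from_counts(tn, fp, fn, tp):
    """Compute accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tn + fp + fn + tp
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "total": total,
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
    }


# Counts from the confusion matrix above (cell assignment is an assumption).
m = metrics_from_counts(tn=4675, fp=112, fn=129, tp=4356)
for name in ("accuracy", "precision", "recall", "f1"):
    print(f"{name}: {m[name]:.4f}")
```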
Classifier Benchmarking
MODEL COMPARISON
Three classifiers trained on the same feature set — Logistic Regression selected for production
| Model | Accuracy | Precision | Recall | F1 | AUC-ROC |
|---|---|---|---|---|---|
| Logistic Regression (PROD) | 97.39% | 97.47% | 97.12% | 97.30% | 99.58% |
| Random Forest | 97.09% | 97.01% | 96.97% | 96.99% | 99.32% |
| Decision Tree | 95.91% | 95.91% | 95.63% | 95.77% | 95.90% |
Channel Breakdown
PER-CHANNEL ACCURACY
Detection quality across the four communication channels in the dataset
Accuracy · Precision · Recall · F1 — by Channel
Variable Relationships
FEATURES & DATA
Which signals drive decisions and where the training data comes from
Feature Importance — Top 10
Relative weight of numerical features in the production model
Training Dataset Composition
46,360 messages across 8 data sources — scam vs legit split
Iterative Improvement · v1 → v5
MODEL EVOLUTION
How each pipeline upgrade compounded into a 16.4pp accuracy gain over the baseline
Accuracy & AUC-ROC Progression
Each version adds a new feature tier to the previous one
Coverage · 17 Scam Categories
SCAM TYPE DETECTION
Rule-based type classifier with regex patterns across all known scam vectors
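A rule-based type classifier of this kind might look like the sketch below. The patterns, their order, and the fallback to "General Spam" are hypothetical and shown for illustration only; the production regex rules are not part of this report.

```python
import re

# Hypothetical patterns; checked in order, first match wins.
SCAM_TYPE_PATTERNS = [
    ("QR Phishing", re.compile(r"\bscan\b.*\bqr\b|\bqr code\b", re.I)),
    ("Delivery Scam", re.compile(
        r"\b(parcel|package|delivery)\b.*\b(fee|reschedule|held)\b", re.I)),
    ("Prize Fraud", re.compile(r"\b(won|winner|prize|lottery)\b", re.I)),
    ("Bank Impersonation", re.compile(
        r"\bbank\b.*\b(suspended|verify|locked)\b", re.I)),
]
FALLBACK_TYPE = "General Spam"  # assumption: used when no pattern matches


def classify_scam_type(message):
    """Return the first scam type whose pattern matches the message text."""
    for name, pattern in SCAM_TYPE_PATTERNS:
        if pattern.search(message):
            return name
    return FALLBACK_TYPE
```

Because the first match wins, more specific patterns should be listed before broader ones.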
Detection Confidence by Scam Type
Estimated detection rate (%) per category
Coverage Table
All 17 scam types with channel and detection rates
| Scam Type | Channel | Detection |
|---|---|---|
| Phishing | Email / URL | 98% |
| Credential Phishing | | 97% |
| Prize Fraud | SMS / Email | 99% |
| Bank Impersonation | SMS / Email | 97% |
| Job Scam | Email / SMS | 96% |
| Investment Scam | SMS / Email | 98% |
| Romance Scam | SMS | 95% |
| Advance Fee | | 98% |
| Delivery Scam | SMS | 99% |
| Social Media | SMS / Email | 97% |
| Emergency Scam | SMS | 98% |
| Threat Scam | | 97% |
| Pig Butchering (NEW) | SMS | 95% |
| QR Phishing (NEW) | SMS | 98% |
| Refund Scam (NEW) | Email / SMS | 98% |
| SIM Swap (NEW) | SMS | 98% |
| General Spam | All | 89% |