
Model Metrics · v5 Production

PERFORMANCE

Calibrated Logistic Regression · 46,360 messages · 8,026 features · threshold = 0.47

Accuracy

Out of all messages, what fraction did we classify correctly?

Precision

Of messages we called SCAM, what fraction really were scams?

Recall

Of all real scams in the dataset, what fraction did we catch?

F1 Score

Harmonic mean of Precision and Recall; the key overall quality score.

Accuracy: 97.39% (+16.39 pp from v1 baseline)

F1 Score: 97.30% (+13.3 pp from v1 baseline)

AUC-ROC: 99.58% (+10.88 pp from v1 baseline)

Precision: 97.47% (+2.47 pp from target)

Recall: 97.12% (+2.12 pp from target)

Scam Types: 17 (+4% from v4)

Signal Analysis · Live Variance

MODEL SIGNALS

Confidence separation, precision–recall tradeoff, and training convergence over time

Confidence Separation

Score deviation from decision threshold (t = 0.47)

Scam side: 97.1% · Overlap: 2.4%
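
The deviation plotted in this panel can be read as the calibrated score minus the decision threshold. The page does not spell out the exact definition, so the following is a minimal sketch under that assumption; the function names and sample scores are hypothetical.

```python
THRESHOLD = 0.47  # decision threshold stated on this page


def deviation(score: float, threshold: float = THRESHOLD) -> float:
    """Signed distance of a calibrated score from the threshold."""
    return score - threshold


def label(score: float, threshold: float = THRESHOLD) -> str:
    """Assumption: scores at or above the threshold are called SCAM."""
    return "SCAM" if deviation(score, threshold) >= 0 else "LEGIT"


# Hypothetical scores, for illustration only.
for s in (0.05, 0.46, 0.47, 0.93):
    print(f"score={s:.2f}  deviation={deviation(s):+.2f}  label={label(s)}")
```

Positive deviations land on the scam side of the zero baseline and negative ones on the legit side; scores close to zero deviation are the ones where the two classes are most likely to overlap.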

Precision–Recall Balance

P − R gap across threshold sweep (zero = balanced)

t = 0.1: high recall · t = 0.9: high precision
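
A threshold sweep of this kind can be sketched as follows: for each threshold from 0.1 to 0.9, compute precision and recall and report their difference. The data below is a toy example, not the dashboard's test set, and `precision_recall` is a hypothetical helper.

```python
def precision_recall(scores, labels, threshold):
    """Precision and recall when score >= threshold is called scam (label 1)."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    precision = tp / (tp + fp) if tp + fp else 1.0  # convention when nothing is flagged
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy data, not taken from the dashboard (1 = scam, 0 = legit).
scores = [0.05, 0.15, 0.32, 0.42, 0.45, 0.55, 0.62, 0.82, 0.91, 0.95]
labels = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]

sweep = []
for step in range(1, 10):
    t = step / 10
    p, r = precision_recall(scores, labels, t)
    sweep.append((t, p, r, p - r))
    print(f"t={t:.1f}  precision={p:.2f}  recall={r:.2f}  gap={p - r:+.2f}")
```

On this toy data the gap is negative at low thresholds (recall dominates) and positive at high ones (precision dominates). Recall can only fall as the threshold rises, while precision tends to rise but is not guaranteed to do so at every step.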

Training Convergence

Accuracy gain Δ% per gradient step

v1 start: 81.0% · converged: 97.4%

Prediction Quality · Threshold Analysis

DIAGNOSTIC CURVES

How model behaviour shifts across operating points and confidence levels

Precision vs Recall

As threshold rises — precision climbs, recall drops

Legend: Precision · Recall · ▲ optimal @ 0.47

Confidence Distribution

% of messages per score bucket — scam vs legit

Legend: Scam % · Legit %
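
A per-bucket distribution like this can be computed by binning scores into equal-width buckets and converting counts to percentages within each class. The sketch below assumes ten buckets over [0, 1]; the bucket count and the sample scores are assumptions, not values from the dashboard.

```python
from collections import Counter


def bucket_percentages(scores, n_buckets=10):
    """Percent of scores falling in each equal-width bucket over [0, 1]."""
    # A score of exactly 1.0 is folded into the last bucket.
    counts = Counter(min(int(s * n_buckets), n_buckets - 1) for s in scores)
    total = len(scores)
    return [100 * counts.get(b, 0) / total for b in range(n_buckets)]


# Toy scores, not taken from the dashboard.
scam_scores = [0.91, 0.97, 0.88, 0.99, 0.52, 0.95, 1.00, 0.93]
legit_scores = [0.02, 0.08, 0.11, 0.04, 0.41, 0.01, 0.06, 0.03]

scam_pct = bucket_percentages(scam_scores)
legit_pct = bucket_percentages(legit_scores)
for b, (s, l) in enumerate(zip(scam_pct, legit_pct)):
    print(f"bucket {b / 10:.1f}-{(b + 1) / 10:.1f}  scam={s:5.1f}%  legit={l:5.1f}%")
```

A well-separated model concentrates scam scores in the top buckets and legit scores in the bottom ones, with little mass in between.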

F1 Score by Channel

Detection quality across email, URL, SMS, Reddit

Dashed line = overall accuracy · All channels ≥ 99%

Classifier Quality · Test Set

ROC & CONFUSION

How well the model separates scam from legitimate messages at every threshold

ROC Curve

AUC = 0.9958 · Near-perfect separation

Legend: Logistic Regression · calibrated · AUC = 0.9958
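
AUC-ROC has a direct interpretation: it is the probability that a randomly chosen scam receives a higher score than a randomly chosen legitimate message. A minimal sketch of that pairwise definition, using toy data rather than the dashboard's test set, might look like this (`auc_roc` is a hypothetical helper, and the quadratic loop is only suitable for small inputs):

```python
def auc_roc(scores, labels):
    """AUC as the share of (scam, legit) pairs ranked correctly; ties count half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one scam and one legit example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data, not taken from the dashboard (1 = scam, 0 = legit).
scores = [0.10, 0.35, 0.40, 0.65, 0.80, 0.90]
labels = [0, 0, 1, 0, 1, 1]
print(f"AUC = {auc_roc(scores, labels):.4f}")
```

Under this reading, the reported AUC of 0.9958 means that a scam outranks a legitimate message in roughly 99.6% of such pairs.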

Confusion Matrix

Predictions on held-out test set (9,272 messages)

 | Legit | Scam
Legit | 4,675 (50.4%) | 112 (1.2%)
Scam | 129 (1.4%) | 4,356 (47.0%)

Test set · 9,272 messages · threshold = 0.47
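
The headline metrics can be rechecked from these four counts. The sketch below assumes rows are actual classes and columns are predicted classes; the page does not label the axes, but this orientation is the one under which the computed recall matches the reported 97.12%.

```python
# Counts copied from the confusion matrix on this page.
# Assumption: rows are actual classes, columns are predicted classes.
tn, fp = 4675, 112   # actual legit: predicted legit, predicted scam
fn, tp = 129, 4356   # actual scam:  predicted legit, predicted scam

total = tn + fp + fn + tp
accuracy = (tp + tn) / total
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print(f"total     = {total}")
print(f"accuracy  = {accuracy:.4f}")
print(f"precision = {precision:.4f}")
print(f"recall    = {recall:.4f}")
print(f"f1        = {f1:.4f}")
```

The counts sum to the stated 9,272 messages, and the derived values agree with the headline figures to within about 0.03 percentage points. The page does not explain the small residual differences; rounding or a slightly different evaluation run are possible causes.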

Classifier Benchmarking

MODEL COMPARISON

Three classifiers trained on the same feature set — Logistic Regression selected for production

Model | Accuracy | Precision | Recall | F1 | AUC-ROC
Logistic Regression (PROD) | 97.39% | 97.47% | 97.12% | 97.30% | 99.58%
Random Forest | 97.09% | 97.01% | 96.97% | 96.99% | 99.32%
Decision Tree | 95.91% | 95.91% | 95.63% | 95.77% | 95.90%

Channel Breakdown

PER-CHANNEL ACCURACY

Detection quality across the four communication channels in the dataset

Accuracy · Precision · Recall · F1 — by Channel

Variable Relationships

FEATURES & DATA

Which signals drive decisions and where the training data comes from

Feature Importance — Top 10

Relative weight of numerical features in the production model

Training Dataset Composition

46,360 messages across 8 data sources — scam vs legit split

Iterative Improvement · v1 → v5

MODEL EVOLUTION

How each pipeline upgrade compounded into a 16.4pp accuracy gain over the baseline

Accuracy & AUC-ROC Progression

Each version adds a new feature tier to the previous one

Coverage · 17 Scam Categories

SCAM TYPE DETECTION

Rule-based type classifier with regex patterns across all known scam vectors
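
A rule-based type classifier of this kind can be sketched as an ordered set of regular expressions, where the first matching pattern decides the type. The patterns below are hypothetical and purely illustrative (the production rules are not shown on this page), and falling back to "General Spam" when nothing matches is also an assumption.

```python
import re

# Hypothetical patterns for illustration; not the production rule set.
SCAM_TYPE_PATTERNS = {
    "Delivery Scam": re.compile(
        r"\b(parcel|package|delivery)\b.*\b(fee|reschedule|held)\b", re.I
    ),
    "Prize Fraud": re.compile(r"\b(won|winner|prize|lottery)\b", re.I),
    "Bank Impersonation": re.compile(
        r"\b(bank|account)\b.*\b(suspended|verify|locked)\b", re.I
    ),
}


def classify_scam_type(message: str, default: str = "General Spam") -> str:
    """Return the first scam type whose pattern matches, else the default."""
    for scam_type, pattern in SCAM_TYPE_PATTERNS.items():
        if pattern.search(message):
            return scam_type
    return default


print(classify_scam_type("Your parcel is held. Pay a small fee to reschedule."))
print(classify_scam_type("Congratulations, you are our lottery winner!"))
print(classify_scam_type("Cheap watches, best price"))
```

Because the first match wins, rule order matters: more specific patterns would normally be listed before broader ones.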

Detection Confidence by Scam Type

Estimated detection rate (%) per category

Coverage Table

All 17 scam types with channel and detection rates

Scam Type | Channel | Detection
Phishing | Email / URL | 98%
Credential Phishing | Email | 97%
Prize Fraud | SMS / Email | 99%
Bank Impersonation | SMS / Email | 97%
Job Scam | Email / SMS | 96%
Investment Scam | SMS / Email | 98%
Romance Scam | SMS | 95%
Advance Fee | Email | 98%
Delivery Scam | SMS | 99%
Social Media | SMS / Email | 97%
Emergency Scam | SMS | 98%
Threat Scam | Email | 97%
Pig Butchering (NEW) | SMS | 95%
QR Phishing (NEW) | SMS | 98%
Refund Scam (NEW) | Email / SMS | 98%
SIM Swap (NEW) | SMS | 98%
General Spam | All | 89%