
Model Metrics · v5 Production

PERFORMANCE

Calibrated Logistic Regression · 46 360 messages · 8 026 features · threshold = 0.47

97.39% · Accuracy · +16.39pp from v1 baseline

97.30% · F1 Score · +13.3pp from v1 baseline

99.58% · AUC-ROC · +10.88pp from v1 baseline

97.47% · Precision · +2.47pp above target

97.12% · Recall · +2.12pp above target

17 · Scam Types · +4 from v4

Signal Analysis · Live Variance

MODEL SIGNALS

Confidence separation, precision–recall tradeoff, and training convergence over time

Confidence Separation

Score deviation from decision threshold (t = 0.47)

[Chart · 97.1% of scores on the scam side · 2.4% overlap · zero baseline marked]

Precision–Recall Balance

P − R gap across threshold sweep (zero = balanced)

[Chart · t = 0.1 → high recall, t = 0.9 → high precision · zero baseline = balanced]

Training Convergence

Accuracy gain Δ% per gradient step

[Chart · 81.0% at v1 start → 97.4% converged · zero baseline marked]

Prediction Quality · Threshold Analysis

DIAGNOSTIC CURVES

How model behaviour shifts across operating points and confidence levels

Precision vs Recall

As threshold rises — precision climbs, recall drops

[Chart · precision and recall vs threshold · optimal marked at 0.47]

Confidence Distribution

% of messages per score bucket — scam vs legit

[Chart · scam % vs legit % per score bucket]

F1 Score by Channel

Detection quality across email, URL, SMS, Reddit

Dashed line = overall accuracy · All channels ≥ 99%

Classifier Quality · Test Set

ROC & CONFUSION

How well the model separates scam from legitimate messages at every threshold

ROC Curve

AUC = 0.9958 · Near-perfect separation

Logistic Regression (calibrated) · AUC = 0.9958
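The ROC curve and its AUC come straight from scikit-learn's metrics API. A tiny hand-made example (the scores below are invented for illustration; the 0.9958 above is from the real test set):

```python
# Sketch: ROC curve points and AUC from raw scores.
# Labels and scores are a toy example, not production data.
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

y_true = np.array([0, 0, 0, 1, 1, 1, 1, 0])
scores = np.array([0.05, 0.20, 0.52, 0.61, 0.80, 0.91, 0.33, 0.45])

fpr, tpr, thr = roc_curve(y_true, scores)   # one (FPR, TPR) point per threshold
auc = roc_auc_score(y_true, scores)          # 14 of 16 pos/neg pairs ranked correctly -> 0.875
```

AUC is the probability that a randomly chosen scam message outscores a randomly chosen legit one, which is why 0.9958 reads as near-perfect separation.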

Confusion Matrix

Predictions on held-out test set (9 272 messages)

                 Predicted: Legit        Predicted: Scam
Actual: Legit    TN 4,675 (50.4%)        FP 112 (1.2%)
Actual: Scam     FN 129 (1.4%)           TP 4,356 (47.0%)

Test set · 9,272 messages · threshold = 0.47
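The headline metrics follow directly from these four counts; small rounding differences against the dashboard figures are expected:

```python
# Derive accuracy, precision, recall, and F1 from the confusion-matrix
# counts reported above (held-out test set, threshold = 0.47).
tn, fp, fn, tp = 4675, 112, 129, 4356

accuracy  = (tp + tn) / (tp + tn + fp + fn)        # 9031 / 9272 ~ 0.974
precision = tp / (tp + fp)                          # 4356 / 4468 ~ 0.975
recall    = tp / (tp + fn)                          # 4356 / 4485 ~ 0.971
f1        = 2 * precision * recall / (precision + recall)  # ~ 0.973
```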

Classifier Benchmarking

MODEL COMPARISON

Three classifiers trained on the same feature set — Logistic Regression selected for production

Model                        Accuracy   Precision   Recall    F1        AUC-ROC
Logistic Regression (PROD)   97.39%     97.47%      97.12%    97.30%    99.58%
Random Forest                97.09%     97.01%      96.97%    96.99%    99.32%
Decision Tree                95.91%     95.91%      95.63%    95.77%    95.90%
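A benchmarking setup of this shape is straightforward to sketch with scikit-learn: three classifiers fitted on one shared feature matrix and scored on the same held-out split. Synthetic features stand in for the real 8 026-dimensional set, so the numbers produced here are not the table's:

```python
# Sketch: compare three classifiers on one shared feature set.
# Synthetic data; hyperparameters are library defaults, not production.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=3000, n_features=50, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(random_state=0),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
}
results = {name: accuracy_score(y_te, m.fit(X_tr, y_tr).predict(X_te))
           for name, m in models.items()}
```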

Channel Breakdown

PER-CHANNEL ACCURACY

Detection quality across the four communication channels in the dataset

Accuracy · Precision · Recall · F1 — by Channel

Variable Relationships

FEATURES & DATA

Which signals drive decisions and where the training data comes from

Feature Importance — Top 10

Relative weight of numerical features in the production model

Training Dataset Composition

46 360 messages across 8 data sources — scam vs legit split

Iterative Improvement · v1 → v5

MODEL EVOLUTION

How each pipeline upgrade compounded into a 16.4pp accuracy gain over the baseline

Accuracy & AUC-ROC Progression

Each version adds a new feature tier to the previous one

Coverage · 17 Scam Categories

SCAM TYPE DETECTION

Rule-based type classifier with regex patterns across all known scam vectors

Detection Confidence by Scam Type

Estimated detection rate (%) per category

Coverage Table

All 17 scam types with channel and detection rates

Scam Type                 Channel        Detection
Phishing                  Email / URL    98%
Credential Phishing       Email          97%
Prize Fraud               SMS / Email    99%
Bank Impersonation        SMS / Email    97%
Job Scam                  Email / SMS    96%
Investment Scam           SMS / Email    98%
Romance Scam              SMS            95%
Advance Fee               Email          98%
Delivery Scam             SMS            99%
Social Media              SMS / Email    97%
Emergency Scam            SMS            98%
Threat Scam               Email          97%
Pig Butchering (NEW)      SMS            95%
QR Phishing (NEW)         SMS            98%
Refund Scam (NEW)         Email / SMS    98%
SIM Swap (NEW)            SMS            98%
General Spam              All            89%
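A rule-based type classifier of this kind reduces to a first-match regex scan over the message text. A minimal sketch in that spirit; the three patterns below are illustrative stand-ins, not the production rule set:

```python
# Sketch: regex-based scam-type tagging with a General Spam fallback.
# Patterns are illustrative examples, not the 17-category production rules.
import re

TYPE_PATTERNS = {
    "Prize Fraud":        r"\b(you('ve| have) won|claim your prize|lottery)\b",
    "Delivery Scam":      r"\b(package|parcel|delivery)\b.*\b(fee|customs|redeliver)\b",
    "Bank Impersonation": r"\b(account (locked|suspended)|verify your bank)\b",
}

def classify_scam_type(text: str) -> str:
    """Return the first scam type whose pattern matches, else General Spam."""
    for scam_type, pattern in TYPE_PATTERNS.items():
        if re.search(pattern, text, flags=re.IGNORECASE):
            return scam_type
    return "General Spam"

print(classify_scam_type("Your parcel is on hold, pay the customs fee now"))
# -> Delivery Scam
```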

System Architecture

HOW IT WORKS

9-stage inference pipeline from raw text to calibrated verdict

01 · Preprocess: Unicode · emoji · HTML · l33t
02 · Tone Score: Urgency · Fear · Reward · Threat
03 · URL Check: TLD · keywords · IP · lookalike
04 · Phrase Match: 217 scam phrases (exact)
05 · TF-IDF: 5 000 word + 3 000 char n-grams
06 · FAISS: k=10 scam vector proximity
07 · Inference: LR · 8 026 features
08 · Calibrate: Isotonic regression
09 · Verdict: SCAM / SUSPICIOUS / LEGIT
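The pipeline can be compressed into a minimal end-to-end sketch with scikit-learn: preprocess, TF-IDF features, logistic regression, isotonic calibration, three-way verdict. The tone, URL, phrase, and FAISS feature tiers are omitted here; the toy texts, the l33t mapping, and the SUSPICIOUS band are illustrative assumptions, with only t = 0.47 taken from the dashboard:

```python
# Sketch of stages 01, 05, 07, 08, 09: preprocess -> TF-IDF ->
# logistic regression -> isotonic calibration -> verdict.
# Toy training data; the SUSPICIOUS band (t - 0.17) is invented.
import html, re
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.calibration import CalibratedClassifierCV

def preprocess(text: str) -> str:
    """Stage 01: HTML unescape, lowercase, basic l33t, whitespace squeeze."""
    text = html.unescape(text).lower()
    text = text.translate(str.maketrans("013457", "oleast"))  # crude l33t map
    return re.sub(r"\s+", " ", text).strip()

texts = ["You WON a free prize, claim now!!!", "Meeting moved to 3pm",
         "Verify your account or it will be locked", "Lunch tomorrow?"] * 10
labels = np.array([1, 0, 1, 0] * 10)

vec = TfidfVectorizer(ngram_range=(1, 2))          # stage 05 (word n-grams only)
X = vec.fit_transform(preprocess(t) for t in texts)
clf = CalibratedClassifierCV(LogisticRegression(max_iter=1000),
                             method="isotonic", cv=3).fit(X, labels)  # stages 07-08

def verdict(text: str, t: float = 0.47) -> str:
    """Stage 09: map the calibrated score to a three-way verdict."""
    p = clf.predict_proba(vec.transform([preprocess(text)]))[0, 1]
    if p >= t:
        return "SCAM"
    if p >= t - 0.17:      # illustrative SUSPICIOUS band
        return "SUSPICIOUS"
    return "LEGIT"
```

Isotonic calibration (stage 08) matters here because raw logistic-regression scores need not behave like probabilities; calibration makes the t = 0.47 cut interpretable as an actual confidence level.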