tool · dataset · 6 specimens

Comment spam examples dataset.

Updated May 12, 2026

A small, human-readable set of patterns you can use to sanity-check moderation queues, reviewer training, and threshold decisions.

01

Link drop

Great post! I wrote more about this at cheap-example-links.example
High risk

Generic praise plus unrelated outbound link.

02

Credential bait

Your account has a security issue. Verify at support-login.example now.
High risk

Urgency, account threat, external login URL.

03

Essay spinner

This article has made many useful points for people in the modern age of business.
Medium risk

Fluent but empty text that does not reference the page.

04

SEO anchor stuffing

best essay writing service | buy backlinks | casino bonus
High risk

Commercial anchors with no sentence structure.

05

Borderline self-promo

We solved this in our open-source plugin; happy to share the repo if useful.
Medium risk

Relevant but promotional. Queue instead of hard block.

06

Clean dissent

I disagree with the recommendation to fail open on outages; in finance we queue all submissions.
Low risk

Specific, topical disagreement with no spam markers.

Common questions

What are the most common types of comment spam?

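Link drops, credential bait (phishing), spun essay text, SEO anchor stuffing, and borderline self-promotion. The three high-risk patterns (link drops, credential bait, and SEO anchor stuffing) are nearly always block-worthy. The medium-risk ones (spun essay text and borderline self-promotion) are safer in a review queue than behind an automatic block, and clean dissent, which is not spam, should never be auto-blocked.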
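The block-versus-queue split described here can be expressed as a score-to-action mapping. This is a minimal Python sketch, assuming your filter emits a spam score between 0 and 1. The `route` function, the `Action` enum, and the 0.9 / 0.5 cut-offs are hypothetical and should be tuned on your own traffic.

```python
from enum import Enum


class Action(Enum):
    BLOCK = "block"
    QUEUE = "queue"
    ALLOW = "allow"


# Hypothetical cut-offs on a 0.0-1.0 spam score; tune these on your own traffic.
BLOCK_AT = 0.9
QUEUE_AT = 0.5


def route(spam_score: float, block_at: float = BLOCK_AT, queue_at: float = QUEUE_AT) -> Action:
    """Map a spam score to a moderation action.

    High-confidence spam is blocked, the uncertain middle band goes to a
    human review queue, and everything below the queue cut-off is published.
    """
    if not 0.0 <= spam_score <= 1.0:
        raise ValueError(f"spam_score must be in [0, 1], got {spam_score}")
    if queue_at > block_at:
        raise ValueError("queue_at must not exceed block_at")
    if spam_score >= block_at:
        return Action.BLOCK
    if spam_score >= queue_at:
        return Action.QUEUE
    return Action.ALLOW


print(route(0.97))  # Action.BLOCK
print(route(0.62))  # Action.QUEUE
print(route(0.10))  # Action.ALLOW
```

Keeping a middle band, rather than a single block threshold, is what lets borderline cases reach a human instead of being silently dropped.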

How do I test my spam filter with these examples?

Paste each example into your filter (Siftfy's live tester, Akismet's comment-check API with `is_test=1` set, or your own queue) and confirm that the high-risk patterns hit your block threshold while the low-risk one (clean dissent) does not. A filter that blocks the clean example will block real readers too.
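A minimal Python sketch of that smoke test, assuming your filter can be wrapped as a function that takes comment text and returns one of `"block"`, `"queue"`, or `"allow"`. The `smoke_test` function, the action strings, and the `naive_filter` stand-in are hypothetical; only the six specimens and their risk levels come from this page. It enforces just the two rules stated above and leaves medium-risk outcomes to your own policy.

```python
import re
from typing import Callable

# (name, text, risk) for the six specimens on this page.
SPECIMENS = [
    ("link drop", "Great post! I wrote more about this at cheap-example-links.example", "high"),
    ("credential bait", "Your account has a security issue. Verify at support-login.example now.", "high"),
    ("essay spinner", "This article has made many useful points for people in the modern age of business.", "medium"),
    ("seo anchor stuffing", "best essay writing service | buy backlinks | casino bonus", "high"),
    ("borderline self-promo", "We solved this in our open-source plugin; happy to share the repo if useful.", "medium"),
    ("clean dissent", "I disagree with the recommendation to fail open on outages; in finance we queue all submissions.", "low"),
]


def smoke_test(classify: Callable[[str], str]) -> list[str]:
    """Return human-readable failures; an empty list means the smoke test passed.

    High-risk specimens must be blocked and the low-risk one must not be.
    Medium-risk outcomes are deliberately not checked.
    """
    failures = []
    for name, text, risk in SPECIMENS:
        action = classify(text)
        if risk == "high" and action != "block":
            failures.append(f"{name}: high risk but got {action!r}")
        elif risk == "low" and action == "block":
            failures.append(f"{name}: low risk but was blocked")
    return failures


# Hypothetical stand-in filter: blocks any comment containing a token that
# ends in ".example" (the placeholder domains used in the specimens).
DOMAIN = re.compile(r"\b[\w-]+\.example\b")


def naive_filter(text: str) -> str:
    return "block" if DOMAIN.search(text) else "allow"


for failure in smoke_test(naive_filter):
    print(failure)  # seo anchor stuffing: high risk but got 'allow'
```

Here the stand-in filter catches the two specimens that contain a link but lets SEO anchor stuffing through, which is the kind of gap this smoke test is meant to surface.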

Is comment spam still a problem in 2026?

Yes, and it is harder to spot. LLM-generated essay spam looks fluent enough to bypass keyword rules. The patterns in this dataset were picked specifically because they still occur in production moderation queues every week.
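To illustrate why keyword rules struggle with fluent text, here is a hypothetical blocklist check in Python. The `BLOCKLIST` terms and the `keyword_hit` function are assumptions for illustration, not taken from any real product. The list catches the anchor-stuffing specimen but has nothing to match in the essay-spinner text.

```python
# Hypothetical blocklist of the kind a simple keyword filter might use.
BLOCKLIST = ("casino", "backlinks", "essay writing service", "verify at")


def keyword_hit(text: str) -> bool:
    """True if any blocklisted phrase appears in the text, case-insensitively."""
    lowered = text.lower()
    return any(term in lowered for term in BLOCKLIST)


anchor_stuffing = "best essay writing service | buy backlinks | casino bonus"
essay_spinner = "This article has made many useful points for people in the modern age of business."

print(keyword_hit(anchor_stuffing))  # True
print(keyword_hit(essay_spinner))    # False
```

Adding more terms does not fix this: fluent, generic text contains no distinctive phrase to match, so catching it depends on signals such as whether the comment actually references the page.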

Can I use these examples to train a custom classifier?

The dataset is small (six examples), far too small to train on directly. Use it as a smoke test against an existing classifier or moderator guideline. For training, label your own production comment stream and make sure both spam and non-spam are well represented.
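Before training on your own labeled stream, a quick balance check can catch a class that is missing or too thin. This Python sketch assumes labels are the strings `"spam"` and `"not_spam"`; the `class_balance` function and its 10% minimum share are hypothetical and should be adjusted to your data.

```python
from collections import Counter
from typing import Iterable


def class_balance(
    labels: Iterable[str],
    required: tuple[str, ...] = ("spam", "not_spam"),  # hypothetical label names
    min_fraction: float = 0.10,  # hypothetical minimum share per class
) -> dict[str, float]:
    """Return each required class's share of the data.

    Raises ValueError if there are no labels or if any required class
    falls below min_fraction (a missing class counts as a share of 0).
    """
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no labeled comments")
    shares = {label: counts[label] / total for label in required}
    thin = [label for label, share in shares.items() if share < min_fraction]
    if thin:
        raise ValueError(f"under-represented classes: {thin} (shares: {shares})")
    return shares


labels = ["not_spam"] * 850 + ["spam"] * 150
print(class_balance(labels))  # {'spam': 0.15, 'not_spam': 0.85}
```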