Comment spam examples dataset — Siftfy

Question 1

What are the most common types of comment spam?

Accepted Answer

The most common types are link drops, credential bait (phishing), spun essay text, SEO anchor stuffing, and borderline self-promotion. The first four are nearly always block-worthy. Borderline self-promotion and clean dissent (a legitimate comment that simply disagrees) should go to a review queue rather than an automatic block.
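As a rough illustration, the block-versus-review split described in this answer could be written down as a lookup table. This is only a sketch: the category identifiers, the `Action` enum, and the `route` helper are hypothetical names, and sending unknown categories to review is an assumption rather than anything the dataset prescribes.

```python
from enum import Enum


class Action(Enum):
    BLOCK = "block"
    REVIEW = "review"


# Hypothetical identifiers for the categories named in the answer.
ROUTING = {
    "link_drop": Action.BLOCK,
    "credential_bait": Action.BLOCK,
    "spun_essay": Action.BLOCK,
    "seo_anchor_stuffing": Action.BLOCK,
    "borderline_self_promo": Action.REVIEW,
    "clean_dissent": Action.REVIEW,
}


def route(category: str) -> Action:
    """Return the moderation action for a category.

    Assumption: anything not listed goes to human review, never to an
    automatic block.
    """
    return ROUTING.get(category, Action.REVIEW)
```

Keeping the policy in one table makes it easy to compare against a moderator guideline line by line.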

Question 2

How do I test my spam filter with these examples?

Accepted Answer

Paste each example into your filter (Siftfy's live tester, Akismet's debug endpoint, or your own queue) and confirm that the high-risk patterns hit a block threshold while the low-risk one (clean dissent) does not. A filter that blocks the clean example will block real readers too.
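If you would rather script the check than paste by hand, a minimal harness might look like the sketch below. It assumes your filter can be wrapped in a function that takes comment text and returns a risk score between 0 and 1. The `Example` type, `smoke_test`, and `toy_score` are hypothetical, and the two sample texts are placeholders, not the dataset's actual examples.

```python
from typing import Callable, Iterable, List, NamedTuple


class Example(NamedTuple):
    label: str
    text: str
    should_block: bool


# Placeholder texts for illustration only; substitute the real dataset examples.
EXAMPLES = [
    Example("link_drop", "Great post! Visit http://example.com/deals", True),
    Example("clean_dissent", "I disagree: the benchmark ignores cold-start latency.", False),
]


def smoke_test(
    score: Callable[[str], float],
    threshold: float,
    examples: Iterable[Example],
) -> List[str]:
    """Return labels of examples whose block decision differs from the expectation."""
    failures = []
    for ex in examples:
        blocked = score(ex.text) >= threshold
        if blocked != ex.should_block:
            failures.append(ex.label)
    return failures


def toy_score(text: str) -> float:
    """Stand-in scorer: treats any comment containing a URL as high risk."""
    return 0.9 if "http://" in text or "https://" in text else 0.1
```

Calling `smoke_test(toy_score, 0.5, EXAMPLES)` returns an empty list when every decision matches. A scorer that blocks everything would return `["clean_dissent"]`, which is the failure mode the answer warns about.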

Question 3

Is comment spam still a problem in 2026?

Accepted Answer

Yes, and it is harder to spot. LLM-generated essay spam looks fluent enough to bypass keyword rules. The patterns in this dataset were chosen because they still show up in production moderation queues every week.

Question 4

Can I use these examples to train a custom classifier?

Accepted Answer

Not directly. With only six examples, the dataset is too small to train on. Use it as a smoke test against an existing classifier or moderator guideline. For training, label your own production comment stream and make sure both spam and non-spam are well represented.
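One way to check that both classes are well represented before training is to compute each label's share of the labelled set. The sketch below assumes the labels are the strings `"spam"` and `"ham"`. The 20% minimum share is an arbitrary illustrative cutoff, not a recommendation from the dataset, and both function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable


def class_balance(labels: Iterable[str]) -> Dict[str, float]:
    """Return each label's share of the total number of labelled comments."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def is_balanced(labels: Iterable[str], min_share: float = 0.2) -> bool:
    """True if both classes are present and each holds at least min_share.

    Assumption: labels are "spam" and "ham"; min_share is illustrative.
    """
    shares = class_balance(labels)
    if not {"spam", "ham"} <= set(shares):
        return False
    return min(shares.values()) >= min_share
```

For example, a set with 3 spam and 7 non-spam labels passes at the default cutoff, while 1 spam against 9 non-spam does not.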
