
How to detect AI-generated spam comments

The honest answer to "can you tell a single AI-written comment from a human one?" is: not reliably, and not on the content alone. LLM output is fluent enough that scanning a single comment is no longer a useful test. The good news is that spam detection has never been about a single comment in isolation — it's about the shape of the submission relative to the page, the author, the surrounding text, and the recent submission history. That shape still leaks information, and it's where the detection signal lives.

What changed: LLMs killed the keyword filter

A keyword filter works by listing terms strongly correlated with spam ("essay writing service", "casino bonus", "buy backlinks"). Pre-LLM spammers had to use those exact terms because they were running templates. LLM-generated spam paraphrases. The submission lands on the page as fluent prose that mentions the topic in passing, includes an off-target link, and never trips the keyword rule.

One real example from a recent moderation queue: a comment on a coffee-brewing post that reads "This is a wonderful explanation of how brewing temperature affects extraction. I've been exploring similar ideas in my own writing at [example domain]." The comment is grammatically perfect. There are no spam keywords. The link is irrelevant. A keyword filter sees nothing; a structural classifier sees several things.
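To make the gap concrete, here is a minimal sketch of a keyword filter run against that comment. The blocklist holds only the three phrases quoted above, and the function name `keyword_filter_flags` and the template-spam string are made up for illustration; a production blocklist would be much longer but would fail the same way.

```python
# Blocklist limited to the phrases quoted in this article (illustrative only).
SPAM_KEYWORDS = ("essay writing service", "casino bonus", "buy backlinks")


def keyword_filter_flags(comment: str) -> bool:
    """Return True if the comment contains any blocklisted phrase (case-insensitive)."""
    text = comment.lower()
    return any(term in text for term in SPAM_KEYWORDS)


# Invented template-style spam, for contrast.
template_spam = "Best casino bonus here, click now!"

# The coffee-brewing comment from the moderation queue example.
llm_spam = (
    "This is a wonderful explanation of how brewing temperature affects "
    "extraction. I've been exploring similar ideas in my own writing at "
    "[example domain]."
)

print(keyword_filter_flags(template_spam))  # True
print(keyword_filter_flags(llm_spam))       # False
```

The second comment passes untouched because nothing in it is a listed phrase; the evidence is in its structure, not its vocabulary.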

The structural signals that still work

The patterns we score in production:

  • Generic-praise opener. "Great post!", "I really enjoyed reading this", "Wonderful explanation" followed by content that doesn't engage with any specific claim or detail. Humans almost always quote, disagree with, or extend a specific point. LLM spam usually doesn't.
  • On-topic-but-empty body. The comment uses vocabulary from the post (so it passes a topical-relevance check) but doesn't contribute anything beyond restating broad themes. Information density is low.
  • Outbound link mismatch. The link target's content doesn't relate to the post or the comment. Topically off-target links in fluent comments are one of the strongest single signals.
  • Submission timing. LLM spam tends to arrive in bursts on the same post across multiple author identities. Comments that land less than 30 seconds apart from different IPs but share a consistent stylistic fingerprint are a strong cluster signal.
  • Author URL pattern. The author's website field points to a recently-registered domain, a free subdomain provider, or a known link farm. Cheap to check, high precision.
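Two of these signals are simple enough to sketch in a few lines. The code below is an illustration under stated assumptions, not our production scorer: the opener list, the `Submission` type, and both function names are hypothetical, and the burst check covers only the timing and IP part of the signal (it does not model the stylistic fingerprint).

```python
import re
from dataclasses import dataclass

# Hypothetical opener phrases, taken from the examples in the list above.
GENERIC_OPENERS = ("great post", "i really enjoyed reading this", "wonderful explanation")


def has_generic_opener(comment: str) -> bool:
    """True if the first sentence of the comment contains a generic-praise phrase."""
    first_sentence = re.split(r"[.!?]", comment.strip(), maxsplit=1)[0].lower()
    return any(phrase in first_sentence for phrase in GENERIC_OPENERS)


@dataclass(frozen=True)
class Submission:
    post_id: str
    ip: str
    timestamp: float  # seconds since epoch


def burst_pairs(submissions, max_gap_seconds=30.0):
    """Return consecutive comments on the same post that come from different
    IPs and arrive less than max_gap_seconds apart."""
    by_post = {}
    for sub in submissions:
        by_post.setdefault(sub.post_id, []).append(sub)

    pairs = []
    for items in by_post.values():
        items.sort(key=lambda s: s.timestamp)
        for prev, cur in zip(items, items[1:]):
            gap = cur.timestamp - prev.timestamp
            if cur.ip != prev.ip and gap < max_gap_seconds:
                pairs.append((prev, cur))
    return pairs
```

Neither check is meant to decide anything on its own. Each one produces a feature that feeds a score, which is why a comment opening with "Great post!" from a long-standing commenter can still be published.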

Why probability-based classifiers handle this better

A binary spam/clean filter has to draw a hard line on ambiguous content. Most AI-generated comments are genuinely ambiguous — they sit between obvious template spam and obvious genuine engagement. Forcing a binary decision guarantees you'll either pass too much spam or block too much real content.

A calibrated probability lets you split the decision into three buckets:

  • Above 0.85: drop without notification. High-confidence spam, almost always template or cluster-pattern.
  • 0.50 – 0.85: hold for review. This is where most AI-generated comments land. A human moderator takes a few seconds to publish or reject; the queue stays short because the band is narrow.
  • Below 0.50: publish. Clean human comments and AI-assisted but clearly genuine engagement both fall here.
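The three buckets can be sketched as a small routing function. The thresholds come from the list above; the function name `route` and the action labels are assumptions for illustration, and boundary values (exactly 0.50 or 0.85) are sent to review, which is one reasonable reading of the bands.

```python
DROP_THRESHOLD = 0.85    # above this: drop without notification
REVIEW_THRESHOLD = 0.50  # from here up to DROP_THRESHOLD: hold for review


def route(spam_probability: float) -> str:
    """Map a calibrated spam probability to 'drop', 'review', or 'publish'."""
    if not 0.0 <= spam_probability <= 1.0:
        raise ValueError("spam_probability must be between 0 and 1")
    if spam_probability > DROP_THRESHOLD:
        return "drop"
    if spam_probability >= REVIEW_THRESHOLD:
        return "review"
    return "publish"
```

Keeping the thresholds as named constants makes them easy to move once you have audit data for your own site.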

What doesn't work reliably

Two approaches that get pitched a lot:

AI-text detectors. Tools that claim to identify "AI-written" content have improved, but their false positive rates on short comments are too high for production moderation. They also have a fairness problem: non-native English writers and stylistically clean human writers get flagged at meaningfully higher rates. Don't rely on them.

Perplexity-based scoring alone. Lower perplexity (text that's "easier for a language model to predict") correlates loosely with LLM authorship, but the correlation is too weak to act on. Real human writing also spans a wide perplexity range.
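For reference, perplexity is the exponential of the average negative log-likelihood per token. A minimal sketch, assuming you already have per-token natural-log probabilities from some language model (how you obtain them is outside this example, and the function name is ours):

```python
import math


def perplexity(token_logprobs):
    """Perplexity = exp(-mean(log p(token))) over natural-log token probabilities."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))
```

A text where every token has probability 0.5 has a perplexity of 2. On a comment of 20 or 30 tokens the average rests on very few values, which is part of why the number is too noisy to act on alone.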

The production setup

For the comments use case, the integration shape that handles AI spam well in production is the three-bucket pattern with a small twist: send the comment text, the post URL, and the author URL together. The classifier uses all three. A clean comment on a relevant post with a clean author URL scores below 0.5; an LLM-generated paraphrase with an unrelated link scores above 0.85; everything in between goes to the review queue.
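As a rough sketch of what "send all three together" looks like on the client side, the snippet below bundles the fields into one JSON body. The field names (`comment_text`, `post_url`, `author_url`) and the helper `build_payload` are hypothetical and are not taken from any documented API; check the real request format before using anything like this.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class CommentSubmission:
    # Hypothetical field names, chosen for illustration only.
    comment_text: str
    post_url: str
    author_url: str = ""  # the author website field is often left blank


def build_payload(submission: CommentSubmission) -> str:
    """Serialize the comment, post URL, and author URL into a single JSON body."""
    if not submission.comment_text.strip() or not submission.post_url.strip():
        raise ValueError("comment_text and post_url are required")
    return json.dumps(asdict(submission), sort_keys=True)
```

The point of the single payload is that the classifier sees the comment in context. Scoring the text without the post URL or author URL discards the link-mismatch and author-URL signals described earlier.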

Per-page false positive rates from teams running this pattern: 0.5–2%. That's a number a small moderation team can audit weekly. Without the structural signals, the same AI spam typically slips through at 30–50% — the difference between a clean comments section and a slowly-rotting one.
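A weekly audit can be as simple as hand-labeling a sample of decisions and computing two rates. The sketch below assumes "false positive rate" means the share of legitimate comments that were held or dropped, and "miss rate" means the share of spam that was published; those definitions and the function name `audit_rates` are our assumptions for this example.

```python
def audit_rates(audited):
    """Compute rates from (was_flagged, is_spam) pairs in a hand-labeled sample.

    was_flagged: the system held or dropped the comment.
    is_spam: the human auditor judged the comment to be spam.
    A rate is None when its denominator is zero.
    """
    fp = tn = fn = tp = 0
    for was_flagged, is_spam in audited:
        if is_spam:
            if was_flagged:
                tp += 1
            else:
                fn += 1
        elif was_flagged:
            fp += 1
        else:
            tn += 1

    legit, spam = fp + tn, fn + tp
    return {
        "false_positive_rate": fp / legit if legit else None,
        "miss_rate": fn / spam if spam else None,
    }
```

For example, a sample with 100 legitimate comments of which 1 was flagged, and 10 spam comments of which 3 were published, gives a false positive rate of 0.01 and a miss rate of 0.3.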

Common questions

How can I tell if a comment was written by an AI?

On a per-comment basis you usually can't be certain — modern LLM output is fluent enough to pass a casual read. What you can detect is whether the comment is generic, on-topic-but-empty, or links to a destination unrelated to the post. Those structural cues are reliable; specific phrase detection is not.

Do AI-generated comments hurt SEO?

Yes, through the same mechanisms as any other spam: crawl-budget bleed, the User-Generated Spam manual action, link-equity loss, and E-E-A-T trust erosion. The mechanism doesn't care whether the spam was written by a human or a model.

Are keyword filters useful against LLM-generated spam?

Largely no. LLMs avoid the obvious commercial anchors that keyword rules target, and they paraphrase around blocklists. Keyword filters still catch low-effort spam, but the share they catch is shrinking fast as AI-assisted spam grows.

What does a probability-based classifier add over a rule-based one?

A calibrated classifier (like Siftfy) scores the full shape of the submission (structural cues, semantic coherence with the post, link patterns), not just word presence. That generalizes to LLM-generated text in ways a keyword list can't, and because the output is a score, you can tune the drop and review thresholds to your own tolerance.

How do I handle borderline AI-generated comments?

Send them to a review queue rather than blocking them automatically. AI-generated comments span a spectrum from clear spam to genuinely engaged users who use AI as a writing tool. Auto-blocking the entire range is too aggressive; the queue lets a human draw the line.
