Detect AI-generated spam comments.
Updated May 12, 2026
The cheap-content economy means a spammer can produce a thousand grammatical, on-topic-sounding comments for the price of a coffee. That's broken every spam filter built around "looks suspicious" surface heuristics. The fix: classify on intent, not authorship. Siftfy doesn't care whether a human or an LLM wrote a comment — it cares whether the content is doing the work of spam.
Why AI-generated spam beats keyword filters
Pre-LLM spam was easy to spot because it was sloppy: broken grammar, obvious template fragments, the same opening sentence across thousands of submissions, links to obviously sketchy domains. Filters trained on those signals — Akismet, naive regex rules, banned-word lists — caught it.
AI-written spam isn't sloppy. It's grammatical, contextually plausible, and varied across submissions. The opening sentences are different every time. The domain might be a real-looking word combo (exampleshop.biz) instead of obvious keyword stuffing. The comment "engages" with the post before pivoting to the promotional payload, mimicking the shape of a real reader.
What these comments still have in common isn't surface features; it's the underlying shape of promotional content: an unprovoked URL plant, a topic shift halfway through, a recommendation pattern that wouldn't appear in a real conversation. That's what a content-trained classifier picks up, because that's what the training data was selected to teach.
Two comments, one promotional
// Two real comments. Both pass naive filters; one is spam.
const a = "Great post! I've been struggling with this exact issue for weeks. " +
"Quick question: did you find that the GC overhead got worse under " +
"load, or stayed roughly flat? Curious because I see different " +
"behavior on Java 21 vs 17.";
const b = "Great post! Very informative. I would like to share my own " +
"experience with this. I run a small business and recently " +
"discovered an amazing solution at exampleshop.biz that " +
"changed everything for us. Highly recommend checking it out.";
// Both are grammatical. Both are polite. Filter b on the link domain
// alone and you'll miss the next variant. Filter on the *shape* — the
// shift from on-topic engagement to a promotional URL plant — and the
// model classifies independently of whether a human or an LLM produced it.
const probA = await predict(a); // ~0.04: clean engagement
const probB = await predict(b); // ~0.93: promotional intent
Both are well-formed. Both could plausibly be human or LLM. The difference is what the comment is doing: A is a question that engages with the post's claims; B is a soft pivot from engagement to a URL plant. The probability gap reflects intent, not authorship.
Practical guidance
- Don't try to detect "AI-written" specifically. Tools that promise to flag AI authorship are unreliable, and they get less reliable with every model release. Worse, they penalize legitimate users who use AI assistance to write polished comments. Classify the content's behavior, not its origin.
- Threshold tuning matters more than it used to. AI-written spam scores in the 0.7-0.9 range more often than rule-violating spam (which clusters near 1.0). If you previously ran with a block threshold of 0.95, drop it to 0.85 and use the 0.5-0.85 band as a review queue. Calibration means those numbers have a real interpretation: of the comments that score around 0.7, roughly 70% are spam.
- Re-classify on edits. A common pattern with AI spam is a clean comment that gets edited to add the URL after initial moderation. Re-run the classifier on every edit, not just on first submission.
- Pair with rate limits. AI generation is cheap, and that means volume goes up. A single source posting 50 grammatical comments in five minutes is suspicious in aggregate even if each individual comment is borderline. Per-IP per-minute limits at the edge are the right complement to per-comment classification.
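The guidance above can be wired together in a few lines. The following is a minimal sketch, not Siftfy's implementation: `routeByScore`, `SlidingWindowLimiter`, and `moderate` are hypothetical names, the 0.85 and 0.5 cutoffs are the example thresholds from the list, and `predict` is passed in as a parameter so the sketch assumes nothing about the API client beyond "text in, spam probability out". The limiter is an in-memory stand-in for whatever rate limiting you run at the edge.

```javascript
// Example thresholds from the guidance above; tune them for your own traffic.
const BLOCK_AT = 0.85;
const REVIEW_AT = 0.5;

// Map a calibrated spam probability to a moderation action.
function routeByScore(prob) {
  if (prob >= BLOCK_AT) return "block";
  if (prob >= REVIEW_AT) return "review";
  return "publish";
}

// Sliding-window limiter: at most `limit` events per `windowMs` per source key.
// In-memory only (hypothetical helper); a real deployment would use shared storage.
class SlidingWindowLimiter {
  constructor(limit, windowMs) {
    this.limit = limit;
    this.windowMs = windowMs;
    this.events = new Map(); // source key -> timestamps (ms) of allowed events
  }

  // Returns true if the event is allowed, false if the source is over its limit.
  allow(key, now = Date.now()) {
    const cutoff = now - this.windowMs;
    const recent = (this.events.get(key) || []).filter((t) => t > cutoff);
    const allowed = recent.length < this.limit;
    if (allowed) recent.push(now);
    this.events.set(key, recent);
    return allowed;
  }
}

// Call this on first submission AND on every edit, so a comment that was
// clean when posted gets re-scored once a URL is added later.
async function moderate(text, sourceKey, { predict, limiter, now = Date.now() }) {
  if (!limiter.allow(sourceKey, now)) return "rate_limited";
  return routeByScore(await predict(text));
}
```

With a limiter such as `new SlidingWindowLimiter(10, 60_000)`, calling `moderate(comment, clientIp, { predict, limiter })` from both the create and the edit handlers covers the last three bullets in one code path. Checking the rate limit first also avoids spending a classification on traffic you would reject anyway.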
What we don't claim
Siftfy is a calibrated content classifier. It doesn't identify AI-generated content as a category — there's no "this was probably written by an LLM" output, and we wouldn't trust one if there were. What it does well is score the content on the dimensions that actually correlate with spammy behavior, and those dimensions generalize across LLM-written and human-written spam alike. If you need authorship attribution specifically, that's a different tool than this one.
10,000 classifications / month free. Read the /v1/predict reference, or peek at related use cases: comments, CAPTCHA alternatives, contact forms.