moderation · ops

Comment moderation that doesn't burn your community

8 min read

The hard part of comment moderation isn't catching spam. Modern classifiers handle the easy 95%. The hard part is what to do with the borderline cases without grinding your moderators into the ground or turning every approved comment into a debate.

What follows is the playbook we'd want a new site operator to adopt on day one, drawn from running classification at volume across blogs, comment platforms, and forum-style products. Some of it is mechanical (queue routing, threshold choice). Some of it is operational (how moderators stay sharp, how to audit the classifier). Both halves matter: get only one of them right and the system as a whole still fails.

The three-bucket pattern

Bucket every new comment into one of three statuses on submission:

  • Publish. Score below your low threshold. Goes live immediately. Readers see it, the author sees it on their next refresh, and nothing else happens.
  • Queue. Score between the low and high thresholds. Visible to the author (so they don't think their comment was eaten), invisible to the public, surfaced in the moderator's review queue.
  • Reject. Score above your high threshold. Saved as rejected (don't delete), invisible to everyone including the author.

The reason to use a calibrated probability rather than a binary spam/not-spam label is that the bucket boundaries are policy choices, and you'll want to tune them differently per surface. A developer forum where every comment matters runs conservative thresholds (both set high, so fewer comments are held or rejected). A marketplace review section where moderator time is the bottleneck runs aggressive thresholds (both set low). Same model, two product configurations.

Sensible starting defaults: 0.50 for the low threshold and 0.85 for the high threshold. On typical traffic, a well-calibrated classifier with those numbers sends roughly 80% of comments straight to publish, 15% to the queue, and 5% to reject, though the exact split depends on how much spam your site attracts. That 15% is your moderators' workload.
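A minimal Python sketch of the three-bucket routing, assuming the classifier returns a calibrated spam probability between 0 and 1. The names `Status`, `Thresholds`, `route`, and `SURFACES`, and the preset numbers for the forum and marketplace surfaces, are illustrative assumptions rather than any vendor's API.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    PUBLISH = "publish"  # live immediately
    QUEUE = "queue"      # visible to the author only, awaits a moderator
    REJECT = "reject"    # soft-deleted, visible to no one


@dataclass(frozen=True)
class Thresholds:
    low: float = 0.50   # scores below this publish
    high: float = 0.85  # scores above this are rejected

    def __post_init__(self) -> None:
        if not 0.0 <= self.low <= self.high <= 1.0:
            raise ValueError("expected 0 <= low <= high <= 1")


def route(score: float, t: Thresholds = Thresholds()) -> Status:
    """Map a calibrated spam probability to one of the three buckets."""
    if score < t.low:
        return Status.PUBLISH
    if score > t.high:
        return Status.REJECT
    return Status.QUEUE  # low <= score <= high: a human decides


# Hypothetical per-surface presets: same model, different policy.
SURFACES = {
    "default": Thresholds(low=0.50, high=0.85),
    "dev_forum": Thresholds(low=0.65, high=0.95),    # conservative: both high
    "marketplace": Thresholds(low=0.40, high=0.70),  # aggressive: both low
}
```

Keeping thresholds in per-surface configuration rather than in code is what lets one model serve both products. A score exactly equal to either threshold lands in the queue here, which errs toward human review.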

Trusted-author allowlist

Before the classifier runs, check whether the author is on a per-site allowlist. Authors get there by clearing a configurable bar, such as N approved comments with no rejections and an account older than X days, or by explicit moderator promotion. Bypass classification entirely for allowlisted authors and route them straight to publish.

Two reasons this matters: it saves real money on classification volume (regulars usually generate the bulk of comments), and it keeps the moderator queue focused on actual borderline cases instead of well-known authors who happen to write tersely. The risk — an allowed author turning bad — is mitigated by re-running classification on edit, which we'll get to.
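The allowlist check can run before any classifier call. This sketch assumes the bar combines the approval count and account age, with moderator promotion as a separate path; `Author`, `AllowlistPolicy`, `is_trusted`, `handle_new_comment`, and the default values standing in for N and X are all hypothetical, and `classify` stands in for whatever function calls your classifier.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, Optional, Tuple


@dataclass
class Author:
    approved_count: int
    rejected_count: int
    created_at: datetime
    promoted_by_moderator: bool = False


@dataclass(frozen=True)
class AllowlistPolicy:
    min_approved: int = 10          # hypothetical value for N
    min_account_age_days: int = 30  # hypothetical value for X


def is_trusted(author: Author, policy: AllowlistPolicy, now: datetime) -> bool:
    """Explicit promotion, or enough clean history on an old-enough account."""
    if author.promoted_by_moderator:
        return True
    account_age = now - author.created_at
    return (
        author.approved_count >= policy.min_approved
        and author.rejected_count == 0
        and account_age >= timedelta(days=policy.min_account_age_days)
    )


def handle_new_comment(
    text: str,
    author: Author,
    policy: AllowlistPolicy,
    classify: Callable[[str], float],
    now: datetime,
    low: float = 0.50,
    high: float = 0.85,
) -> Tuple[str, Optional[float]]:
    """Return (status, score); score is None when classification was skipped."""
    if is_trusted(author, policy, now):
        return "publish", None  # allowlisted: no classifier call at all
    score = classify(text)
    if score < low:
        return "publish", score
    if score > high:
        return "reject", score
    return "queue", score
```

Returning `None` for the score makes skipped classifications easy to count, which is a direct read on how much classification volume the allowlist saves.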

Edge cases that bite

Edits

Re-classify on every edit, no exceptions. "Post clean, edit in spam" is one of the oldest tricks in the book, and it exists specifically to beat platforms that classify only at submission time. The cost of re-classifying on edit is one API call per edit; for most sites that is a rounding error on the volume bill.
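A sketch of an edit handler that re-classifies unconditionally, allowlisted authors included, so a clean comment later edited into spam drops out of public view. The `Comment` record, `bucket`, and `on_edit` are hypothetical names; `classify` stands in for your classifier call.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Comment:
    id: int
    text: str
    status: str             # "publish" | "queue" | "reject"
    score: Optional[float]  # None if classification was skipped (allowlisted)


def bucket(score: float, low: float = 0.50, high: float = 0.85) -> str:
    if score < low:
        return "publish"
    if score > high:
        return "reject"
    return "queue"


def on_edit(comment: Comment, new_text: str, classify: Callable[[str], float]) -> Comment:
    """Re-classify every edit, no exceptions: allowlisted authors included."""
    comment.text = new_text
    comment.score = classify(new_text)  # one classifier call per edit
    comment.status = bucket(comment.score)
    return comment
```

Re-bucketing from scratch means a cleaned-up edit can also move a comment back to publish. If you would rather keep anything that was ever flagged under review, take the stricter of the old and new statuses instead.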

Multilingual content

Most production classifiers are trained primarily on English. Non-English comments tend to score artificially high because the model encounters tokens it can't place. You have two options: detect the language first and skip classification for languages the model wasn't trained on (routing those comments to the queue instead), or raise the thresholds for non-English comments. The first is more defensible; the second is one config line.
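Both options fit in one routing function. This sketch assumes a language code has already been produced by an upstream detector (the detector itself is out of scope); `SUPPORTED_LANGUAGES`, the strategy names, and the raised threshold values are illustrative assumptions, not recommendations.

```python
from typing import Callable

# Assumption: the languages your classifier was trained on. Check your
# vendor's documentation; English-only is just an example.
SUPPORTED_LANGUAGES = {"en"}


def _bucket(score: float, low: float, high: float) -> str:
    if score < low:
        return "publish"
    if score > high:
        return "reject"
    return "queue"


def route_multilingual(
    text: str,
    lang: str,  # language code from an upstream detector (assumed)
    classify: Callable[[str], float],
    strategy: str = "queue_unsupported",
) -> str:
    if lang in SUPPORTED_LANGUAGES:
        return _bucket(classify(text), low=0.50, high=0.85)
    if strategy == "queue_unsupported":
        # Option 1: don't trust the score at all; skip the call, a human decides.
        return "queue"
    if strategy == "raise_thresholds":
        # Option 2: still classify, but demand more evidence before acting.
        return _bucket(classify(text), low=0.70, high=0.95)  # hypothetical values
    raise ValueError(f"unknown strategy: {strategy}")
```

Option 1 also saves the classifier call for unsupported languages, at the cost of queue load proportional to your non-English traffic.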

Quoted spam

A genuine reply that quotes a spam parent comment will inherit the spam signal — the classifier sees the quoted text and reasonably concludes the whole thing is suspect. Two clean fixes: strip block quotes before classifying, or classify only the new lines added in the reply. The stripped-quote approach is one regex; the new-lines approach requires a diff against the parent and is fiddly enough that it's usually not worth it.
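The stripped-quote approach, sketched for Markdown- or email-style quoting where each quoted line starts with `>`. That quoting convention is an assumption; if your comment system stores HTML `<blockquote>` elements, an HTML parser is a safer tool than a regex. `strip_quotes` is a hypothetical helper name.

```python
import re

# Matches a whole line whose first non-blank character is ">" (nested
# ">>" quotes included), plus its trailing newline if there is one.
_QUOTED_LINE = re.compile(r"^[ \t]*>.*(?:\n|$)", re.MULTILINE)


def strip_quotes(text: str) -> str:
    """Remove quoted lines so only the reply's own words get classified."""
    return _QUOTED_LINE.sub("", text).strip()
```

Because the pattern is anchored at the start of a line, a `>` used mid-sentence (as in "a > b") is left alone.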

First-comment bias

Brand-new accounts at threshold-adjacent scores deserve more scrutiny than ten-year-old accounts. Fold account age and posting velocity into the score as a weighted adjustment, not a hard gate: gating on account age excludes legitimate first-time commenters, who are often among your highest-quality contributors.
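One way to express the weighted adjustment: add a newness term that decays with account age and a velocity term for bursts above a normal posting rate, then clamp back into [0, 1]. The weights, the seven-day half-life, the baseline rate, and the name `adjusted_score` are hypothetical placeholders to tune against your own audit data.

```python
# Hypothetical weights and scales; tune them against your own audit data.
AGE_WEIGHT = 0.10          # maximum boost, applied to a brand-new account
AGE_HALF_LIFE_DAYS = 7.0   # the newness boost halves every week
VELOCITY_WEIGHT = 0.05     # boost per comment/hour above the baseline
VELOCITY_BASELINE = 3.0    # comments/hour treated as normal


def adjusted_score(score: float, account_age_days: float, comments_last_hour: int) -> float:
    """Nudge a calibrated spam score using account age and posting velocity."""
    newness = 0.5 ** (account_age_days / AGE_HALF_LIFE_DAYS)  # 1.0 at day 0, decays toward 0
    excess = max(0.0, comments_last_hour - VELOCITY_BASELINE)
    adjusted = score + AGE_WEIGHT * newness + VELOCITY_WEIGHT * excess
    return min(1.0, max(0.0, adjusted))  # an adjustment, never a hard gate
```

Because the adjustment only nudges the score, a brand-new account posting something clean at a normal rate still publishes; age changes the outcome only when the raw score was already close to a threshold.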

Queue UX traps

The traps that destroy moderator throughput look small in isolation:

  • No keyboard dispatch. If approving and rejecting requires aiming a mouse, you've cut moderator throughput by an order of magnitude. Single-key dispatch (J/K to navigate, A to approve, R to reject) is the floor.
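  • Sort order matters. Put the most ambiguous comments first; with a calibrated score, that means the ones closest to 0.5, which under the default thresholds is ascending score order. Moderators are sharpest in the first twenty minutes of a session, so spend that attention on the genuinely borderline cases rather than the ones sitting just under the reject line.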
  • Show the score, not just the binary. A comment at 0.52 and a comment at 0.84 are both "queued" but want different reviews. Surface the probability as a badge (we use red/amber/grey).
  • Bulk-approve by author. Once a moderator has approved one comment from an author in a session, give them a one-click "approve all from this author" for the queue. You're trusting human judgment to short-circuit machine confidence.
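A sketch of the queue mechanics: single-key dispatch, an ambiguity sort, a score badge, and bulk approval by author. The sort key puts calibrated scores closest to 0.5 first, which is one way to read "most ambiguous"; the badge cut-offs, the `QueuedComment` record, and all function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class QueuedComment:
    id: int
    author_id: str
    score: float
    status: str = "queue"


# Single-key dispatch: J/K to navigate, A to approve, R to reject.
KEYMAP = {"j": "next", "k": "prev", "a": "approve", "r": "reject"}


def dispatch(key: str, queue: List[QueuedComment], cursor: int) -> int:
    """Apply one keystroke to the comment under the cursor; return the new cursor."""
    if not queue:
        return cursor
    action = KEYMAP.get(key.lower())
    if action == "next":
        return min(cursor + 1, len(queue) - 1)
    if action == "prev":
        return max(cursor - 1, 0)
    if action in ("approve", "reject"):
        queue[cursor].status = "publish" if action == "approve" else "reject"
        return min(cursor + 1, len(queue) - 1)  # advance to the next comment
    return cursor  # unbound key: ignore it


def ambiguity_order(queue: List[QueuedComment]) -> List[QueuedComment]:
    """Most ambiguous first: calibrated scores closest to 0.5."""
    return sorted(queue, key=lambda c: abs(c.score - 0.5))


def badge(score: float) -> str:
    # Hypothetical cut-offs inside the default 0.50-0.85 queue band.
    if score >= 0.75:
        return "red"
    if score >= 0.60:
        return "amber"
    return "grey"


def approve_all_from_author(queue: List[QueuedComment], author_id: str) -> int:
    """One-click bulk approve; returns how many comments were published."""
    approved = 0
    for c in queue:
        if c.author_id == author_id and c.status == "queue":
            c.status = "publish"
            approved += 1
    return approved
```

Advancing the cursor after every approve or reject is what makes a session a steady rhythm of single keystrokes rather than keystroke-plus-navigation.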

Auditing the classifier

A spam classifier is a moving target. Spam techniques evolve, your audience evolves, and the classifier vendor ships new model versions. Three audit habits keep you out of trouble:

  • Sample false positives weekly. Pull a handful of high-score comments at random and have a human re-judge them. A rising false-positive rate is the earliest signal that your thresholds need raising.
  • Don't delete rejected comments. Soft-delete with the score and timestamp persisted. The audit sample needs the actual text. So does the support reply when a commenter writes in saying their post was eaten.
  • Track score histograms over time. Plot the score distribution per week. A sudden bimodal distribution where there used to be a long tail usually means the classifier got an update; a sudden right-shift usually means a spam wave. Either is worth knowing before a moderator notices it in the queue.
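The audit habits need little machinery once rejected comments are soft-deleted with their score and timestamp. This sketch draws a reproducible random sample of rejected comments for re-judging and buckets scores into a per-ISO-week histogram; the `ScoredComment` record shape, the bin count, and the function names are assumptions.

```python
import random
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Iterable, List, Optional, Tuple


@dataclass
class ScoredComment:
    text: str            # kept verbatim, even for rejects: the audit needs it
    score: float
    created_at: datetime
    status: str          # "publish" | "queue" | "reject" (soft-deleted, never removed)


def audit_sample(comments: Iterable[ScoredComment], k: int = 20,
                 seed: Optional[int] = None) -> List[ScoredComment]:
    """Random sample of rejected comments for a human to re-judge."""
    rejected = [c for c in comments if c.status == "reject"]
    rng = random.Random(seed)  # pass a seed to make a given week's sample reproducible
    return rng.sample(rejected, min(k, len(rejected)))


def weekly_histograms(comments: Iterable[ScoredComment],
                      bins: int = 10) -> Dict[Tuple[int, int], List[int]]:
    """Score histogram per ISO week: {(iso_year, iso_week): [count per bin]}."""
    hist: Dict[Tuple[int, int], List[int]] = defaultdict(lambda: [0] * bins)
    for c in comments:
        iso_year, iso_week, _ = c.created_at.isocalendar()
        b = min(int(c.score * bins), bins - 1)  # a score of exactly 1.0 joins the top bin
        hist[(iso_year, iso_week)][b] += 1
    return dict(hist)
```

Plotting each week's bins side by side makes both signals described above (a new second peak, or mass sliding right) visible at a glance.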

What you don't need

A few moderation patterns get recommended a lot but consistently underperform calibrated classification:

  • Pure regex-based blocklists. Every blocklist becomes a maintenance graveyard.
  • Pure "first comment moderated forever" gates. They kill first-time-commenter quality.
  • Honeypot fields as a primary defence. Good as a layer, weak alone (see the honeypot post).
  • reCAPTCHA on the comment form. A genuine UX tax for marginal benefit on a problem that classification handles server-side.

The shorter version

Three buckets, two thresholds, an allowlist for trusted authors, re-classify on edit, and a moderator queue UX that respects single-keystroke dispatch. Add a weekly false-positive audit and a score histogram to keep the classifier honest. Most of the difficulty in comment moderation is operational, not technical — the model is the smaller half of the problem.

Try Siftfy free

The comments use-case shows the three-bucket pattern as runnable code, and the EchoThread case study documents how a real comment platform wires it end-to-end.