use case · fake registrations

Prevent fake user registrations before they cost you anything.

Updated May 12, 2026

A fake registration isn't one problem; it's five. It pollutes your activation funnel, costs you support time, damages your sender reputation, gives abuse a foothold, and drags down the Stripe risk score on your real customers. "Prevent" is the right word and the wrong number: it's not one filter you turn on. It's three layers that each catch a different shape of fake.

What a fake registration actually costs

Pre-LLM, fake signups were mostly noise. Post-LLM, they're cheap enough to flood with — and varied enough that pattern-based filters miss them. The downstream costs add up faster than the line item "spam accounts in the database" suggests:

  • Funnel pollution. Your activation rate, time-to-first-action, and retention numbers all average over a population that includes accounts that were never going to convert. Decisions made from those dashboards are decisions made from corrupted data.
  • Sender reputation. Welcome emails sent to fake or low-quality addresses bounce, get marked spam, or hit inactive boxes. ESPs (SendGrid, Postmark, SES) and mailbox providers read that as a signal that your domain sends junk, and your real transactional mail starts landing in promotions or spam.
  • Payment risk score. Stripe, Adyen, and Braintree watch the ratio of new accounts that ever attempt payment. A flood of accounts that never reach checkout pulls your risk score the wrong way and affects approval rates for your real customers.
  • Abuse surface. A fake account is a free seat in your app. Coordinated farms create thousands of them to bypass per-account rate limits, scrape, post links, or stage takeover attempts on real users.
  • Support overhead. Real users only file tickets when something is wrong. Every fake account that bounces off a feature, hits a paywall, or trips an automated suspension generates a "what is this" reply you have to read.

Three places to catch them

A single filter at signup-time misses the fakes that don't fill in a bio, and over-blocks the real users who do. The pattern that holds up is three layers stacked, each catching what the previous one couldn't:

  1. Pre-signup

    before the form is posted

    Email reputation, disposable-domain lookups, IP/ASN signal, bot fingerprinting. This isn't Siftfy's job — it's the work of upstream services like Kickbox, EmailListVerify, or your edge WAF. Catches the high-volume, low-effort end of the distribution: known disposable domains, datacenter IPs, automation bots that don't run JS.

  2. At signup

    on form submission, before write

    If your signup captures any free text — display name, bio, company, "how did you hear" — classify it. This catches the farms that beat layer 1 (they rotate IPs, buy real domains) but still need to fill in profile copy that betrays them. Pattern and code in the inline-handler use case.

  3. First activity

    on the first piece of user content

    The strongest signal, and the one most teams skip. A real user's first comment, message, or listing is content the classifier was actually built to score. A fake registration that beat layers 1 and 2 with an empty profile gets caught here: an empty profile gives the classifier nothing to score, but a brand-new account whose first post is a URL plant has shown its hand. The webhook below feeds this layer by marking empty-profile accounts "unscored" so they are scored on their first piece of content.
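
The hand-off to layer 3 can be sketched as a small state transition. This is a minimal sketch, not part of Siftfy's API: the names SpamStatus, needsActivityScore, and statusAfterFirstContent are hypothetical, the 0.95 and 0.70 thresholds are borrowed from the webhook handler on this page, and probability is assumed to come from the same /v1/predict call, with null standing for a failed or timed-out request.

```typescript
// Hypothetical status values. "unscored" and "pending" both mean the account
// has not been classified yet.
type SpamStatus = "unscored" | "pending" | "ok" | "review" | "shadow";

// Only accounts without a score need the first-activity check.
function needsActivityScore(status: SpamStatus): boolean {
  return status === "unscored" || status === "pending";
}

// Decide the new status once the first comment, message, or listing has been
// classified. `probability` is null when the classifier call failed.
function statusAfterFirstContent(
  current: SpamStatus,
  probability: number | null,
): SpamStatus {
  if (!needsActivityScore(current)) return current; // already decided at signup
  if (probability === null) return "pending";       // try again on the next content
  if (probability >= 0.95) return "shadow";
  if (probability >= 0.7) return "review";
  return "ok";
}
```

Under these assumptions, an account that signed up with an empty profile and then opens with a link drop moves from "unscored" to "shadow" or "review" on that first post, while an account already scored at signup keeps its status.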

Webhook pattern

Most auth providers (Auth0, Clerk, Supabase, Cognito, WorkOS) emit a webhook or an equivalent hook on signup (Cognito, for example, exposes it as a post-confirmation Lambda trigger rather than an HTTP webhook). Subscribing to it gets you signup classification without changing your sign-up flow, your handlers, or your latency budget: the work runs out of band and writes back an account-status field your app reads.

typescript
// Webhook listener for a signup event from your auth provider
// (Auth0, Clerk, Supabase, Cognito — same shape, different envelope).
// The handler runs *after* the account exists, so latency here doesn't
// block the user. We classify whatever free-text fields the signup
// captured and write back the account's status if the score is bad.
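import express from "express";
import crypto from "node:crypto";
// Hypothetical data-layer module: replace with your own ORM or query client.
// The handler only needs db.users.update(id, fields).
import { db } from "./db";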

const app = express();
app.use(express.json({ verify: rawBodyForHMAC }));

const SIFTFY_KEY = process.env.SIFTFY_KEY!;
const SIGNING_SECRET = process.env.AUTH_WEBHOOK_SECRET!;

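// Most auth providers HMAC-sign the body. Verify before doing anything.
// The header name and digest encoding vary by provider; "x-signature" with
// a hex digest is a placeholder, so check your provider's docs.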
function rawBodyForHMAC(req: any, _res: any, buf: Buffer) {
  req.rawBody = buf;
}
function verifySignature(req: express.Request): boolean {
  const expected = Buffer.from(
    crypto.createHmac("sha256", SIGNING_SECRET).update((req as any).rawBody).digest("hex"),
  );
  const got = Buffer.from(String(req.header("x-signature") || ""));
  // timingSafeEqual throws if the buffers differ in length — check first,
  // otherwise a malformed or missing signature crashes the worker.
  if (got.length !== expected.length) return false;
  return crypto.timingSafeEqual(expected, got);
}

app.post("/webhooks/signup", async (req, res) => {
  if (!verifySignature(req)) return res.sendStatus(401);

  const { user } = req.body as {
    user: { id: string; email: string; profile?: Record<string, string> };
  };

  // Pull the optional free-text fields the user filled in. Skip structured
  // fields (email, country, plan) — the classifier is trained on natural
  // language, not addresses or enums.
  const text = [
    user.profile?.display_name,
    user.profile?.bio,
    user.profile?.company,
    user.profile?.heard_about_us,
  ]
    .filter(Boolean)
    .join("\n")
    .trim();

  // Empty free-text means no signal yet. Mark the account "unscored" and
  // re-run on first activity (layer 3, "First activity", above).
  if (!text) {
    await db.users.update(user.id, { spam_status: "unscored" });
    return res.sendStatus(204);
  }

  let probability = 0;
  try {
    const resp = await fetch("https://api.siftfy.io/v1/predict", {
      method: "POST",
      headers: { "Content-Type": "application/json", "X-API-Key": SIFTFY_KEY },
      body: JSON.stringify({ text }),
      signal: AbortSignal.timeout(3000),
    });
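    // A non-2xx response (rate limit, outage) is a failure, not a clean
    // score. Throw so the catch below marks the account "pending" instead
    // of letting it fall through with probability 0 and become "ok".
    if (!resp.ok) throw new Error(`Siftfy responded with ${resp.status}`);
    const data = (await resp.json()) as { spam_probability?: unknown };
    if (typeof data.spam_probability !== "number") {
      throw new Error("Siftfy response has no numeric spam_probability");
    }
    probability = data.spam_probability;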
  } catch {
    // Webhook failures retry. Marking "pending" lets the next attempt
    // try again without flipping a real user into review.
    await db.users.update(user.id, { spam_status: "pending" });
    return res.sendStatus(503);
  }

  // Two thresholds, three resulting states. Tune to your tolerance for
  // false positives.
  const status =
    probability >= 0.95 ? "shadow"        // suppress outbound effects
    : probability >= 0.70 ? "review"      // hold for human approval
    : "ok";

  await db.users.update(user.id, { spam_status: status, spam_probability: probability });
  res.sendStatus(204);
});

app.listen(3000);

Three states are usually enough: ok behaves like a normal account; review queues the account for human approval before unlocking any user-visible actions; shadow lets the account use the product normally but suppresses outbound effects (no posts go public, no emails fire, no API tokens issue). Spammers learn nothing from shadow — exactly the point.
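
One way to enforce the three states is a single gate that every outbound action passes through. The sketch below is an assumption about how an app might wire this up, not something Siftfy provides: the Action names, the Decision values, and the gate function are all hypothetical, and treating "unscored" and "pending" like "ok" is a choice you may want to tighten.

```typescript
type SpamStatus = "ok" | "review" | "shadow" | "unscored" | "pending";
type Action = "login" | "edit_profile" | "publish_post" | "send_email" | "issue_api_token";

// "hold" means queue the effect until a human approves the account;
// "suppress" means accept the request but drop its effect silently.
type Decision = "allow" | "hold" | "suppress";

// Hypothetical set of actions whose effects reach other users or systems.
const OUTBOUND: ReadonlySet<Action> = new Set<Action>([
  "publish_post",
  "send_email",
  "issue_api_token",
]);

function gate(status: SpamStatus, action: Action): Decision {
  // Non-outbound actions stay available so a shadowed account sees nothing unusual.
  if (!OUTBOUND.has(action)) return "allow";
  switch (status) {
    case "review":
      return "hold";
    case "shadow":
      return "suppress";
    default:
      // "ok", plus "unscored" and "pending" (assumed to behave normally).
      return "allow";
  }
}
```

Keeping the decision in one function means a new outbound feature only has to be added to one set, rather than having status checks scattered across handlers.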

Edge cases worth handling

  • Empty profiles aren't a green light. A blank-bio signup isn't necessarily real — it's unclassified. Stamp the account "unscored" and run the classifier on the first piece of content the user produces.
  • Re-score on profile edits. A pattern with sophisticated fakes: clean signup, then a bio edit a day later that adds the promotional payload. Fire the webhook (or an equivalent hook) on profile changes too, not only on signup.
  • Don't double-block. If a user has already paid with a verified card, you've collected a stronger authenticity signal than text classification will give you. Down-weight Siftfy's score for paying customers and lean on chargeback signal instead.
  • Surface the review queue. Every threshold you set has false positives. A queue your admin can flip through in a minute a day is the difference between a system that gets better and a system that quietly costs you real customers. Track the approval rate (the share of queued accounts a human ends up approving) as a metric; if it climbs above ~30%, your review threshold is too aggressive and is catching too many real users.
  • Don't 4xx the form. An error like "Your signup looks like spam" hands the spammer a tuning signal for the next attempt. Always return a successful response; route blocked accounts into shadow or review on the server instead.
  • Non-English signups. The model is primarily English-trained. If you serve markets where bios are written in other languages, raise your block threshold and lean harder on layer 3 (first-activity content), where the user's actual product behavior is the signal.
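
Several of these edge cases reduce to adjusting the score or the thresholds before picking a status. The sketch below shows one possible shape. Every constant in it is an illustrative assumption rather than a recommended value: the 0.5 weight for paying customers and the 0.99 shadow threshold for non-English text are made up for the example, and the 0.95, 0.70, and 30% figures are taken from this page.

```typescript
interface AccountSignals {
  probability: number;         // classifier output in [0, 1]
  hasVerifiedPayment: boolean; // has paid with a verified card
  likelyEnglish: boolean;      // the free text appears to be English
}

// Hypothetical tuning constants; adjust to your own tolerance.
const PAID_WEIGHT = 0.5;
const REVIEW_AT = 0.7;
const SHADOW_AT = 0.95;
const SHADOW_AT_NON_ENGLISH = 0.99;

function decideStatus(s: AccountSignals): "ok" | "review" | "shadow" {
  // Down-weight the text score once a stronger authenticity signal exists.
  const score = s.hasVerifiedPayment ? s.probability * PAID_WEIGHT : s.probability;
  // Raise the block threshold where the model is less reliable.
  const shadowAt = s.likelyEnglish ? SHADOW_AT : SHADOW_AT_NON_ENGLISH;
  if (score >= shadowAt) return "shadow";
  if (score >= REVIEW_AT) return "review";
  return "ok";
}

// True when humans approve more than ~30% of queued accounts, which suggests
// the review threshold is catching too many real users.
function reviewThresholdTooAggressive(approved: number, reviewed: number): boolean {
  if (reviewed === 0) return false; // no data yet
  return approved / reviewed > 0.3;
}
```

With these numbers, a 0.96 score shadows an unpaid English-language signup, drops to "ok" for a paying customer, and lands in "review" for non-English text.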

What this isn't

Siftfy classifies content. It's not an identity-verification provider: it can't tell you whether the human behind a signup is who they claim to be, only whether the text they produced looks like spam. For KYC, age verification, or sanctions screening, the right tools are Persona, Stripe Identity, or Onfido. For volumetric abuse, such as a script firing a thousand signups a second, pair classification with rate limits at your edge. Siftfy is the layer that catches content-quality fakes that pass every other check.

Get an API key

10,000 classifications / month free, no card. See /v1/predict for the full reference, or related patterns: signups (inline handler), CAPTCHA alternatives, AI-generated spam.