Content Moderation at Scale: Lessons from Building a 100-Person Trust & Safety Team
Building a content moderation operation that's accurate and consistent, and that doesn't burn out your team, is harder than it looks. Here's what works.
The scale problem is a quality problem
Content moderation looks straightforward at small scale: have someone review reported content and make decisions. At 100,000 items per day, the challenges multiply. Decision consistency degrades because different reviewers make different calls. Reviewer burnout increases error rates. Edge cases accumulate faster than policy updates can address them.
Every content moderation failure at scale — over-removal of legitimate content, under-removal of harmful material — traces back to either inconsistency or inadequate support for reviewers handling difficult material.
Policy clarity is the foundation
Every content moderation error is, at root, either a policy ambiguity or a training failure. If reviewers are making inconsistent decisions on the same type of content, the policy isn't clear enough — or the training didn't make it clear enough.
Effective moderation policies have three layers: bright lines (clear violations that are always removed), contextual judgments (cases where the reviewer must apply policy to the surrounding context), and edge case guidance (worked examples of difficult cases, with rationale). The worked examples are the most valuable layer and the one most often missing.
The calibration process
Consistency in moderation decisions requires active calibration. At least monthly, QA analysts should send the same set of 50–100 items to every reviewer, with each reviewer deciding independently. The distribution of decisions on each item is then analysed: where consensus is high, policy is clear; where consensus is low, policy needs clarification.
This calibration process should feed directly into policy updates and training refreshes. Without it, decision quality drifts over time as reviewers' individual interpretations diverge.
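As a rough illustration of the analysis step, the sketch below measures consensus on each calibration item as the share of reviewers who chose the most common decision, then lists the items that fall below a cut-off. The function names, the decision labels, and the 0.8 cut-off are assumptions made for this example, not part of any particular tool or policy.

```python
from collections import Counter

def consensus_by_item(decisions):
    """Map each item to (most common decision, share of reviewers who chose it).

    `decisions` maps an item id to the list of decisions made independently
    by each reviewer, e.g. {"item-1": ["remove", "keep", "remove"]}.
    """
    result = {}
    for item_id, votes in decisions.items():
        top_label, top_count = Counter(votes).most_common(1)[0]
        result[item_id] = (top_label, top_count / len(votes))
    return result

def items_needing_clarification(decisions, threshold=0.8):
    """Return ids of items whose consensus share is below `threshold`.

    The 0.8 default is an illustrative assumption; pick a value that
    suits your own policy and reviewer pool.
    """
    consensus = consensus_by_item(decisions)
    return sorted(
        item_id for item_id, (_, share) in consensus.items() if share < threshold
    )

# Hypothetical calibration round with five reviewers and two items.
calibration_round = {
    "item-1": ["remove", "remove", "remove", "remove", "remove"],
    "item-2": ["remove", "keep", "remove", "keep", "escalate"],
}
low_consensus = items_needing_clarification(calibration_round)
```

Items that land in the low-consensus list are the natural candidates for new worked examples in the policy's edge case guidance.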
Reviewer wellbeing is an operational issue
Reviewers handling graphic or disturbing content — violence, abuse, extremism — experience measurable psychological impact. This isn't a soft HR concern; it's an operational issue that directly affects accuracy and retention.
Effective wellbeing programmes include: rotation away from the most difficult content categories, mandatory breaks, access to psychological support, and clear limits on daily exposure duration. Operations that treat wellbeing as optional experience higher churn, higher error rates during burnout periods, and eventually, reputational problems when the working conditions become public.
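Exposure limits are easier to enforce when the queueing system checks them rather than relying on reviewers to self-police. The sketch below is one minimal way to express that rule. The limit values, the field names, and the returned action labels are all hypothetical; real limits should come from your wellbeing programme, not from this example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ExposureLimits:
    # Both values are illustrative assumptions, not recommendations.
    max_daily_minutes: int = 120       # cap on high-severity content per day
    max_continuous_minutes: int = 45   # cap before a mandatory break

def next_action(daily_minutes, continuous_minutes, limits=ExposureLimits()):
    """Decide what a reviewer on a high-severity queue should do next."""
    if daily_minutes >= limits.max_daily_minutes:
        # Daily cap reached: move to a less difficult content category.
        return "rotate_to_low_severity"
    if continuous_minutes >= limits.max_continuous_minutes:
        # Too long without a pause: enforce a break before continuing.
        return "mandatory_break"
    return "continue"
```

Checking the daily cap before the break rule means a reviewer who has hit the daily limit is rotated off the queue rather than sent on a break and then returned to it.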
AI as a triage layer, not a replacement
At scale, AI moderation handles the volume that human teams can't. But as discussed in the hybrid moderation framework, AI struggles with context and novelty. The most effective large-scale operations use AI for volume triage and humans for review of anything flagged, borderline, or novel.
The ratio of AI to human review varies by content type. For nudity and spam, AI can handle more than 90% of items accurately. For hate speech and misinformation, human review remains essential for anything beyond clear-cut cases.
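One way to picture the triage layer is as a routing function: the model's score decides whether an item is actioned automatically or sent to a human, with stricter cut-offs for the categories where context matters most. The sketch below assumes a classifier that returns a violation score between 0 and 1. The category names, the cut-off values, and the `is_novel` flag are illustrative assumptions, not figures from any production system.

```python
# Per-category cut-offs: (remove at or above, allow at or below).
# All values are hypothetical; context-heavy categories get a much
# narrower automatic band so that more items reach human review.
AUTO_CUTOFFS = {
    "nudity": (0.90, 0.10),
    "spam": (0.90, 0.10),
    "hate_speech": (0.99, 0.01),
    "misinformation": (0.99, 0.01),
}

def route(category, violation_score, is_novel=False):
    """Return 'auto_remove', 'auto_allow', or 'human_review' for one item."""
    # Novel content and unknown categories always go to a human.
    if is_novel or category not in AUTO_CUTOFFS:
        return "human_review"
    remove_at, allow_at = AUTO_CUTOFFS[category]
    if violation_score >= remove_at:
        return "auto_remove"
    if violation_score <= allow_at:
        return "auto_allow"
    # Everything in between is borderline and needs human judgment.
    return "human_review"
```

Under these assumed cut-offs, a spam item scored 0.97 is removed automatically, while a hate speech item scored 0.95 still goes to a reviewer.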