Moderation Policies
Configure what content is acceptable in your application.
What is a Policy?
A moderation policy defines the rules for what content is allowed in your app. Each policy specifies:
- Categories to check (violence, sexual content, hate speech, etc.)
- Thresholds for each category (how sensitive)
- Actions to take when content is flagged
Think of policies as "strictness levels" for your content moderation.
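As a rough illustration (field names mirror the createPolicy example later on this page; treat this as a sketch, not a schema reference), a policy boils down to a small configuration object:

```ts
// Illustrative sketch of a policy: categories, thresholds, and actions.
const examplePolicy = {
  name: 'Community Comments',
  categories: {
    violence: { enabled: true, threshold: 0.7, action: 'block' },  // stricter response
    profanity: { enabled: true, threshold: 0.6, action: 'warn' },  // softer response
  },
}
```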
Quick Start
Using a pre-built policy is simple:
```tsx
<ModeratedTextarea
  apiKey="vettly_xxxxx"
  policyId="moderate" // ← Choose a pre-built policy
/>
```

Pre-built Policies
Vettly provides ready-to-use policies for common use cases:
Lenient
Use for: Open forums, discussion boards, adult platforms
Sensitivity: Low - Only blocks extreme content
Category Thresholds:
| Category | Threshold | Notes |
|---|---|---|
| Violence/Gore | 0.9 | Extreme violence only |
| Sexual (Explicit) | 0.95 | Hardcore content only |
| Hate Speech | 0.8 | Clear hate only |
| Harassment | 0.9 | Severe harassment |
| Self-harm | 0.7 | Direct encouragement |
Example flagged content:
- Graphic violence depictions
- Hardcore explicit imagery
- Clear calls for violence
Example allowed content:
- Mild profanity
- Political debates
- Artistic nudity
- Gaming violence discussions
<ModeratedTextarea policyId="lenient" />Moderate (Default)
Use for: Social media, community platforms, SaaS apps
Sensitivity: Medium - Balanced approach
Category Thresholds:
| Category | Threshold | Notes |
|---|---|---|
| Violence/Gore | 0.7 | Moderate violence blocked |
| Sexual (Explicit) | 0.75 | Sexual content blocked |
| Sexual (Suggestive) | 0.85 | Suggestive content warned |
| Hate Speech | 0.5 | Sensitive to hate |
| Harassment | 0.6 | Blocks harassment |
| Self-harm | 0.5 | Blocks self-harm content |
| Illegal | 0.6 | Blocks illegal activities |
Example flagged content:
- Violent imagery or descriptions
- Sexual content
- Hate speech or slurs
- Harassment or bullying
- Drug promotion
Example allowed content:
- Mild profanity (context-dependent)
- News articles about violence
- Health/education discussions
- Respectful debates
<ModeratedTextarea policyId="moderate" />Strict
Use for: Kids apps, education platforms, family content
Sensitivity: High - Very strict filtering
Category Thresholds:
| Category | Threshold | Notes |
|---|---|---|
| Violence/Gore | 0.4 | Even mild violence blocked |
| Sexual (All) | 0.5 | All sexual content blocked |
| Hate Speech | 0.3 | Very sensitive |
| Harassment | 0.4 | Zero tolerance |
| Self-harm | 0.3 | Maximum protection |
| Profanity | 0.5 | Blocks swearing |
| Illegal | 0.4 | Strict on illegal content |
Example flagged content:
- Any violence references
- All sexual content
- Profanity
- Cyberbullying
- Dating/romance content
- Controversial topics
Example allowed content:
- Educational content
- Positive messages
- Safe gaming discussions
- Age-appropriate topics
<ModeratedTextarea policyId="strict" />Marketplace
Use for: E-commerce, classifieds, product listings
Sensitivity: Medium-High - Focused on commerce safety
Category Thresholds:
| Category | Threshold | Notes |
|---|---|---|
| Violence | 0.6 | Weapons, dangerous items |
| Sexual | 0.7 | Adult products |
| Illegal | 0.4 | Stolen goods, drugs |
| Scams/Spam | 0.5 | Fraud detection |
| PII Leakage | 0.4 | Personal info protection |
Example flagged content:
- Weapons or explosives
- Counterfeit goods
- Stolen items
- Pyramid schemes
- Contact info in listings
Example allowed content:
- Product descriptions
- Honest reviews
- Pricing information
- Shipping details
<ModeratedImageUpload policyId="marketplace" />Social Media
Use for: Twitter-like platforms, comment systems
Sensitivity: Medium - Optimized for social content
Category Thresholds:
| Category | Threshold | Notes |
|---|---|---|
| Violence | 0.65 | Context-aware |
| Sexual | 0.8 | Allows discussion |
| Hate Speech | 0.45 | Strict on hate |
| Harassment | 0.55 | Protects users |
| Spam | 0.6 | Blocks spam |
| Misinformation | 0.7 | Flags misinfo |
Example flagged content:
- Targeted harassment
- Hate speech
- Spam/scams
- Explicit content
- Coordinated attacks
Example allowed content:
- Political opinions
- News sharing
- Debates (respectful)
- Personal stories
<ModeratedTextarea policyId="social_media" />Category Reference
Violence & Gore
Detects violent content including:
- Physical violence descriptions
- Gore, blood, injuries
- Weapons, explosives
- Animal cruelty
- War/combat content
Threshold guide:
- `0.3-0.4` = Blocks even cartoon violence
- `0.5-0.7` = Blocks realistic violence
- `0.8+` = Only extreme violence
Sexual Content
Detects sexual content including:
- Explicit: Nudity, sexual acts
- Suggestive: Provocative, flirting
- Adult services, prostitution
- Non-consensual content
Threshold guide:
- `0.3-0.5` = Blocks all sexual references
- `0.6-0.7` = Blocks explicit content
- `0.8+` = Only hardcore content
Hate Speech
Detects hateful content including:
- Slurs and derogatory terms
- Discrimination based on:
- Race, ethnicity
- Religion
- Gender, sexuality
- Disability
- Supremacist ideology
Threshold guide:
- `0.3-0.4` = Very sensitive (may over-flag)
- `0.5-0.6` = Balanced (recommended)
- `0.7+` = Only clear hate speech
Harassment & Bullying
Detects harassment including:
- Personal attacks
- Cyberbullying
- Doxxing, stalking
- Threats, intimidation
- Coordinated harassment
Threshold guide:
- `0.4-0.5` = Protective
- `0.6-0.7` = Standard
- `0.8+` = Severe harassment only
Self-Harm
Detects self-harm content including:
- Suicide discussion/encouragement
- Self-injury
- Eating disorders (promotion)
- Crisis content
Threshold guide:
- `0.3-0.5` = Maximum protection
- `0.6-0.7` = Allows support discussions
- `0.8+` = Direct encouragement only
Illegal Activities
Detects illegal content including:
- Drug sales/promotion
- Weapons trafficking
- Stolen goods
- Hacking/fraud
- Child safety violations
Threshold guide:
- `0.4-0.5` = Strict enforcement
- `0.6-0.7` = Clear violations
- `0.8+` = Obvious crimes only
Spam & Scams
Detects spam including:
- Repetitive content
- Phishing attempts
- Pyramid schemes
- Fake giveaways
- Link farms
Threshold guide:
- `0.4-0.6` = Aggressive filtering
- `0.7-0.8` = Standard protection
- `0.9+` = Obvious spam only
Personal Information (PII)
Detects PII including:
- Email addresses
- Phone numbers
- Social security numbers
- Credit card numbers
- Home addresses
Threshold guide:
- `0.3-0.5` = Maximum privacy
- `0.6-0.7` = Standard protection
- `0.8+` = Obvious leaks only
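To make the threshold guides above concrete, here is a minimal sketch of how a category score might map to an action. The helper and its exact semantics are illustrative assumptions, not part of the SDK:

```ts
// Illustrative only: lower thresholds make a category stricter.
type Action = 'allow' | 'warn' | 'flag' | 'block'

function evaluateCategory(score: number, threshold: number, action: Action): Action {
  // The category's configured action triggers once the model's score
  // meets or exceeds the policy threshold; otherwise the content passes.
  return score >= threshold ? action : 'allow'
}

evaluateCategory(0.72, 0.7, 'block') // 'block' (e.g. moderate policy, violence at 0.7)
evaluateCategory(0.72, 0.9, 'block') // 'allow' (e.g. lenient policy, violence at 0.9)
```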
Custom Prompt Rules
Go beyond category-based moderation with custom prompt rules. Write natural-language prompts that the AI evaluates semantically.
When to Use Custom Prompts
Standard categories work for most cases, but custom prompts shine when you need:
- Context-aware decisions: "Is this a weapon? Ignore kitchen knives and toys."
- Domain-specific rules: "Does this food photo show raw or undercooked meat?"
- Nuanced judgments: "Does this product listing show counterfeit goods?"
Example: Counterfeit Detection
```yaml
name: counterfeit-policy
version: "1.0"
rules:
  - category: illegal # Required (any valid category)
    threshold: 0.7
    action: block
    customPrompt: "Does this show counterfeit or fake luxury goods? Look for misspelled brand names, poor quality logos, or suspicious pricing."
    customCategory: counterfeit_detection
```

Note: `category` is required for schema compatibility but ignored for custom rules; the AI evaluates your `customPrompt` directly using Gemini Vision.
How It Works
- Standard rules use AI category detection (violence, sexual, etc.)
- Custom prompt rules send your prompt + image to Gemini Vision
- AI returns a yes/no answer with confidence score
- If the confidence exceeds the threshold, the rule's action is triggered (see the sketch below)
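As a rough sketch of the last three steps, the shapes below are illustrative assumptions rather than the documented API response:

```ts
// Hypothetical result shape for a custom prompt rule (illustrative only).
interface CustomPromptResult {
  answer: boolean    // the model's yes/no judgment for your prompt
  confidence: number // 0-1 confidence in that judgment
}

type Action = 'allow' | 'warn' | 'flag' | 'block'

function applyCustomRule(result: CustomPromptResult, threshold: number, action: Action): Action {
  // The rule only triggers when the model answered "yes" AND its confidence
  // meets or exceeds the rule's threshold; otherwise the content passes.
  return result.answer && result.confidence >= threshold ? action : 'allow'
}
```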
Writing Effective Prompts
Be specific about what to detect:
```yaml
# Good
customPrompt: "Does this show counterfeit luxury goods or fake brand logos?"

# Too vague
customPrompt: "Is this fake?"
```

Specify what to ignore:

```yaml
# Good - clear exceptions
customPrompt: "Does this show blood? Ignore: ketchup, red paint, Halloween costumes."

# Bad - no context
customPrompt: "Does this show blood?"
```

Keep it under 500 characters:

```yaml
# Good - concise
customPrompt: "Is this a weapon? Include: guns, knives, explosives. Exclude: kitchen knives, toys, video game screenshots."
```

Tier Limits
| Tier | Custom Prompts per Policy |
|---|---|
| Developer | 1 |
| Growth | 2 |
| Pro | 5 |
| Enterprise | 20 |
Cost
Custom prompt rules use Gemini Vision at roughly $0.00012 per image, which is actually cheaper than standard image moderation (about $0.0003 per image with Hive).
Custom Policies
Create policies tailored to your needs using the Vettly dashboard or API.
Creating a Policy (Dashboard)
- Go to Dashboard → Policies
- Click Create Policy
- Configure categories and thresholds
- Name your policy
- Copy the policy ID
Creating a Policy (API)
```ts
import { ModerationClient } from '@vettly/sdk'

const client = new ModerationClient({ apiKey: 'vettly_xxxxx' })

const policy = await client.createPolicy({
  name: 'My Custom Policy',
  description: 'For my app',
  categories: {
    violence: {
      enabled: true,
      threshold: 0.6,
      action: 'warn'
    },
    sexual: {
      enabled: true,
      threshold: 0.8,
      action: 'block'
    },
    hate: {
      enabled: true,
      threshold: 0.4,
      action: 'block'
    }
  }
})

console.log('Policy ID:', policy.id)
```

Using Custom Policies
```tsx
<ModeratedTextarea
  apiKey="vettly_xxxxx"
  policyId="pol_custom_abc123" // Your custom policy ID
/>
```

Actions
Each category can have a different action:
| Action | Meaning | UI Behavior | Use Case |
|---|---|---|---|
| `allow` | Content is fine | Green, no warning | Safe content |
| `warn` | Minor concerns | Yellow, show warning | Borderline content |
| `flag` | Needs review | Orange, queue for review | Uncertain content |
| `block` | Violates policy | Red, prevent submission | Clear violations |
Action Examples
```ts
const policy = {
  categories: {
    violence: {
      threshold: 0.7,
      action: 'block' // Hard block
    },
    profanity: {
      threshold: 0.6,
      action: 'warn' // Allow with warning
    },
    spam: {
      threshold: 0.5,
      action: 'flag' // Queue for review
    }
  }
}
```

Threshold Tuning
Finding the Right Balance
Too strict (0.3-0.4):
- ❌ Many false positives
- ❌ Users frustrated
- ✅ Maximum safety
Balanced (0.5-0.7):
- ✅ Good accuracy
- ✅ Reasonable UX
- ✅ Recommended
Too lenient (0.8-0.9):
- ✅ Few false positives
- ❌ Misses violations
- ❌ Safety concerns
A/B Testing Policies
```tsx
const policyId = Math.random() > 0.5 ? 'strict' : 'moderate'

<ModeratedTextarea
  policyId={policyId}
  onModerationResult={(result) => {
    analytics.track('moderation_result', {
      policy: policyId,
      safe: result.safe,
      action: result.action
    })
  }}
/>
```

Monitoring Policy Performance
Track these metrics (see the sketch after this list):
- Precision: True positives / (True positives + False positives)
- Recall: True positives / (True positives + False negatives)
- User appeals: How often users report false positives
- Missed content: How often violations slip through
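For example, precision and recall can be computed from manually reviewed outcomes. The shape below is an illustrative sketch, not an SDK type:

```ts
// Illustrative only: compute precision/recall from human-labeled moderation outcomes.
interface Outcome {
  flagged: boolean   // did the policy flag the content?
  violation: boolean // did a human reviewer confirm a real violation?
}

function policyMetrics(outcomes: Outcome[]) {
  const tp = outcomes.filter(o => o.flagged && o.violation).length   // true positives
  const fp = outcomes.filter(o => o.flagged && !o.violation).length  // false positives
  const fn = outcomes.filter(o => !o.flagged && o.violation).length  // false negatives
  return {
    precision: tp / (tp + fp), // of everything flagged, how much was truly a violation
    recall: tp / (tp + fn),    // of all real violations, how many were caught
  }
}
```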
Use Case Recommendations
Kids App (Age 6-12)
```ts
{
  policyId: 'strict',
  customizations: {
    violence: 0.3,
    sexual: 0.3,
    profanity: 0.4
  }
}
```

Teen Social Network (Age 13-17)
```ts
{
  policyId: 'moderate',
  customizations: {
    hate: 0.5,
    harassment: 0.5,
    sexual: 0.7
  }
}
```

Adult Forum (18+)
```ts
{
  policyId: 'lenient',
  customizations: {
    hate: 0.6,
    harassment: 0.7,
    illegal: 0.5
  }
}
```

Product Reviews
```ts
{
  policyId: 'moderate',
  customizations: {
    spam: 0.5,
    pii: 0.4,
    profanity: 0.7
  }
}
```

News Comments
```ts
{
  policyId: 'moderate',
  customizations: {
    hate: 0.45,
    harassment: 0.55,
    violence: 0.8 // Allow news discussion
  }
}
```

Best Practices
✅ Do
- Start with pre-built policies
- Monitor false positives/negatives
- Adjust thresholds based on data
- Use different policies for different contexts
- Document your policy choices
- Provide user appeals process
❌ Don't
- Set all thresholds to 0.9 (defeats purpose)
- Set all thresholds to 0.3 (too many false positives)
- Use same policy for kids and adults
- Ignore user feedback
- Change policies without testing
- Skip manual review for flagged content
Multi-Policy Strategy
Use different policies in different contexts:
```tsx
function CommentSection({ userAge }) {
  const policyId = userAge < 13 ? 'strict' : 'moderate'
  return (
    <ModeratedTextarea
      policyId={policyId}
      apiKey="vettly_xxxxx"
    />
  )
}

function ProductListing() {
  return (
    <ModeratedImageUpload
      policyId="marketplace"
      apiKey="vettly_xxxxx"
    />
  )
}

function ProfileBio({ isPublic }) {
  const policyId = isPublic ? 'strict' : 'moderate'
  return (
    <ModeratedTextarea
      policyId={policyId}
      apiKey="vettly_xxxxx"
    />
  )
}
```

Testing Policies
Use test content to validate your policy:
```ts
const testCases = [
  { content: 'Hello, world!', expect: 'allow' },
  { content: 'You are stupid', expect: 'warn' },
  { content: 'Explicit violence...', expect: 'block' }
]

for (const test of testCases) {
  const result = await client.check({
    content: test.content,
    policyId: 'moderate',
    contentType: 'text'
  })
  console.log(`Expected ${test.expect}, got ${result.action}`)
}
```

See Also
- How It Works - Understand the moderation flow
- API Reference - Create policies programmatically
- Examples - See policies in action
- Dashboard - Manage policies visually
