Content Moderation

Active

An agent has no reliable built-in way to judge whether content is unsafe. This tool scores text and images by category before you act on them.

1 tool

Classify text or images for unsafe content (sexual, hateful, harassing, self-harm, violent, and more) and get per-category scores plus an overall flag. A safety check an agent can run before showing, storing, or acting on user-generated content.

Images moderation safety nsfw content classification

Tools (1)

Classify text and/or images for unsafe content; returns per-category flags and scores. Pass input as a string for text, or as an array of {type:text,...} / {type:image_url,...} objects to moderate images.

0.1 credits per moderation call ($0.0001)

Example prompts

  • Is this comment safe to post?
  • Check whether this image contains explicit content
  • Moderate this user review for hate speech
  • Score this message across harm categories

Parameters

input · required

Text to classify (a string), or an array of typed parts for multimodal input, e.g. [{"type":"text","text":"..."},{"type":"image_url","image_url":{"url":"https://..."}}].

model · string · optional · default: "omni-moderation-latest"

Moderation model. omni-moderation-latest is multimodal (text + images); text-moderation-latest is text-only.
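Since the text-only model cannot handle image parts, it can help to check the two parameters against each other before making a call. The following is a minimal sketch, assuming the parameters are assembled as a Python dict; the helper names has_image and build_params are hypothetical and not part of the tool.

```python
DEFAULT_MODEL = "omni-moderation-latest"
TEXT_ONLY_MODELS = {"text-moderation-latest"}


def has_image(input_value) -> bool:
    """Return True if the input is the array form and contains an image part."""
    if isinstance(input_value, str):
        return False
    return any(part.get("type") == "image_url" for part in input_value)


def build_params(input_value, model: str = DEFAULT_MODEL) -> dict:
    """Assemble the tool parameters, rejecting images sent to a text-only model."""
    if model in TEXT_ONLY_MODELS and has_image(input_value):
        raise ValueError(f"{model} is text-only; use {DEFAULT_MODEL} for images")
    return {"input": input_value, "model": model}
```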

API Usage

curl -X POST "https://skill.askfaro.com/skills/content-moderation/run" \
  -H "Authorization: Bearer faro_<your_key>" \
  -H "Content-Type: application/json" \
  -d '{
  "intent": {
    "prompt": "Is this comment safe to post?"
  }
}'
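The same request can be made from Python. This is a minimal sketch using only the standard library: the URL, headers, and body mirror the curl call above, while the function names build_request and run are hypothetical, and decoding the reply as JSON is an assumption about the response format.

```python
import json
import urllib.request

RUN_URL = "https://skill.askfaro.com/skills/content-moderation/run"


def build_request(prompt: str, api_key: str) -> urllib.request.Request:
    """Build the same POST request as the curl example, without sending it."""
    body = json.dumps({"intent": {"prompt": prompt}}).encode("utf-8")
    return urllib.request.Request(
        RUN_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


def run(prompt: str, api_key: str, timeout: float = 30.0) -> dict:
    """Send the request and decode the reply (assumed to be JSON)."""
    request = build_request(prompt, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling run("Is this comment safe to post?", "faro_<your_key>") would send the request; build_request alone is useful for inspecting what will be sent.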

CLI Usage

askfaro describe content-moderation/moderate

Install with pip install askfaro-cli, then run askfaro auth login.

README

Content Moderation

Multimodal content classification via OpenAI's moderation model. Pass text, an image, or both, and get back per-category flags and confidence scores.

How it works

Call moderate with input. The simplest form is a plain string:

{ "input": "is this message ok to post?" }

To moderate an image (or mix text and images), pass input as the array form OpenAI expects:

{ "input": [
  { "type": "text", "text": "caption to check" },
  { "type": "image_url", "image_url": { "url": "https://.../photo.jpg" } }
] }
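Both input forms can be produced by one small helper. The sketch below returns a plain string when only text is given and the typed-parts array otherwise; the function name make_input and its arguments are hypothetical.

```python
def make_input(text=None, image_urls=()):
    """Return the plain-string form for text only, else the typed-parts array."""
    urls = list(image_urls)
    if not text and not urls:
        raise ValueError("provide text, at least one image URL, or both")
    if text and not urls:
        return text  # simplest form: a plain string
    parts = []
    if text:
        parts.append({"type": "text", "text": text})
    for url in urls:
        parts.append({"type": "image_url", "image_url": {"url": url}})
    return parts
```

For example, make_input("caption to check", ["https://example.com/photo.jpg"]) yields a two-element array with one text part and one image_url part (the example.com URL is a placeholder).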

Output

The response mirrors OpenAI's moderation result: a results array in which each entry has flagged (boolean), categories (per-category booleans), and category_scores (a confidence score from 0 to 1 per category).
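A response of this shape can be reduced to the categories that matter. This is a sketch that relies only on the three fields named above; the function name summarize is hypothetical, and any sample values you feed it are your own, not real tool output.

```python
def summarize(response: dict, top_n: int = 3) -> list:
    """For each result, list the flagged categories and the highest scores."""
    summaries = []
    for result in response.get("results", []):
        scores = result.get("category_scores", {})
        # Sort by score, highest first, and keep the top_n entries.
        top = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_n]
        flagged_names = sorted(
            name for name, hit in result.get("categories", {}).items() if hit
        )
        summaries.append(
            {
                "flagged": bool(result.get("flagged", False)),
                "flagged_categories": flagged_names,
                "top_scores": top,
            }
        )
    return summaries
```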

Policy

This tool reports a verdict. It does not block anything on its own: the calling agent decides what to do with a flag. Use it as a guardrail before showing, storing, or acting on user-generated content.
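Because the decision stays with the caller, an agent needs its own rule for turning a verdict into an action. The sketch below maps one result to "allow", "review", or "block"; the function name decide and both thresholds are hypothetical and should be tuned to your own tolerance for false positives.

```python
def decide(result: dict, block_at: float = 0.9, review_at: float = 0.5) -> str:
    """Map one moderation result to an action; thresholds are illustrative."""
    scores = result.get("category_scores", {})
    worst = max(scores.values(), default=0.0)
    if worst >= block_at:
        return "block"
    # Treat the overall flag, or a moderately high score, as needing a human look.
    if result.get("flagged") or worst >= review_at:
        return "review"
    return "allow"
```

Keeping the thresholds as arguments lets different surfaces (for example, public posts versus private drafts) apply different strictness to the same scores.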

Pricing

Flat 0.1 credit per call (the upstream is free; this covers overhead).
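For budgeting, the flat rate makes cost a simple multiplication. This sketch uses the 0.1 credit and $0.0001 per-call figures from the listing; the function name estimate_cost is hypothetical, and Decimal is used to avoid floating-point rounding in the totals.

```python
from decimal import Decimal

CREDITS_PER_CALL = Decimal("0.1")
USD_PER_CALL = Decimal("0.0001")  # dollar figure as shown on the listing


def estimate_cost(calls: int) -> tuple:
    """Return (credits, usd) for a given number of moderation calls."""
    if calls < 0:
        raise ValueError("calls must be non-negative")
    return CREDITS_PER_CALL * calls, USD_PER_CALL * calls
```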