Audio Intelligence

Your agent can't listen to a recording; this skill returns a structured transcript that shows who said what.

Audio & Video

Transcribes audio and video recordings into clean text, with optional speaker labeling (who said what) and chapter summaries for long recordings. Supports automatic language detection.

Use when

You have a recording and need its words as text, with speaker turns and optional chapter summaries.

Not for

Live or streaming captioning, translation, naming the actual speakers, or detecting sentiment, entities, or sensitive content.

What you can do

Each is a sub-skill of Audio Intelligence; the router picks the right one for your request.

Transcribe a recording

Turns a recording into a structured transcript, labeling who said what and optionally chaptering it.

~207 credits / hour of audio (up to 5000)
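As a rough worked example of the rate above, the sketch below estimates the credits for a recording of a given length. It rests on two assumptions that this page does not confirm: that billing scales linearly with duration, and that "up to 5000" is a ceiling on the credits charged for one call. `estimate_credits` is a hypothetical helper, not part of Faro.

```python
CREDITS_PER_HOUR = 207       # approximate rate quoted above
ASSUMED_MAX_CREDITS = 5000   # assumption: "up to 5000" read as a per-call ceiling


def estimate_credits(duration_seconds: float) -> float:
    """Estimate credits for one recording, assuming linear pro-rata billing."""
    if duration_seconds < 0:
        raise ValueError("duration_seconds must be non-negative")
    hours = duration_seconds / 3600
    return min(CREDITS_PER_HOUR * hours, ASSUMED_MAX_CREDITS)


# A 90-minute recording is 1.5 hours of audio.
print(estimate_credits(90 * 60))  # 310.5
```

Treat the result as an estimate only, since the quoted rate is itself approximate.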

What you get back


Returns the full transcript text, the detected language, the recording duration in seconds, speaker-labeled turns when speaker labeling is on, and chapters with summaries when chaptering is on. A long recording may take a few moments to process, and the run resumes automatically. The result is not a downloadable file.
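To make the returned fields concrete, here is a minimal sketch of how a caller might render speaker turns once the result has been parsed into a dictionary. The key names (`text`, `language`, `duration_seconds`, `turns`, `speaker`, `chapters`) are assumptions chosen for illustration; this page does not specify the actual response schema, and the sample values are made up.

```python
from typing import Any


def render_transcript(result: dict[str, Any]) -> str:
    """Render speaker turns as 'label: text' lines, or fall back to the full text."""
    turns = result.get("turns") or []
    if not turns:
        # Speaker labeling was off (or no turns were returned): use the plain text.
        return result.get("text", "")
    return "\n".join(f"{turn['speaker']}: {turn['text']}" for turn in turns)


# Made-up sample in the assumed shape, for illustration only.
sample = {
    "text": "Hi, thanks for joining. Happy to be here.",
    "language": "en",
    "duration_seconds": 6.2,
    "turns": [
        {"speaker": "A", "text": "Hi, thanks for joining."},
        {"speaker": "B", "text": "Happy to be here."},
    ],
    "chapters": [],  # filled with summaries only when chaptering is on
}

print(render_transcript(sample))
```

The speaker labels here are generic ("A", "B") because the skill separates speaker turns but does not name the actual speakers.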

Run it

Skills run through one gateway with your Faro token. Hand it an intent in plain language; Faro routes to the right sub-skill, runs it, and bills per call.

curl -X POST "https://skill.askfaro.com/skills/audio-intelligence/run" \
  -H "Authorization: Bearer $FARO_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"intent":{"prompt":"Transcribe this audio file"}}'
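The same call can be sketched in Python with only the standard library. The URL, headers, and body mirror the curl command above; the helper names `build_run_request` and `run_skill` are hypothetical, and treating the response as JSON is an assumption. The curl example does not show how the recording itself is supplied, so this sketch does not either.

```python
import json
import os
import urllib.request

RUN_URL = "https://skill.askfaro.com/skills/audio-intelligence/run"


def build_run_request(prompt: str, token: str) -> urllib.request.Request:
    """Build the POST request from the curl example without sending it."""
    body = json.dumps({"intent": {"prompt": prompt}}).encode("utf-8")
    return urllib.request.Request(
        RUN_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def run_skill(prompt: str, token: str, timeout: float = 60.0) -> dict:
    """Send the request and decode the reply (assumes the reply is JSON)."""
    request = build_run_request(prompt, token)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Build (but do not send) the request, reading the token from the environment.
req = build_run_request(
    "Transcribe this audio file",
    os.environ.get("FARO_TOKEN", "<your-token>"),
)
print(req.get_method(), req.full_url)
```

Calling `run_skill("Transcribe this audio file", token)` would send the request; it is kept separate from `build_run_request` so the payload can be inspected before anything is billed.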

Example requests

Transcribe this audio file
