Speech to Text

Transcribe speech from any common audio file, in multiple languages, running on Faro's own infrastructure and priced below the major transcription APIs.

Audio & Video

Transcribe spoken audio to text from any common audio file, with automatic language detection, punctuation, and capitalization. Works with recordings, voice notes, calls, meetings, podcasts, and more. Priced per minute of audio, with no per-file minimums.

Use when

You have an audio file and need its spoken words as text.

Not for

Generating speech from text, translating, separating speakers, or analyzing non-speech audio.

What you can do

Each is a sub-skill of Speech to Text; the router picks the right one for your request.

What you get back

Returns the transcript plus the audio duration. Billing is 1.5 credits per minute of audio.
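As a rough illustration of the rate, the sketch below estimates the credit cost of a clip from its length in seconds. The helper name `estimate_credits` is hypothetical, and the calculation assumes billing is strictly proportional to duration; this page does not say whether partial minutes are rounded, so treat the result as an estimate.

```shell
# Estimate credits for a clip at 1.5 credits per minute of audio.
# Assumption (not confirmed by this page): cost scales linearly with
# duration, with no rounding up to whole minutes.
estimate_credits() {
  awk -v s="$1" 'BEGIN { printf "%.2f\n", (s / 60) * 1.5 }'
}

estimate_credits 120   # 2-minute clip, prints 3.00
estimate_credits 90    # 90-second voice note, prints 2.25
```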

Run it

Skills run through one gateway, authenticated with your Faro token. Hand it an intent in plain language; Faro routes it to the right sub-skill, runs it, and bills the call, here at the per-minute rate above.

curl -X POST "https://skill.askfaro.com/skills/speech-to-text/run" \
  -H "Authorization: Bearer $FARO_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"intent":{"prompt":"Transcribe this voice note for me."}}'
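If the prompt comes from a variable rather than a fixed string, the request body has to be escaped before it is passed to `-d`. The following is a minimal sketch of one way to build the same `{"intent":{"prompt":...}}` body shown above; `build_payload` is a hypothetical helper, not part of Faro, and it escapes only backslashes and double quotes.

```shell
# Build the JSON request body for a given prompt.
# Handles backslashes and double quotes only; prompts that contain newlines
# or other control characters would need a full JSON encoder such as jq.
build_payload() {
  escaped=$(printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g')
  printf '{"intent":{"prompt":"%s"}}\n' "$escaped"
}

build_payload 'Transcribe this voice note for me.'
# prints {"intent":{"prompt":"Transcribe this voice note for me."}}
```

The output can then replace the literal body in the request, for example `-d "$(build_payload "$PROMPT")"`, where `PROMPT` is whatever variable holds your request text.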

Example requests

  • Transcribe this voice note for me.