Convert the spoken words in an audio file to text. Accepts MP3, M4A, WAV, FLAC, OGG, and other common formats. Language is detected automatically, so there is no need to specify it. Returns the full transcript with punctuation and capitalization, along with the audio duration.
Use it when you have a recording, voice note, call, meeting, or other audio and want the words as text.
Not suited for text-to-speech, translation, or speaker diarization.
~2 credits / minute (up to 45)
1.5 credits per minute of audio. A few-minute clip is the usual case.
Estimated; the actual charge depends on your input and is shown in the response.
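The rate is linear, so you can roughly estimate a run's cost before submitting it. The shell sketch below assumes plain per-minute billing at the documented 1.5 credits per minute. estimate_credits is a hypothetical helper, not part of the service, and the charge reported in the response remains authoritative (partial minutes, for example, may be billed differently).

```shell
#!/usr/bin/env bash
# Hypothetical helper: rough pre-run estimate at the documented rate of
# 1.5 credits per minute of audio. Assumes linear per-minute billing; the
# actual charge is the one reported in the response.
estimate_credits() {
  local minutes="$1"
  # awk handles fractional minutes (e.g. 2.5) that bash arithmetic cannot.
  awk -v m="$minutes" 'BEGIN { printf "%.2f\n", m * 1.5 }'
}

estimate_credits 4   # prints 6.00
```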
Set these inside the intent when you run it.
file: The audio file to transcribe, given as a URL.
The full text transcript of the audio, with punctuation and capitalization, plus the audio duration.
Run this sub-skill directly by pinning it with operation and passing its inputs in the intent. If you omit operation, the Speech to Text skill routes the request based on your intent instead.
curl -X POST "https://skill.askfaro.com/skills/speech-to-text/run" \
-H "Authorization: Bearer $FARO_TOKEN" \
-H "Content-Type: application/json" \
-d '{"intent":{"operation":"transcribe","file":"https://example.com/meeting-recording.mp3"}}'
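If you transcribe many files, the same request can be wrapped in a reusable function. This is a minimal sketch, not an official client: build_transcribe_payload and transcribe_url are hypothetical names, FARO_TOKEN is assumed to be exported, and the audio URL is assumed to be a well-formed URL (which never contains raw double quotes or backslashes), so it is inserted into the JSON body without further escaping. The response is printed as-is because its field names are not documented here.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: build the JSON body for the pinned "transcribe"
# operation. Assumes the URL needs no JSON escaping (no " or \ characters).
build_transcribe_payload() {
  local url="$1"
  printf '{"intent":{"operation":"transcribe","file":"%s"}}' "$url"
}

# Hypothetical helper: POST the request and print the raw JSON response.
# --fail makes curl exit non-zero on HTTP errors; -sS hides the progress
# meter but still reports errors.
transcribe_url() {
  local url="$1"
  curl -sS --fail -X POST "https://skill.askfaro.com/skills/speech-to-text/run" \
    -H "Authorization: Bearer ${FARO_TOKEN:?FARO_TOKEN is not set}" \
    -H "Content-Type: application/json" \
    -d "$(build_transcribe_payload "$url")"
}
```

Calling transcribe_url "https://example.com/meeting-recording.mp3" sends the same request as the curl command above.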