# Speech to Text

> Skill `speech-to-text` on Faro. 1 sub-skill.

Transcribe spoken audio to text from any common audio file, with automatic language detection, punctuation, and capitalization. Works with recordings, voice notes, calls, meetings, podcasts, and more. Priced per minute of audio, with no per-file minimums.

**Category:** Audio & Video  
**Tags:** speech-to-text, transcription, transcribe, audio, stt, multilingual  
**Use when:** You have an audio file and need its spoken words as text.  
**Not for:** Generating speech from text, translating, separating speakers, or analyzing non-speech audio.  
**Returns:** information — Returns the transcript plus the audio duration. Billing is 1.5 credits per minute of audio.

## How to run
Skills run through one gateway with your Faro token. Hand it an `intent` in plain language; Faro routes to the right sub-skill, runs it, and bills per call. Raw tools are internal plumbing and are not directly callable.

```
POST https://skill.askfaro.com/skills/speech-to-text/run
Authorization: Bearer faro_<your_key>
Content-Type: application/json

{"intent":{"prompt":"Transcribe this voice note for me."}}
```

Or from the CLI:

```bash
pip install askfaro-cli && askfaro auth login
askfaro run speech-to-text "Transcribe this voice note for me."
```
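The same HTTP call can be sketched in Python using only the standard library. This is a minimal illustration, not an official client: the endpoint, headers, and JSON body are copied from the request above, while the helper names `build_run_request` and `run_skill` are hypothetical. The response is returned as raw text because its schema is not described here, and how the audio file itself is attached is left to the run reference.

```python
import json
import urllib.request

# Endpoint taken from the HTTP example in this document.
RUN_URL = "https://skill.askfaro.com/skills/speech-to-text/run"


def build_run_request(token: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) the POST request for the speech-to-text skill.

    `token` is your Faro key, e.g. "faro_<your_key>".
    """
    body = json.dumps({"intent": {"prompt": prompt}}).encode("utf-8")
    return urllib.request.Request(
        RUN_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


def run_skill(token: str, prompt: str) -> str:
    """Send the request and return the raw response body as text.

    The response schema is not documented here, so nothing is parsed.
    """
    request = build_run_request(token, prompt)
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

Calling `run_skill(token, "Transcribe this voice note for me.")` with a valid key would send the request; keeping request construction separate from sending makes it easy to inspect the request before it goes out.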

Full run reference: https://askfaro.com/llms/run.md — Agent recipe: https://askfaro.com/llms/skill.md

## Example requests

- Transcribe this voice note for me.

## Sub-skills

### Transcribe audio

Transcribes the speech in an audio file to text, with multilingual support.

**Cost:** 1.5 credits per minute of audio (listed as ~2 credits / minute, up to 45). A few-minute clip is the usual case.

**Use when:** You have a recording, voice note, call, meeting, or any audio and want the words as text.

**Details:** https://askfaro.com/llms/skills/speech-to-text/transcribe.md
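To get a feel for the pricing, the stated rate of 1.5 credits per minute can be turned into a rough estimator. This sketch assumes billing scales linearly with duration, with no rounding; the exact rounding behavior and the meaning of the "up to 45" cap are not specified here, so treat the result as an approximation. The function name `estimate_credits` is hypothetical.

```python
CREDITS_PER_MINUTE = 1.5  # rate stated in this document


def estimate_credits(duration_seconds: float) -> float:
    """Estimate credits for a clip, assuming linear proration (an assumption)."""
    if duration_seconds < 0:
        raise ValueError("duration_seconds must be non-negative")
    return CREDITS_PER_MINUTE * duration_seconds / 60.0
```

Under that assumption, a four-minute voice note (`estimate_credits(240)`) comes to about 6 credits.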

---
On the web: https://askfaro.com/search/speech-to-text