Create speech

POST https://api.fastapi.ai/v1/audio/speech

Generates audio from the input text.

Request body


model string Required
One of the available TTS models: gpt-4o-mini-tts, gpt-4o-mini-tts-2025-12-15, tts-1, or tts-1-hd.


input string Required
The text to generate audio for. The maximum length is 4096 characters.
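
As a minimal sketch, the length limit above could be checked on the client before a request is sent. The helper name `check_input_length` is hypothetical and not part of the API, and the count reflects characters only when the shell runs in a UTF-8 locale.

```bash
# Hypothetical helper: fail early if the text exceeds the documented
# 4096-character limit. ${#text} counts characters in a UTF-8 locale,
# but some shells or locales (for example LC_ALL=C) count bytes instead.
check_input_length() {
  text=$1
  if [ "${#text}" -gt 4096 ]; then
    echo "input is ${#text} characters; the limit is 4096" >&2
    return 1
  fi
  return 0
}
```

Calling `check_input_length "$text" || exit 1` before the curl command avoids sending a request that is expected to be rejected.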


voice string or object Required
The voice to use when generating the audio.

Supported built-in voices are alloy, ash, ballad, coral, echo, fable, nova, onyx, sage, shimmer, verse, marin, and cedar. You can also pass a custom voice object created via /v1/audio/voices.

custom voice object

id string Required
The ID of a custom voice created via /v1/audio/voices.
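
The following sketch shows how a request body with a custom voice object might be assembled. `YOUR_VOICE_ID` is a placeholder, not a real ID, and the choice of model here is an assumption; the values are inserted without JSON escaping, so this only works for text without quotes or backslashes.

```bash
# Placeholder: substitute an ID returned by /v1/audio/voices.
voice_id="YOUR_VOICE_ID"

# Build the JSON body with "voice" as an object instead of a string.
# Note: values are inserted as-is, with no JSON escaping.
body=$(printf '{"model":"%s","input":"%s","voice":{"id":"%s"}}' \
  "gpt-4o-mini-tts" "Hello from a custom voice." "$voice_id")

printf '%s\n' "$body"

# To send it (not executed here):
# curl https://api.fastapi.ai/v1/audio/speech \
#   -H "Authorization: Bearer $FAST_API_KEY" \
#   -H "Content-Type: application/json" \
#   -d "$body" \
#   --output speech.mp3
```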


instructions string Optional
Additional instructions to control the voice style (for example: accent, speaking rate, emotion).
This parameter is only supported for gpt-4o-mini-tts and gpt-4o-mini-tts-2025-12-15.
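
A request using `instructions` might look like the sketch below. The instruction text is only an illustration, and the function name `speak_with_instructions` is hypothetical; the curl call is wrapped in a function so that nothing is sent until it is invoked.

```bash
# Request body with the optional "instructions" field.
# The instruction text is an illustrative example only.
body='{
  "model": "gpt-4o-mini-tts",
  "input": "The quick brown fox jumped over the lazy dog.",
  "voice": "alloy",
  "instructions": "Speak slowly, in a calm and friendly tone."
}'

# Hypothetical wrapper: sends the request only when called.
speak_with_instructions() {
  curl https://api.fastapi.ai/v1/audio/speech \
    -H "Authorization: Bearer $FAST_API_KEY" \
    -H "Content-Type: application/json" \
    -d "$body" \
    --output speech.mp3
}
```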


response_format string Optional Defaults to mp3
The format of the generated audio. Supported formats are mp3, opus, aac, flac, wav, and pcm.
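
As a small sketch, a script could validate the requested format against the list above and reuse it as the output file extension. The helper name `is_supported_format` is hypothetical, and using the format name as the extension is an assumption about naming, not something the API requires.

```bash
# Hypothetical helper: accept only the formats listed above.
is_supported_format() {
  case "$1" in
    mp3|opus|aac|flac|wav|pcm) return 0 ;;
    *) return 1 ;;
  esac
}

fmt="wav"
if is_supported_format "$fmt"; then
  # Assumption: name the output file after the chosen format.
  out="speech.$fmt"
  printf 'Would request "response_format": "%s" and write to %s\n' "$fmt" "$out"
fi
```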


stream_format string Optional Defaults to audio
The format of the streamed response. If set to audio, the response will be audio bytes. If set to json, the response will be an event stream.
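
A streamed request could be sketched as follows. The function name `stream_speech` is hypothetical, and the structure of the individual events is not described here, so the sketch simply prints the raw stream; `--no-buffer` asks curl to show output as it arrives.

```bash
# Request body asking for an event stream instead of raw audio bytes.
body='{
  "model": "gpt-4o-mini-tts",
  "input": "The quick brown fox jumped over the lazy dog.",
  "voice": "alloy",
  "stream_format": "json"
}'

# Hypothetical wrapper: sends the request only when called and
# prints the events to standard output as they are received.
stream_speech() {
  curl --no-buffer https://api.fastapi.ai/v1/audio/speech \
    -H "Authorization: Bearer $FAST_API_KEY" \
    -H "Content-Type: application/json" \
    -d "$body"
}
```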


speed number Optional Defaults to 1
The speed of the generated audio. Select a value from 0.25 to 4.0. 1.0 is the default.
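
The documented range can be checked before sending a request. This is a minimal sketch; the helper name `is_valid_speed` is hypothetical, and it uses awk because POSIX shell arithmetic does not handle decimals.

```bash
# Hypothetical helper: succeed only if the argument is a plain
# non-negative decimal number within the documented range 0.25 to 4.0.
is_valid_speed() {
  awk -v s="$1" 'BEGIN {
    ok = (s ~ /^[0-9]+(\.[0-9]+)?$/ && s + 0 >= 0.25 && s + 0 <= 4.0)
    exit !ok
  }'
}

speed="1.5"
if is_valid_speed "$speed"; then
  printf 'speed %s is within range\n' "$speed"
else
  printf 'speed %s is out of range (0.25 to 4.0)\n' "$speed" >&2
fi
```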

Returns


Returns the audio file content or a stream of audio events.

Example

Request

bash
curl https://api.fastapi.ai/v1/audio/speech \
  -H "Authorization: Bearer $FAST_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini-tts",
    "input": "The quick brown fox jumped over the lazy dog.",
    "voice": "alloy"
  }' \
  --output speech.mp3