# Create a speech

> Generate speech audio from text. Returns an audio file, or a stream of raw PCM chunks when `stream` is `true`. The body may be JSON or `multipart/form-data` — the latter lets you upload `ref_audio` as a raw file instead of base64-encoding it.
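
As an illustration of the JSON variant, here is a minimal Python sketch that builds (but does not send) a request to this endpoint using only the standard library. The URL is the server `https://api.boson.ai` joined with the path `/v1/audio/speech` from the spec on this page. The function name `build_speech_request` and the client-side checks are assumptions made for the example. The checks mirror the documented constraints (1 to 5000 characters of input, `pcm` required when streaming), though the server remains the authority on what it accepts.

```python
import json
import urllib.request

API_URL = "https://api.boson.ai/v1/audio/speech"
MAX_INPUT_CHARS = 5000  # documented limit; longer inputs return 400 `input_too_long`


def build_speech_request(text, api_key, voice="default",
                         response_format="mp3", stream=False):
    """Build a JSON POST request for /v1/audio/speech (hypothetical helper)."""
    # Client-side mirror of the documented constraints; the server may
    # apply its own, stricter validation.
    if not 1 <= len(text) <= MAX_INPUT_CHARS:
        raise ValueError("input must be 1 to 5000 characters long")
    if stream and response_format != "pcm":
        raise ValueError("stream=True requires response_format='pcm'")

    payload = {
        "model": "higgs-tts-3",
        "input": text,
        "voice": voice,
        "response_format": response_format,
        "stream": stream,
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Usage (performs a real network call, so it is left commented out):
# req = build_speech_request("Hello, this is a test.", "YOUR_API_KEY")
# with urllib.request.urlopen(req) as resp, open("speech.mp3", "wb") as f:
#     f.write(resp.read())
```

Building the request separately from sending it keeps the payload easy to inspect and test without network access.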



## OpenAPI

````yaml /openapi.json post /v1/audio/speech
openapi: 3.0.3
info:
  title: Boson AI API
  description: REST API for Boson AI audio models.
  version: 1.0.0
  license:
    name: Proprietary
servers:
  - url: https://api.boson.ai
security:
  - bearerAuth: []
paths:
  /v1/audio/speech:
    post:
      tags:
        - Audio
      summary: Create a speech
      description: >-
        Generate speech audio from text. Returns an audio file, or a stream of
        raw PCM chunks when `stream` is `true`. The body may be JSON or
        `multipart/form-data` — the latter lets you upload `ref_audio` as a raw
        file instead of base64-encoding it.
      operationId: createSpeech
      requestBody:
        required: true
        content:
          application/json:
            schema:
              $ref: '#/components/schemas/CreateSpeechRequest'
          multipart/form-data:
            schema:
              type: object
              required:
                - input
              properties:
                input:
                  type: string
                  description: Text to convert to speech.
                model:
                  type: string
                  default: higgs-tts-3
                voice:
                  type: string
                  default: default
                response_format:
                  type: string
                  default: mp3
                ref_audio:
                  type: string
                  format: binary
                  description: >-
                    Reference audio file for one-off cloning
                    (AAC/WAV/MP3/FLAC/OPUS), or an http(s) URL string.
                ref_text:
                  type: string
                  description: Transcript of `ref_audio`. Optional, but recommended.
      responses:
        '200':
          description: Generated audio. The content type depends on `response_format`.
          content:
            audio/mpeg:
              schema:
                type: string
                format: binary
            audio/ogg:
              schema:
                type: string
                format: binary
            audio/wav:
              schema:
                type: string
                format: binary
            audio/aac:
              schema:
                type: string
                format: binary
            audio/flac:
              schema:
                type: string
                format: binary
            audio/L16:
              schema:
                type: string
                format: binary
        '400':
          description: Invalid request parameters (e.g. `input_too_long`).
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/Error'
        '401':
          description: Missing or invalid API key.
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/Error'
components:
  schemas:
    CreateSpeechRequest:
      type: object
      required:
        - input
      additionalProperties: false
      properties:
        input:
          type: string
          minLength: 1
          maxLength: 5000
          description: >-
            Text to convert to speech. May contain inline tags. Inputs longer
            than 5000 characters return a 400 `input_too_long`.
          example: Hello, this is a test.
        model:
          type: string
          default: higgs-tts-3
          enum:
            - higgs-tts-3
          description: >-
            TTS model ID / public alias. Resolved to the served model
            server-side.
        voice:
          type: string
          default: default
          description: >-
            Preset voice name or custom voice ID. An explicitly provided
            `voice` is mutually exclusive with `ref_audio` / `ref_text`.
        response_format:
          type: string
          enum:
            - mp3
            - opus
            - pcm
            - wav
            - aac
            - flac
          default: mp3
          description: Output audio format. Streaming requires `pcm`.
        stream:
          type: boolean
          default: false
          description: >-
            If true, stream raw PCM chunks as they are decoded. Requires
            `response_format` to be `pcm`. Speed adjustment is not supported
            when streaming.
        ref_audio:
          type: string
          nullable: true
          description: >-
            Reference audio for one-off cloning: an http(s) URL, a data URI,
            or base64-encoded raw audio bytes. Supported formats: AAC, WAV,
            MP3, FLAC, OPUS. Data-URI and base64 payloads are limited to 10 MB.
        ref_text:
          type: string
          nullable: true
          description: Transcript of `ref_audio`. Optional, but recommended.
    Error:
      type: object
      properties:
        error:
          type: object
          properties:
            message:
              type: string
              description: Human-readable error message.
            type:
              type: string
              description: Error category.
  securitySchemes:
    bearerAuth:
      type: http
      scheme: bearer
      description: 'Your Boson API key, sent as `Authorization: Bearer $BOSON_API_KEY`.'

````
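
The `multipart/form-data` variant described in the spec above can be sketched in Python as follows, again using only the standard library. The helper names (`encode_multipart`, `build_clone_request`), the default filename `reference.wav`, and the `audio/wav` part content type are assumptions made for the example. The sketch leaves out `voice` because the JSON schema describes it as mutually exclusive with `ref_audio` / `ref_text`; whether the multipart form enforces the same rule is not stated in the spec.

```python
import uuid
import urllib.request

API_URL = "https://api.boson.ai/v1/audio/speech"


def encode_multipart(fields, file_field, filename, file_bytes, file_type):
    """Encode text fields plus one file part as multipart/form-data."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        header = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
        )
        parts.append(header.encode("utf-8") + str(value).encode("utf-8") + b"\r\n")
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{file_field}"; '
        f'filename="{filename}"\r\n'
        f"Content-Type: {file_type}\r\n\r\n"
    )
    parts.append(file_header.encode("utf-8") + file_bytes + b"\r\n")
    # The closing delimiter ends the multipart body.
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def build_clone_request(text, api_key, audio_bytes,
                        filename="reference.wav", ref_text=None):
    """Build a multipart POST that uploads raw reference audio (hypothetical helper)."""
    # `voice` is deliberately omitted: the JSON schema documents it as
    # mutually exclusive with `ref_audio` / `ref_text`.
    fields = {"input": text, "model": "higgs-tts-3", "response_format": "mp3"}
    if ref_text is not None:
        fields["ref_text"] = ref_text
    body, content_type = encode_multipart(
        fields, "ref_audio", filename, audio_bytes, "audio/wav"  # assumed MIME type
    )
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": content_type},
        method="POST",
    )


# Usage (performs a real network call, so it is left commented out):
# with open("reference.wav", "rb") as f:
#     req = build_clone_request("Hello!", "YOUR_API_KEY", f.read(), ref_text="...")
# with urllib.request.urlopen(req) as resp, open("cloned.mp3", "wb") as out:
#     out.write(resp.read())
```

Uploading the file as a binary part avoids the size overhead of base64-encoding the reference audio into a JSON string.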