
Commit 9d0ee5c

feat(speech-generation): add Gradium TTS provider with WebSocket API (withceleste#89)
* feat(speech-generation): add Gradium TTS provider (#XX)

Add comprehensive Gradium text-to-speech integration with WebSocket streaming support.

Features:
- WebSocket-based TTS streaming with low-latency audio generation
- 14 flagship voices across 5 languages (en, fr, de, es, pt)
- Custom voice cloning with create/list/update/delete operations
- Speed control via padding_bonus parameter (-4.0 to 4.0)
- Multiple audio formats (wav, pcm, opus, ulaw_8000, alaw_8000, pcm_16000, pcm_24000)
- EU/US regional endpoints for optimized latency
- Credits monitoring and usage tracking

Implementation details:
- WebSocket protocol handling with proper message sequencing
- Parameter mappers for voice, speed, and response_format
- Pydantic models for API responses (VoiceInfo, CreditsSummary, TTSResult)
- Full voice management REST API integration
- Comprehensive test suite with 6 test functions

Dependencies:
- Added websockets>=13.0 to support WebSocket connections

Documentation:
- Complete README with usage examples and API reference
- Test scripts for validation (test_gradium_tts.py, test_gradium_minimal.py)

* feat(speech-generation): add Gradium TTS provider with WebSocket API

Add Gradium as a new speech generation provider using their WebSocket-based Text-to-Speech API. This provider enables high-quality multilingual speech synthesis with 14 flagship voices across 5 languages.

Key changes:
- Add celeste-gradium provider package with WebSocket TTS client
- Add Gradium capability provider with parameter mappers
- Add WebSocket client module to celeste core
- Add 14 flagship voices (EN, FR, DE, ES, PT)
- Add Gradium integration test and CI secrets
- Bump celeste-ai and celeste-speech-generation to 0.3.1

Provider features:
- WebSocket streaming for low-latency audio generation
- Support for wav, pcm, opus output formats
- Voice selection by name (Emma, Kent, Sydney, etc.)
- Speed control via padding_bonus translation

* fix(gradium): fix provider inconsistencies

- Export GRADIUM_VOICES in speech-generation __init__.py
- Add DEFAULT_VOICE_ID to config.py (consistent with ElevenLabs)
- Add _make_request stub to satisfy abstract interface
- Remove unused httpx import from capability client

* test: improve test coverage for constraints and parameters

* test: fix Int constraint test to accept booleans

---------

Co-authored-by: maxence <maxencedemonteynard@yahoo.fr>
1 parent ebcf327 commit 9d0ee5c
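The commit message says the unified `speed` parameter is translated into Gradium's `padding_bonus` (-4.0 to 4.0), while the model definition in `models.py` constrains `speed` to 0.25 to 4.0. The mapper's real formula is not part of this diff. The sketch below shows one possible translation, assuming that a larger `padding_bonus` inserts more padding and therefore slows speech down. A log2 scale maps 1.0 to 0, 0.25 to +4.0 and 4.0 to -4.0. The function name `speed_to_padding_bonus` is hypothetical.

```python
import math

# Bounds quoted in the commit: unified speed range and Gradium padding_bonus range.
MIN_SPEED, MAX_SPEED = 0.25, 4.0
MIN_PADDING, MAX_PADDING = -4.0, 4.0


def speed_to_padding_bonus(speed: float) -> float:
    """Translate a unified speed multiplier into a Gradium padding_bonus.

    Hypothetical mapping: assumes a higher padding_bonus means slower speech,
    so speed > 1.0 yields a negative bonus and speed < 1.0 a positive one.
    """
    if not MIN_SPEED <= speed <= MAX_SPEED:
        raise ValueError(f"speed must be in [{MIN_SPEED}, {MAX_SPEED}], got {speed}")
    # log2 keeps the scale symmetric: 0.25 -> +4.0, 1.0 -> 0.0, 4.0 -> -4.0
    bonus = -2.0 * math.log2(speed)
    # Clamp defensively against floating-point drift at the edges.
    return max(MIN_PADDING, min(MAX_PADDING, bonus))
```

A logarithmic scale is used so that halving and doubling the speed move the bonus by the same amount in opposite directions. A linear interpolation over the two ranges would also satisfy the bounds.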


27 files changed, +958 / -27 lines


.env.example

Lines changed: 3 additions & 0 deletions
@@ -39,3 +39,6 @@ TOPAZLABS_API_KEY=your-topazlabs-api-key-here

 # Perplexity
 PERPLEXITY_API_KEY=your-perplexity-api-key-here
+
+# Gradium
+GRADIUM_API_KEY=your-gradium-api-key-here

.github/workflows/publish.yml

Lines changed: 1 addition & 0 deletions
@@ -136,6 +136,7 @@ jobs:
 BYTEPLUS_API_KEY: ${{ secrets.BYTEPLUS_API_KEY }}
 XAI_API_KEY: ${{ secrets.XAI_API_KEY }}
 ELEVENLABS_API_KEY: ${{ secrets.ELEVENLABS_API_KEY }}
+GRADIUM_API_KEY: ${{ secrets.GRADIUM_API_KEY }}
 run: uv run pytest packages/capabilities/${{ matrix.package }}/tests/integration_tests/ -m integration -v

 build:

.gitignore

Lines changed: 1 addition & 0 deletions
@@ -157,3 +157,4 @@ uv.lock

 # Security reports
 bandit-report.json
+mureka.md

packages/capabilities/speech-generation/pyproject.toml

Lines changed: 4 additions & 2 deletions
@@ -1,6 +1,6 @@
 [project]
 name = "celeste-speech-generation"
-version = "0.3.0"
+version = "0.3.1"
 description = "Speech generation package for Celeste AI. Unified interface for all providers"
 authors = [{name = "Kamilbenkirane", email = "kamil@withceleste.ai"}]
 readme = "README.md"
@@ -17,7 +17,7 @@ classifiers = [
     "Topic :: Scientific/Engineering :: Artificial Intelligence",
     "Typing :: Typed",
 ]
-keywords = ["ai", "speech-generation", "tts", "text-to-speech", "openai", "google", "elevenlabs", "audio-ai"]
+keywords = ["ai", "speech-generation", "tts", "text-to-speech", "openai", "google", "elevenlabs", "gradium", "audio-ai"]

 [project.urls]
 Homepage = "https://withceleste.ai"
@@ -28,6 +28,8 @@ Issues = "https://github.com/withceleste/celeste-python/issues"
 [tool.uv.sources]
 celeste-ai = { workspace = true }
 celeste-elevenlabs = { workspace = true }
+celeste-google = { workspace = true }
+celeste-gradium = { workspace = true }
 celeste-openai = { workspace = true }

 [project.entry-points."celeste.packages"]

packages/capabilities/speech-generation/src/celeste_speech_generation/__init__.py

Lines changed: 5 additions & 1 deletion
@@ -30,16 +30,20 @@ def register_package() -> None:
 from celeste_speech_generation.providers.google.voices import ( # noqa: E402
     GOOGLE_VOICES,
 )
+from celeste_speech_generation.providers.gradium.voices import ( # noqa: E402
+    GRADIUM_VOICES,
+)
 from celeste_speech_generation.providers.openai.voices import ( # noqa: E402
     OPENAI_VOICES,
 )
 from celeste_speech_generation.streaming import SpeechGenerationStream # noqa: E402
 from celeste_speech_generation.voices import Voice # noqa: E402

 VOICES: list[Voice] = [
+    *ELEVENLABS_VOICES,
     *GOOGLE_VOICES,
+    *GRADIUM_VOICES,
     *OPENAI_VOICES,
-    *ELEVENLABS_VOICES,
 ]

 __all__ = [
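The combined `VOICES` list now holds every provider's voices, reordered alphabetically by provider, and the commit mentions selecting Gradium voices by name (Emma, Kent, Sydney). Because several providers share one list, a name lookup should filter by provider first. The sketch below uses a hypothetical stand-in `Voice` dataclass and a hypothetical `resolve_voice` helper; the real `Voice` model's fields are not shown in this diff.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Voice:
    """Hypothetical stand-in for celeste_speech_generation.voices.Voice."""

    id: str
    name: str
    provider: str
    languages: tuple[str, ...] = ()


def resolve_voice(voices: list[Voice], provider: str, name_or_id: str) -> Voice:
    """Find one provider's voice by exact id or case-insensitive display name."""
    wanted = name_or_id.casefold()
    for voice in voices:
        # Skip other providers so identical names cannot collide across providers.
        if voice.provider != provider:
            continue
        if voice.id == name_or_id or voice.name.casefold() == wanted:
            return voice
    raise ValueError(f"Unknown {provider} voice: {name_or_id!r}")
```

The names Emma and Kent come from the commit message. Any ids used with this sketch are placeholders, not Gradium's real voice ids.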

packages/capabilities/speech-generation/src/celeste_speech_generation/models.py

Lines changed: 2 additions & 0 deletions
@@ -5,10 +5,12 @@
     MODELS as ELEVENLABS_MODELS,
 )
 from celeste_speech_generation.providers.google.models import MODELS as GOOGLE_MODELS
+from celeste_speech_generation.providers.gradium.models import MODELS as GRADIUM_MODELS
 from celeste_speech_generation.providers.openai.models import MODELS as OPENAI_MODELS

 MODELS: list[Model] = [
     *GOOGLE_MODELS,
     *OPENAI_MODELS,
     *ELEVENLABS_MODELS,
+    *GRADIUM_MODELS,
 ]

packages/capabilities/speech-generation/src/celeste_speech_generation/providers/__init__.py

Lines changed: 4 additions & 0 deletions
@@ -14,6 +14,9 @@ def _get_providers() -> list[tuple[Provider, type[Client]]]:
     from celeste_speech_generation.providers.google.client import (
         GoogleSpeechGenerationClient,
     )
+    from celeste_speech_generation.providers.gradium.client import (
+        GradiumSpeechGenerationClient,
+    )
     from celeste_speech_generation.providers.openai.client import (
         OpenAISpeechGenerationClient,
     )
@@ -22,6 +25,7 @@ def _get_providers() -> list[tuple[Provider, type[Client]]]:
         (Provider.GOOGLE, GoogleSpeechGenerationClient),
         (Provider.OPENAI, OpenAISpeechGenerationClient),
         (Provider.ELEVENLABS, ElevenLabsSpeechGenerationClient),
+        (Provider.GRADIUM, GradiumSpeechGenerationClient),
     ]

Lines changed: 9 additions & 0 deletions
@@ -0,0 +1,9 @@
+"""Gradium provider for speech generation."""
+
+from .client import GradiumSpeechGenerationClient
+from .models import MODELS
+
+__all__ = [
+    "MODELS",
+    "GradiumSpeechGenerationClient",
+]
Lines changed: 79 additions & 0 deletions
@@ -0,0 +1,79 @@
+"""Gradium client implementation for speech generation."""
+
+from typing import Any, Unpack
+
+from celeste_gradium.text_to_speech.client import GradiumTextToSpeechClient
+
+from celeste.artifacts import AudioArtifact
+from celeste.parameters import ParameterMapper
+from celeste_speech_generation.client import SpeechGenerationClient
+from celeste_speech_generation.io import (
+    SpeechGenerationInput,
+    SpeechGenerationOutput,
+    SpeechGenerationUsage,
+)
+from celeste_speech_generation.parameters import SpeechGenerationParameters
+
+from .parameters import GRADIUM_PARAMETER_MAPPERS
+
+
+class GradiumSpeechGenerationClient(GradiumTextToSpeechClient, SpeechGenerationClient):
+    """Gradium client for speech generation."""
+
+    @classmethod
+    def parameter_mappers(cls) -> list[ParameterMapper]:
+        return GRADIUM_PARAMETER_MAPPERS
+
+    def _init_request(self, inputs: SpeechGenerationInput) -> dict[str, Any]:
+        """Initialize request from Gradium API format."""
+        return {"text": inputs.text}
+
+    def _parse_usage(self, response_data: dict[str, Any]) -> SpeechGenerationUsage:
+        """Parse usage from response."""
+        usage = super()._parse_usage(response_data)
+        return SpeechGenerationUsage(**usage)
+
+    def _parse_content(
+        self,
+        response_data: dict[str, Any],
+        **parameters: Unpack[SpeechGenerationParameters],
+    ) -> AudioArtifact:
+        """Parse content from response.
+
+        Note: This method is not used for Gradium TTS since we override generate()
+        to handle WebSocket responses. Kept for interface compliance.
+        """
+        msg = "Gradium TTS uses WebSocket, use generate() override"
+        raise NotImplementedError(msg)
+
+    async def generate(
+        self,
+        *args: str,
+        **parameters: Unpack[SpeechGenerationParameters],
+    ) -> SpeechGenerationOutput:
+        """Generate speech from text.
+
+        Override base generate() to use WebSocket instead of HTTP.
+        """
+        inputs = self._create_inputs(*args, **parameters)
+        inputs, parameters = self._validate_artifacts(inputs, **parameters)
+        request_body = self._build_request(inputs, **parameters)
+
+        # Use WebSocket TTS flow
+        audio_bytes, output_format = await self._websocket_tts(request_body)
+
+        if not audio_bytes:
+            msg = "No audio data in response"
+            raise ValueError(msg)
+
+        # Determine MIME type from output_format
+        mime_type = self._map_output_format_to_mime_type(output_format)
+
+        return self._output_class()(
+            content=AudioArtifact(data=audio_bytes, mime_type=mime_type),
+            usage=SpeechGenerationUsage(),
+            metadata=self._build_metadata({}),
+        )
+
+
+__all__ = ["GradiumSpeechGenerationClient"]
Lines changed: 34 additions & 0 deletions
@@ -0,0 +1,34 @@
+"""Gradium model definitions for speech generation."""
+
+from celeste import Model, Provider
+from celeste.constraints import Choice, Range
+from celeste_speech_generation.constraints import VoiceConstraint
+from celeste_speech_generation.parameters import SpeechGenerationParameter
+
+from .voices import GRADIUM_VOICES
+
+MODELS: list[Model] = [
+    Model(
+        id="default",
+        provider=Provider.GRADIUM,
+        display_name="Gradium Default TTS",
+        streaming=False,
+        parameter_constraints={
+            SpeechGenerationParameter.VOICE: VoiceConstraint(voices=GRADIUM_VOICES),
+            SpeechGenerationParameter.OUTPUT_FORMAT: Choice(
+                options=[
+                    "wav",
+                    "pcm",
+                    "opus",
+                    "ulaw_8000",
+                    "alaw_8000",
+                    "pcm_16000",
+                    "pcm_24000",
+                ]
+            ),
+            SpeechGenerationParameter.SPEED: Range(min=0.25, max=4.0),
+        },
+    ),
+]
+
+__all__ = ["MODELS"]
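The client turns the returned `output_format` into a MIME type via `_map_output_format_to_mime_type()`, whose body is not in this diff. The sketch below covers exactly the seven formats in the `Choice` constraint above and rejects anything else, mirroring that constraint. The MIME strings are assumptions chosen for illustration. `audio/basic` is the standard type for 8 kHz mono μ-law, but the PCM and opus entries depend on the byte order and container Gradium actually returns. The names `OUTPUT_FORMAT_MIME_TYPES` and `map_output_format_to_mime_type` are hypothetical.

```python
# Hypothetical MIME types for the output formats listed in MODELS.
OUTPUT_FORMAT_MIME_TYPES: dict[str, str] = {
    "wav": "audio/wav",
    "pcm": "audio/pcm",  # not an IANA-registered type; raw samples, rate unknown
    "opus": "audio/ogg",  # assumes Opus frames arrive in an Ogg container
    "ulaw_8000": "audio/basic",  # 8 kHz mono mu-law (RFC 2046)
    "alaw_8000": "audio/PCMA",
    "pcm_16000": "audio/L16;rate=16000",
    "pcm_24000": "audio/L16;rate=24000",
}


def map_output_format_to_mime_type(output_format: str) -> str:
    """Return the MIME type for a supported output format, else raise ValueError."""
    try:
        return OUTPUT_FORMAT_MIME_TYPES[output_format]
    except KeyError:
        allowed = ", ".join(OUTPUT_FORMAT_MIME_TYPES)
        raise ValueError(
            f"Unsupported output_format {output_format!r}; expected one of: {allowed}"
        ) from None
```

Keeping the table's keys identical to the `Choice` options means a format accepted by the constraint can never reach the client without a MIME type.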
