Voice Chat
Real-time voice conversations with agents via the Gemini 2.5 Flash Native Audio model (~280 ms latency). Gemini handles speech-to-speech; Claude Code remains the agent's reasoning engine and is invoked on demand via tool calling.
Concepts
Tool calling (run_task): During a voice session, Gemini can delegate complex tasks to the underlying Claude agent. The orb shows an amber badge while the task runs.
Voice system prompt: Resolved in order: voice-agent-system-prompt.md in the container → auto-generated from template info → generic fallback.
How It Works
Open an agent's Chat tab.
Click the microphone button next to the chat input.
A full-screen voice overlay appears with an animated canvas orb.
Speak — audio is captured as PCM 16 kHz and streamed to the backend WebSocket.
The backend proxies audio to the Gemini Live API in real-time.
Agent response audio (PCM 24 kHz) plays back immediately (~280ms TTFT).
When Gemini needs to perform a complex task, it calls run_task: the orb shifts to an amber badge state. Trinity sends the prompt to the Claude agent (up to 30 seconds). Gemini speaks the result when done; the orb returns to listening state.
Click End to close the session. Transcripts are saved to the current chat session.
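The capture step above can be sketched in a few lines. This is a minimal illustration, not Trinity's client code: it wraps raw samples in the audio envelope listed under WebSocket Message Types. The helper names and the little-endian 16-bit sample layout are assumptions; the document only states "PCM 16 kHz".

```python
import base64
import json
import struct
from typing import List

SAMPLE_RATE_HZ = 16_000  # capture rate stated in the steps above


def encode_audio_message(samples: List[int]) -> str:
    """Pack signed 16-bit samples and wrap them in the audio envelope."""
    # Assumption: little-endian 16-bit PCM. The document does not specify
    # the sample width or byte order.
    pcm = struct.pack(f"<{len(samples)}h", *samples)
    return json.dumps(
        {"type": "audio", "data": base64.b64encode(pcm).decode("ascii")}
    )


def decode_audio_message(message: str) -> List[int]:
    """Inverse of encode_audio_message, useful for local round-trip checks."""
    payload = json.loads(message)
    if payload.get("type") != "audio":
        raise ValueError("not an audio message")
    pcm = base64.b64decode(payload["data"])
    return list(struct.unpack(f"<{len(pcm) // 2}h", pcm))
```

A real client would send each encoded string over the /ws/voice/{session_id} socket as chunks arrive from the microphone.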
Orb State Reference
| State | Orb color | Trigger |
|---|---|---|
| Idle / Connecting | Base hue (0°) | Before audio starts |
| Listening | +90° shift (green) | Microphone active, user speaking |
| Speaking | +210° shift (indigo) | Gemini responding |
| Tool calling | Amber badge overlay | run_task dispatched to Claude |
Click Mute to silence the microphone mid-session. Gemini continues speaking.
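One way to read the state table is as a hue shift per state plus an independent badge flag. The sketch below assumes exactly that model; the names OrbState and orb_appearance are hypothetical and the actual orb renderer may be structured differently.

```python
from enum import Enum


class OrbState(Enum):
    IDLE = "idle"
    LISTENING = "listening"
    SPEAKING = "speaking"


# Hue shifts in degrees, copied from the table above.
HUE_SHIFT = {OrbState.IDLE: 0, OrbState.LISTENING: 90, OrbState.SPEAKING: 210}


def orb_appearance(state: OrbState, tool_running: bool = False, base_hue: int = 0) -> dict:
    """Return the hue (0-359) and whether the amber badge is shown."""
    # Assumption: the badge is an overlay, so it does not change the hue.
    return {
        "hue": (base_hue + HUE_SHIFT[state]) % 360,
        "amber_badge": tool_running,
    }
```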
Requirements
GEMINI_API_KEY configured in Settings → AI Keys
VOICE_ENABLED must be on (default: on when API key is present)
Configuration
| Variable | Description | Default |
|---|---|---|
| GEMINI_API_KEY | API key for Gemini Live API | — (required) |
| VOICE_ENABLED | Global toggle | true |
| WORKSPACE_ENABLED | Enable the Workspace Mode canvas (BETA, admin opt-in) | false |
| VOICE_MODEL | Gemini model ID | gemini-2.5-flash-native-audio-preview-12-2025 |
| VOICE_MAX_DURATION | Max session duration in seconds | 300 |
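As a rough sketch of how these variables and defaults fit together, the snippet below reads them from an environment-style mapping. The VoiceConfig class, the load_voice_config helper, and the accepted boolean spellings are assumptions for illustration; only the variable names and defaults come from the table.

```python
from dataclasses import dataclass
from typing import Mapping, Optional

_TRUE_VALUES = {"1", "true", "yes", "on"}  # assumed spellings


def _flag(env: Mapping[str, str], name: str, default: bool) -> bool:
    raw = env.get(name)
    return default if raw is None else raw.strip().lower() in _TRUE_VALUES


@dataclass(frozen=True)
class VoiceConfig:
    gemini_api_key: Optional[str]
    voice_enabled: bool
    workspace_enabled: bool
    voice_model: str
    max_duration_s: int


def load_voice_config(env: Mapping[str, str]) -> VoiceConfig:
    """Apply the defaults from the table to whatever the mapping provides."""
    return VoiceConfig(
        gemini_api_key=env.get("GEMINI_API_KEY") or None,
        voice_enabled=_flag(env, "VOICE_ENABLED", default=True),
        workspace_enabled=_flag(env, "WORKSPACE_ENABLED", default=False),
        voice_model=env.get(
            "VOICE_MODEL", "gemini-2.5-flash-native-audio-preview-12-2025"
        ),
        max_duration_s=int(env.get("VOICE_MAX_DURATION", "300")),
    )
```

Passing os.environ as the mapping would read the process environment.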
Per-Agent Voice Prompt
Set a custom voice system prompt by placing voice-agent-system-prompt.md in /home/developer/. The file controls Gemini's persona independently of CLAUDE.md. If no file is present, Trinity auto-generates a prompt from template info, falling back to a generic prompt.
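The resolution order (file, then template info, then generic fallback) might look like the following. This is a sketch under assumptions: the function name, the template_info keys, the generated wording, and the placeholder generic prompt are all hypothetical, and treating an empty file as absent is a choice made here, not documented behavior.

```python
from pathlib import Path
from typing import Optional, Tuple

PROMPT_FILENAME = "voice-agent-system-prompt.md"
GENERIC_PROMPT = "You are a helpful voice assistant."  # placeholder text


def resolve_voice_prompt(home: Path, template_info: Optional[dict]) -> Tuple[str, str]:
    """Return (source, prompt) using the file -> template -> generic order."""
    prompt_file = home / PROMPT_FILENAME
    if prompt_file.is_file():
        text = prompt_file.read_text(encoding="utf-8").strip()
        if text:  # assumption: an empty file is treated as missing
            return "file", text
    if template_info:
        # Hypothetical keys; the real template schema is not shown here.
        name = template_info.get("name", "this agent")
        description = template_info.get("description", "")
        return "template", f"You are the voice of {name}. {description}".strip()
    return "generic", GENERIC_PROMPT
```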
Tool Calling
When Gemini encounters complex requests, it calls run_task:
Gemini formulates a task prompt (max 2000 chars)
Trinity dispatches the prompt to the Claude agent
Agent runs with full tool access
Result returned to Gemini
If the agent is unreachable or times out (30s), Gemini recovers gracefully
All run_task invocations are written to the platform audit log.
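The dispatch flow above can be sketched with asyncio. The call_agent callable stands in for whatever transport Trinity uses to reach the Claude agent and is hypothetical, as are the result dictionary shape and the choice to truncate over-long prompts (the real system may reject them instead).

```python
import asyncio
from typing import Awaitable, Callable

MAX_PROMPT_CHARS = 2000  # limit stated above
TASK_TIMEOUT_S = 30.0    # timeout stated above


async def run_task(
    prompt: str,
    call_agent: Callable[[str], Awaitable[str]],
    timeout_s: float = TASK_TIMEOUT_S,
) -> dict:
    """Dispatch a prompt and always return something the voice model can speak."""
    # Assumption: over-long prompts are truncated rather than rejected.
    prompt = prompt[:MAX_PROMPT_CHARS]
    try:
        result = await asyncio.wait_for(call_agent(prompt), timeout=timeout_s)
        return {"ok": True, "result": result}
    except asyncio.TimeoutError:
        # The agent did not answer in time; report it instead of raising.
        return {"ok": False, "error": "timeout"}
    except ConnectionError:
        # The agent could not be reached at all.
        return {"ok": False, "error": "unreachable"}
```

Returning a structured error rather than raising is what lets the voice side recover: it can tell the user the task did not finish and keep the session alive.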
Workspace Mode BETA
Workspace Mode is a full-page voice surface with a live canvas beside the orb. While you talk, the agent can paint the canvas with diagrams, images, and formatted text — useful for walkthroughs, design reviews, and any conversation where a picture helps. It is opt-in and admin-gated, off by default.
Enabling Workspace Mode
Workspace Mode is hidden unless an admin enables it platform-wide via WORKSPACE_ENABLED (default false). The button only appears when workspace_available is true, which requires both voice to be available (VOICE_ENABLED + GEMINI_API_KEY) and WORKSPACE_ENABLED=true.
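The gating rule reduces to a small boolean expression. The function names below are illustrative only; workspace_available mirrors the flag named above, but how Trinity computes it internally is not shown in this document.

```python
from typing import Optional


def voice_available(voice_enabled: bool, gemini_api_key: Optional[str]) -> bool:
    """Voice needs both the global toggle and a configured API key."""
    return voice_enabled and bool(gemini_api_key)


def workspace_available(
    voice_enabled: bool,
    gemini_api_key: Optional[str],
    workspace_enabled: bool = False,  # WORKSPACE_ENABLED defaults to false
) -> bool:
    """Workspace Mode needs voice to be available plus the admin opt-in."""
    return voice_available(voice_enabled, gemini_api_key) and workspace_enabled
```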
How It Works
On the Agent Detail page (agent must be running), click Workspace in the header — it carries an amber BETA badge.
The browser opens the full-page workspace at /agents/{name}/workspace: the animated orb and controls on the left, the canvas on the right.
Start talking. The voice session behaves exactly like standard mode — same orb states, same run_task delegation to Claude.
When the agent decides a visual helps, it calls a panel tool. The canvas updates within ~300ms.
Panel Tools
The agent drives the canvas with these in-session tools (resolved inside Trinity — they never run in the agent container):
| Tool | Effect on the canvas |
|---|---|
| show_markdown | Render formatted text (headings, lists, tables) |
| show_diagram | Render a Mermaid diagram (flowcharts, sequence diagrams, etc.) |
| show_image | Show an image — a web URL or a file from the agent's workspace |
| update_panel | Replace the canvas with an HTML layout |
| append_to_panel | Add content to the current panel |
| clear_panel | Empty the canvas |
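To make the replace, append, and clear semantics in the table concrete, here is a minimal state-update sketch. The Panel fields follow the canvas state listed for the panel endpoint (type, content, title); the type strings and the apply_panel_tool function are assumptions, not Trinity's internal representation.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Panel:
    type: Optional[str] = None  # assumed values: markdown, diagram, image, html
    content: str = ""
    title: Optional[str] = None


# Tools that replace the whole canvas, mapped to the panel type they produce.
REPLACING_TOOLS = {
    "show_markdown": "markdown",
    "show_diagram": "diagram",
    "show_image": "image",
    "update_panel": "html",
}


def apply_panel_tool(
    panel: Panel, tool: str, content: str = "", title: Optional[str] = None
) -> Panel:
    """Return the new canvas state after one panel tool call."""
    if tool in REPLACING_TOOLS:
        return Panel(type=REPLACING_TOOLS[tool], content=content, title=title)
    if tool == "append_to_panel":
        # Keep the current type and title; only extend the content.
        return replace(panel, content=panel.content + content)
    if tool == "clear_panel":
        return Panel()
    raise ValueError(f"unknown panel tool: {tool}")
```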
Panel History
The canvas keeps a 40-snapshot history. Use the prev/next controls or the dropdown to step back through what was shown earlier in the conversation. "Live" follows the newest snapshot; navigating back pins the view until a new update arrives.
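The history behavior described above maps naturally onto a bounded deque plus an optional pinned index. This is an illustrative sketch, not the canvas implementation; the class and method names are hypothetical, and snapshots are plain strings here for brevity.

```python
from collections import deque
from typing import Deque, Optional

HISTORY_LIMIT = 40  # snapshot count stated above


class PanelHistory:
    def __init__(self, limit: int = HISTORY_LIMIT) -> None:
        self._snapshots: Deque[str] = deque(maxlen=limit)  # oldest dropped first
        self._pinned: Optional[int] = None  # None means the view is "Live"

    @property
    def is_live(self) -> bool:
        return self._pinned is None

    def push(self, snapshot: str) -> None:
        self._snapshots.append(snapshot)
        self._pinned = None  # a new update returns the view to Live

    def prev(self) -> None:
        if len(self._snapshots) < 2:
            return  # nothing older to show
        current = len(self._snapshots) - 1 if self._pinned is None else self._pinned
        self._pinned = max(current - 1, 0)

    def next(self) -> None:
        if self._pinned is None:
            return
        self._pinned += 1
        if self._pinned >= len(self._snapshots) - 1:
            self._pinned = None  # reached the newest snapshot: Live again

    def current(self) -> Optional[str]:
        if not self._snapshots:
            return None
        return self._snapshots[-1 if self._pinned is None else self._pinned]
```

Resetting the pin on every push also sidesteps stale indices when the deque evicts its oldest entry.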
Rendering & Safety
All canvas content is sanitized before display (DOMPurify, the same trust model as every other markdown surface on the platform):
Path traversal (..), absolute escapes, and non-http schemes (data:, etc.) are rejected.
<script> tags are stripped, so agent-supplied JavaScript (e.g. Chart.js) does not execute. Use show_diagram for dynamic visuals instead.
API Endpoints
| Endpoint | Method | Description |
|---|---|---|
| /api/agents/{name}/voice/start | POST | Start a voice session — pass workspace_mode: true for canvas mode. Returns voice_session_id and WebSocket URL |
| /api/agents/{name}/voice/stop | POST | Stop a voice session — returns transcript and cost |
| /api/agents/{name}/voice/status | GET | Get session status |
| /api/agents/{name}/voice/{session_id}/panel | GET | Current workspace canvas state (type, content, title, updated_at); polled by the canvas |
| /api/agents/{name}/voice/prompt | GET / PUT | Read or set the per-agent voice system prompt |
| /ws/voice/{session_id} | WebSocket | Bidirectional audio bridge |
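As a hedged example of calling the start endpoint, the helper below only builds the request object, so it can be inspected without a running server. The base URL and the helper name are placeholders, authentication is omitted because it is not described here, and sending an empty JSON body for standard mode is an assumption.

```python
import json
import urllib.request
from urllib.parse import quote


def build_start_request(
    base_url: str, agent_name: str, workspace_mode: bool = False
) -> urllib.request.Request:
    """Build the POST request for /api/agents/{name}/voice/start."""
    url = f"{base_url.rstrip('/')}/api/agents/{quote(agent_name, safe='')}/voice/start"
    # Only send workspace_mode when canvas mode is requested.
    payload = {"workspace_mode": True} if workspace_mode else {}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json"},
    )
```

Passing the result to urllib.request.urlopen would send it; per the table, the response carries voice_session_id and the WebSocket URL.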
WebSocket Message Types
Client → Server
{ "type": "audio", "data": "<base64 PCM 16kHz audio>" }
Server → Client
audio: PCM 24 kHz response audio chunk
transcript: Incremental or final transcript text
status: Session state change (listening, speaking, idle)
tool_call: run_task dispatched to Claude
tool_result: Claude agent response returned to Gemini
Limitations
Sessions end at the maximum session duration (300 seconds by default, set by VOICE_MAX_DURATION).
run_task tool calls time out after 30 seconds.