Advanced Features
Voice chat, image generation, agent avatars, BPMN-inspired process engine workflows, and agent-defined dynamic dashboards.
Voice Chat
Real-time voice conversations with agents via the Gemini 2.5 Flash Native Audio model (~280 ms latency). Audio streams in both directions through a backend WebSocket proxy.
Open an agent's Chat tab.
Click the microphone button.
A voice overlay appears with status, mute, and end controls.
Speak — audio is captured as PCM 16kHz and streamed to the backend WebSocket.
The backend proxies audio to the Google Gemini Live API.
Agent response audio (PCM 24kHz) plays back in real-time.
Transcripts are auto-saved to the chat session with source="voice" markers.
Requirement: GEMINI_API_KEY configured on the platform.
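
The capture step above can be sketched as follows. This is a minimal illustration, not the platform's client code: the source states only the 16 kHz sample rate, so the 16-bit signed little-endian mono encoding, the 20 ms chunk duration, and the function names `floats_to_pcm16` and `chunk_pcm` are all assumptions.

```python
import struct
from typing import Iterator, Sequence

INPUT_SAMPLE_RATE = 16_000  # Hz, from the capture step above
BYTES_PER_SAMPLE = 2        # assumption: 16-bit signed PCM, mono


def floats_to_pcm16(samples: Sequence[float]) -> bytes:
    """Convert float samples in [-1.0, 1.0] to little-endian 16-bit PCM."""
    clipped = [max(-1.0, min(1.0, s)) for s in samples]
    ints = [int(s * 32767) for s in clipped]
    return struct.pack(f"<{len(ints)}h", *ints)


def chunk_pcm(
    pcm: bytes,
    chunk_ms: int = 20,  # assumption: chunk duration is not given in the source
    sample_rate: int = INPUT_SAMPLE_RATE,
) -> Iterator[bytes]:
    """Yield fixed-duration chunks; the final chunk may be shorter."""
    chunk_bytes = sample_rate * chunk_ms // 1000 * BYTES_PER_SAMPLE
    for start in range(0, len(pcm), chunk_bytes):
        yield pcm[start:start + chunk_bytes]
```

Under these assumptions each 20 ms chunk is 640 bytes, so one second of captured audio becomes 50 WebSocket messages.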
Configuration
| Variable | Description |
|---|---|
| VOICE_ENABLED | Enable or disable voice chat |
| VOICE_MODEL | Gemini model to use for voice |
| VOICE_MAX_DURATION | Maximum voice session duration |
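
A minimal sketch of how a deployment script might read these three variables. The table gives no value formats or defaults, so the truthy spellings accepted for VOICE_ENABLED, the integer parsing of VOICE_MAX_DURATION (whose unit is not stated), and the names `VoiceConfig` and `load_voice_config` are assumptions made for illustration.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class VoiceConfig:
    enabled: bool
    model: Optional[str]
    max_duration: Optional[int]


def load_voice_config(env: Mapping[str, str] = os.environ) -> VoiceConfig:
    """Read the three voice variables from an environment-like mapping."""
    raw_enabled = env.get("VOICE_ENABLED", "").strip().lower()
    raw_duration = env.get("VOICE_MAX_DURATION", "").strip()
    return VoiceConfig(
        # Assumption: common truthy spellings enable the feature.
        enabled=raw_enabled in {"1", "true", "yes", "on"},
        model=env.get("VOICE_MODEL") or None,
        # Assumption: an integer value; the unit is not stated in the source.
        max_duration=int(raw_duration) if raw_duration else None,
    )
```

Passing a plain dictionary instead of `os.environ` makes the parsing easy to check without touching the real environment.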
Voice API
| Endpoint | Method | Description |
|---|---|---|
| /api/agents/{name}/voice/start | POST | Start a voice session |
| /api/agents/{name}/voice/stop | POST | Stop a voice session |
| /api/agents/{name}/voice/status | GET | Get session status |
| /api/agents/{name}/voice/ws | WebSocket | Bidirectional audio bridge |
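
The endpoint table can be turned into concrete URLs with a small helper. This is a sketch only: the host `example.test`, the helper name `voice_urls`, and the assumption that the WebSocket is served from the same host (with ws/wss mirroring http/https) are not from the source. Authentication is omitted because the source does not describe it.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def voice_urls(base_url: str, agent_name: str) -> dict:
    """Build the four voice endpoint URLs for one agent."""
    parts = urlsplit(base_url)
    # Percent-encode the agent name so spaces or slashes cannot alter the path.
    prefix = f"{parts.path.rstrip('/')}/api/agents/{quote(agent_name, safe='')}/voice"
    # Assumption: the WebSocket shares the host and mirrors the HTTP scheme.
    ws_scheme = "wss" if parts.scheme == "https" else "ws"

    def build(scheme: str, action: str) -> str:
        return urlunsplit((scheme, parts.netloc, f"{prefix}/{action}", "", ""))

    return {
        "start": ("POST", build(parts.scheme, "start")),
        "stop": ("POST", build(parts.scheme, "stop")),
        "status": ("GET", build(parts.scheme, "status")),
        "ws": ("WebSocket", build(ws_scheme, "ws")),
    }
```

For example, `voice_urls("https://example.test", "my agent")["start"]` yields a POST to `https://example.test/api/agents/my%20agent/voice/start`.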
Image Generation
Platform image generation via a two-step Gemini pipeline: prompt refinement then image generation.
Submit an image generation request via API.
Prompt Refinement — Gemini refines the user's prompt using best-practice templates for the use case.
Image Generation — Gemini generates the image from the refined prompt. Returned as base64 or URL.
Used internally for agent avatars and other platform features. API: POST /api/image/generate
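
The two-step shape of the pipeline can be illustrated with injected callables standing in for the two Gemini calls. This is a conceptual sketch, not the platform's implementation: the function name `run_image_pipeline`, the `refine` and `render` parameters, and the keys of the returned dictionary are all hypothetical, and the real request and response schema of POST /api/image/generate is not given in the source.

```python
import base64
from typing import Callable


def run_image_pipeline(
    user_prompt: str,
    use_case: str,
    refine: Callable[[str, str], str],  # hypothetical stand-in for the refinement call
    render: Callable[[str], bytes],     # hypothetical stand-in for the generation call
) -> dict:
    """Run refinement, then generation, and keep both prompts for inspection."""
    refined = refine(user_prompt, use_case)  # step 1: prompt refinement
    image_bytes = render(refined)            # step 2: image generation
    return {
        "prompt": user_prompt,
        "refined_prompt": refined,
        # Base64 is one of the two return forms the text mentions.
        "image_base64": base64.b64encode(image_bytes).decode("ascii"),
    }
```

Keeping the two steps as separate callables makes it possible to log or test the refined prompt independently of the image model.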
Agent Avatars
AI-generated avatars for agents using reference images, emotion variants, and default generation.
API: GET /api/agents/{name}/avatar (serve) and POST /api/agents/{name}/avatar (generate/upload).
Process Engine
BPMN-inspired workflow orchestration for multi-agent processes with approval gates, conditional branching, and analytics.
Concepts
Step types: agent_task, human_approval, gateway (conditional), timer, notification, sub_process.
Execution states: PENDING → RUNNING → COMPLETED / FAILED / CANCELLED, with PAUSED for approvals.
Using the Process Engine
Process List (/processes) — Browse and create process definitions.
Process Wizard — Guided creation of process YAML.
Process Editor — Edit process definition YAML directly.
Execute — Publish a process, then start execution.
Monitor — Real-time WebSocket events for process progress.
Process Dashboard (/process-dashboard) — Analytics, metrics, cost tracking, trends.
Processes can call other processes (sub-processes). Parent-child linking is tracked with breadcrumbs in the UI. Bundled templates for common patterns are provided out of the box.
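
The execution lifecycle listed under Concepts (PENDING, RUNNING, COMPLETED, FAILED, CANCELLED, and PAUSED for approvals) can be modeled as a small state machine. The state names come from the source; the exact set of allowed transitions below is an assumption inferred from that description, and `ExecutionState`, `ALLOWED`, and `transition` are hypothetical names.

```python
from enum import Enum


class ExecutionState(Enum):
    PENDING = "PENDING"
    RUNNING = "RUNNING"
    PAUSED = "PAUSED"
    COMPLETED = "COMPLETED"
    FAILED = "FAILED"
    CANCELLED = "CANCELLED"


S = ExecutionState

# Assumption: transitions inferred from the lifecycle description, not confirmed by the source.
ALLOWED = {
    S.PENDING: {S.RUNNING, S.CANCELLED},
    S.RUNNING: {S.PAUSED, S.COMPLETED, S.FAILED, S.CANCELLED},
    S.PAUSED: {S.RUNNING, S.CANCELLED},  # resumes once an approval is resolved
    S.COMPLETED: set(),                  # terminal
    S.FAILED: set(),                     # terminal
    S.CANCELLED: set(),                  # terminal
}


def transition(current: ExecutionState, target: ExecutionState) -> ExecutionState:
    """Return the target state, or raise if the move is not allowed."""
    if target not in ALLOWED[current]:
        raise ValueError(f"cannot move from {current.value} to {target.value}")
    return target
```

A client that tracks WebSocket progress events could use a table like this to reject out-of-order updates.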
Process API
| Endpoint | Method | Description |
|---|---|---|
| /api/processes | GET/POST | List or create process definitions |
| /api/processes/{id} | GET/PUT/DELETE | Read, update, or delete a process definition |
| /api/processes/{id}/publish | POST | Publish a process definition |
| /api/processes/{id}/execute | POST | Start a new execution |
| /api/executions | GET | List all executions |
| /api/processes/{id}/analytics | GET | Process analytics and metrics |
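
The publish-then-execute sequence from the table might be prepared like this. The sketch only builds the two requests and does not send them. The host `example.test`, the helper name `publish_and_execute_requests`, and the choice to send no request body are assumptions, since the source does not document payloads or authentication.

```python
from typing import List
from urllib.parse import quote
from urllib.request import Request


def publish_and_execute_requests(base_url: str, process_id: str) -> List[Request]:
    """Build the POST requests that publish a definition and start an execution."""
    root = f"{base_url.rstrip('/')}/api/processes/{quote(process_id, safe='')}"
    # Order matters: a process is published first, then executed.
    return [
        Request(f"{root}/publish", method="POST"),
        Request(f"{root}/execute", method="POST"),
    ]
```

Each returned object could then be passed to `urllib.request.urlopen`, after adding whatever authentication headers the platform requires.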
Dynamic Dashboards
Agent-defined dashboards via dashboard.yaml with 11 widget types, historical tracking, and sparkline charts.
Widget Types
11 supported types: metric, status, progress, table, list, chart, text, badge, countdown, link, image.
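
A minimal sketch of checking widget types before a dashboard file is written. The 11 type names are taken from the list above; the helper name `unknown_widget_types` and the representation of widgets as dictionaries with a `type` key are assumptions for illustration.

```python
from typing import Iterable, List, Mapping

WIDGET_TYPES = frozenset({
    "metric", "status", "progress", "table", "list", "chart",
    "text", "badge", "countdown", "link", "image",
})


def unknown_widget_types(widgets: Iterable[Mapping[str, object]]) -> List[object]:
    """Return type values outside the supported set, in first-seen order."""
    seen: List[object] = []
    for widget in widgets:
        widget_type = widget.get("type")
        if widget_type not in WIDGET_TYPES and widget_type not in seen:
            seen.append(widget_type)
    return seen
```

An empty result means every widget uses a supported type; a widget with no `type` key is reported as `None`.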
How It Works
The agent writes a dashboard.yaml file to its workspace.
The file defines widgets with type, title, value, and optional configuration.
Open the agent detail page and select the Dashboard tab to see the widgets.
Auto-refresh updates values as the agent modifies the YAML file.
Historical values are tracked automatically — sparklines appear for metrics with enough data points. Trend indicators show up/down/stable arrows with percentage change.
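
The trend indicator described above can be sketched as a comparison of the two most recent values. How the platform actually computes trends is not specified, so the comparison window, the 1% threshold for "stable", and the function name `trend` are assumptions.

```python
from typing import Optional, Sequence, Tuple


def trend(
    history: Sequence[float],
    stable_threshold_pct: float = 1.0,  # assumption: not given in the source
) -> Optional[Tuple[str, Optional[float]]]:
    """Return (direction, percent change) for the latest value, or None if too little data."""
    if len(history) < 2:
        return None
    previous, latest = history[-2], history[-1]
    if previous == 0:
        # Percentage change is undefined against zero; report direction only.
        direction = "stable" if latest == 0 else ("up" if latest > 0 else "down")
        return direction, None
    change = (latest - previous) * 100 / abs(previous)
    if abs(change) < stable_threshold_pct:
        return "stable", change
    return ("up" if change > 0 else "down"), change
```

For instance, a history ending in 100 then 110 gives `("up", 10.0)`, while a single data point gives `None`.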
A Platform Metrics section appears at the bottom of every dashboard, auto-injected with Tasks 24h, Success Rate, Cost, and Health. This section is not controlled by the YAML file.
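Agents control their own widgets entirely by writing to dashboard.yaml; only the auto-injected Platform Metrics section is outside their control. No API call is needed to update the dashboard, because the file is read on each dashboard request. API: GET /api/agents/{name}/dashboard.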