
Advanced Features

Voice chat, image generation, agent avatars, BPMN-inspired process engine workflows, and agent-defined dynamic dashboards.

Voice Chat

Real-time voice conversations with agents via the Gemini 2.5 Flash Native Audio model (~280ms latency). Audio streams bidirectionally through a backend WebSocket proxy.

Open an agent's Chat tab.

Click the microphone button.

A voice overlay appears with status, mute, and end controls.

Speak — audio is captured as PCM 16kHz and streamed to the backend WebSocket.

The backend proxies audio to the Google Gemini Live API.

The agent's response audio (PCM 24kHz) plays back in real time.

Transcripts are auto-saved to the chat session with source="voice" markers.
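The steps above give the capture and playback sample rates but not the sample format or how audio is chunked. As a rough illustration only, the Python sketch below assumes mono, signed 16-bit little-endian PCM and a hypothetical 20 ms chunk size (neither is stated in this guide), and shows how captured float samples could be packed and framed before being sent over the WebSocket.

```python
import struct

# Sample rates come from the guide; everything else here is an assumption.
INPUT_RATE_HZ = 16_000    # microphone capture rate
OUTPUT_RATE_HZ = 24_000   # agent playback rate
BYTES_PER_SAMPLE = 2      # assumed signed 16-bit, mono
CHUNK_MS = 20             # hypothetical chunk duration per WebSocket message


def float_to_pcm16(samples):
    """Convert floats in [-1.0, 1.0] to little-endian 16-bit PCM bytes (out-of-range values are clipped)."""
    ints = [int(round(max(-1.0, min(1.0, s)) * 32767)) for s in samples]
    return struct.pack(f"<{len(ints)}h", *ints)


def chunk_pcm(pcm, rate_hz=INPUT_RATE_HZ, chunk_ms=CHUNK_MS):
    """Split a PCM byte string into fixed-duration chunks for streaming."""
    chunk_bytes = rate_hz * chunk_ms // 1000 * BYTES_PER_SAMPLE
    return [pcm[i:i + chunk_bytes] for i in range(0, len(pcm), chunk_bytes)]


def bytes_per_second(rate_hz):
    """Raw data rate of one mono 16-bit stream at the given sample rate."""
    return rate_hz * BYTES_PER_SAMPLE
```

Under these assumptions, one second of microphone audio is 32,000 bytes (fifty 640-byte chunks) and one second of agent playback is 48,000 bytes.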

Requirement: GEMINI_API_KEY must be configured on the platform.

Configuration

Variable	Description
VOICE_ENABLED	Enable or disable voice chat
VOICE_MODEL	Gemini model to use for voice
VOICE_MAX_DURATION	Maximum voice session duration
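The table lists the variable names but not their value formats or defaults. A minimal Python sketch of how a backend might read them, assuming "true"/"false" style strings for VOICE_ENABLED, whole seconds for VOICE_MAX_DURATION, and placeholder defaults that are hypothetical rather than the platform's real values:

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class VoiceConfig:
    enabled: bool
    model: str
    max_duration_s: int


def load_voice_config(env: Mapping[str, str] = os.environ) -> VoiceConfig:
    """Read voice settings from environment variables.

    Assumptions: VOICE_ENABLED uses strings such as "true"/"false",
    VOICE_MAX_DURATION is whole seconds, and every default below is a
    hypothetical placeholder, not the platform's real default.
    """
    enabled = env.get("VOICE_ENABLED", "false").strip().lower() in {"1", "true", "yes", "on"}
    model = env.get("VOICE_MODEL", "<set-a-gemini-model>")  # placeholder, not a real model id
    max_duration_s = int(env.get("VOICE_MAX_DURATION", "600"))  # hypothetical 10-minute cap
    return VoiceConfig(enabled=enabled, model=model, max_duration_s=max_duration_s)
```

Passing a plain dict instead of os.environ makes the parsing easy to exercise in isolation.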

Voice API

Endpoint	Method	Description
/api/agents/{name}/voice/start	POST	Start a voice session
/api/agents/{name}/voice/stop	POST	Stop a voice session
/api/agents/{name}/voice/status	GET	Get session status
/api/agents/{name}/voice/ws	WebSocket	Bidirectional audio bridge
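The voice endpoints are all scoped by agent name. A small Python sketch for building their URLs, assuming the WebSocket is served on the same host as the HTTP API and uses wss:// when the API uses https:// (an assumption; authentication is not covered here). The base URL in the usage below is hypothetical.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def voice_ws_url(base_url, agent_name):
    """Build the voice WebSocket URL for an agent from the platform's HTTP base URL."""
    parts = urlsplit(base_url)
    scheme = {"http": "ws", "https": "wss"}.get(parts.scheme)
    if scheme is None:
        raise ValueError(f"unsupported scheme: {parts.scheme!r}")
    # Percent-encode the agent name so it stays within a single path segment.
    path = parts.path.rstrip("/") + f"/api/agents/{quote(agent_name, safe='')}/voice/ws"
    return urlunsplit((scheme, parts.netloc, path, "", ""))


def voice_endpoint(base_url, agent_name, action):
    """Build the HTTP URL for the start, stop, or status endpoint."""
    if action not in {"start", "stop", "status"}:
        raise ValueError(f"unknown action: {action!r}")
    return f"{base_url.rstrip('/')}/api/agents/{quote(agent_name, safe='')}/voice/{action}"
```

For example, voice_ws_url("https://platform.example.com", "my agent") yields wss://platform.example.com/api/agents/my%20agent/voice/ws.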

Image Generation

Platform image generation uses a two-step Gemini pipeline: prompt refinement, then image generation.

Submit an image generation request via API.

Prompt Refinement: Gemini refines the user's prompt using best-practice templates for the use case.

Image Generation: Gemini generates the image from the refined prompt. The image is returned as base64 data or a URL.

The pipeline is used internally for agent avatars and other platform features. API: POST /api/image/generate
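The two-step flow above can be sketched as two pluggable stages. In this Python sketch, refine and render are hypothetical stand-ins for the two Gemini calls, and the result keys image_base64 and image_url are assumed names for the two return forms; none of them are the platform's confirmed API.

```python
import base64
from typing import Callable


def generate_image(prompt: str, use_case: str,
                   refine: Callable[[str, str], str],
                   render: Callable[[str], dict]) -> dict:
    """Run prompt refinement, then image generation from the refined prompt."""
    refined = refine(prompt, use_case)  # step 1: prompt refinement
    result = render(refined)            # step 2: image generation
    return {"refined_prompt": refined, **result}


def image_bytes_or_url(result: dict):
    """Decode a base64 result to bytes, or return the URL string otherwise.

    The keys "image_base64" and "image_url" are hypothetical.
    """
    if "image_base64" in result:
        return base64.b64decode(result["image_base64"])
    return result["image_url"]
```

Keeping the stages injectable makes it straightforward to test the pipeline with fake refine and render functions.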

Agent Avatars

AI-generated avatars for agents using reference images, emotion variants, and default generation.

•Reference Image — Upload a reference image and the avatar is generated in that style.

•Variation Regeneration — Generate new variations from an existing avatar.

•Emotion Variants — The Agent Detail page cycles through emotion-based avatar variants every 30 seconds.

•Default Avatar Generation — Admin button in Settings generates robot/android-style avatars for all agents without a custom avatar.

•WebP Conversion — Avatars are converted to WebP via Pillow for optimization.
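The emotion-variant rotation on the Agent Detail page boils down to picking a variant by elapsed time. A Python sketch of that selection logic, assuming variants are shown in a fixed order and wrap around (the variant names in the usage are hypothetical; the real page is a frontend component, not this code):

```python
CYCLE_SECONDS = 30  # rotation interval stated in the guide


def current_variant(variants, elapsed_seconds, cycle_seconds=CYCLE_SECONDS):
    """Return the emotion variant to display after elapsed_seconds on the page."""
    if not variants:
        raise ValueError("no avatar variants available")
    index = int(elapsed_seconds // cycle_seconds) % len(variants)
    return variants[index]
```

With variants ["neutral", "happy", "thinking"], the page would show "happy" from 30 s to just under 60 s, and return to "neutral" at 90 s.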

API: GET /api/agents/{name}/avatar (serve) and POST /api/agents/{name}/avatar (generate/upload).

Process Engine

BPMN-inspired workflow orchestration for multi-agent processes with approval gates, conditional branching, and analytics.

Concepts

•Process Definition — A YAML file defining steps, agents, and flow.

•Step Types — agent_task, human_approval, gateway (conditional), timer, notification, sub_process.

•EMI Roles — Executor (performs work), Monitor (can intervene), Informed (notified).

•Execution State Machine — PENDING → RUNNING → COMPLETED / FAILED / CANCELLED, with PAUSED for approvals.
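The execution states above can be expressed as a transition table. This Python sketch uses the state names from the guide; the exact set of allowed transitions (for example, cancelling a PENDING or PAUSED execution) is an assumption, not confirmed engine behavior.

```python
from enum import Enum


class ExecutionState(Enum):
    PENDING = "PENDING"
    RUNNING = "RUNNING"
    PAUSED = "PAUSED"
    COMPLETED = "COMPLETED"
    FAILED = "FAILED"
    CANCELLED = "CANCELLED"


S = ExecutionState

# Assumed transition table: the main path and PAUSED-for-approval come from
# the guide; the cancellation edges are assumptions.
TRANSITIONS = {
    S.PENDING: {S.RUNNING, S.CANCELLED},
    S.RUNNING: {S.PAUSED, S.COMPLETED, S.FAILED, S.CANCELLED},
    S.PAUSED: {S.RUNNING, S.CANCELLED},
    S.COMPLETED: set(),  # terminal
    S.FAILED: set(),     # terminal
    S.CANCELLED: set(),  # terminal
}


def transition(current, target):
    """Return target if the move is allowed, otherwise raise ValueError."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition {current.name} -> {target.name}")
    return target
```

A human_approval step would move an execution from RUNNING to PAUSED and, once approved, back to RUNNING.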

Using the Process Engine

Process List (/processes) — Browse and create process definitions.

Process Wizard — Guided creation of process YAML.

Process Editor — Edit process definition YAML directly.

Execute — Publish a process, then start execution.

Monitor — Real-time WebSocket events for process progress.

Process Dashboard (/process-dashboard) — Analytics, metrics, cost tracking, trends.
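When editing process YAML directly, a quick structural check can catch mistakes before publishing. The sketch below works on an already-parsed definition (for example the dict returned by PyYAML's yaml.safe_load) so that it stays standard-library only. The field names steps, id, type, agent, and next are hypothetical; only the six step type names come from this guide.

```python
STEP_TYPES = {"agent_task", "human_approval", "gateway", "timer",
              "notification", "sub_process"}


def validate_process(definition):
    """Return a list of problems found in a parsed process definition.

    The field names ("steps", "id", "type", "agent", "next") are
    hypothetical; check the real schema before relying on them.
    """
    problems = []
    steps = definition.get("steps", [])
    ids = [s.get("id") for s in steps]
    if len(ids) != len(set(ids)):
        problems.append("duplicate step ids")
    for step in steps:
        sid = step.get("id", "<missing id>")
        if step.get("type") not in STEP_TYPES:
            problems.append(f"{sid}: unknown step type {step.get('type')!r}")
        if step.get("type") == "agent_task" and not step.get("agent"):
            problems.append(f"{sid}: agent_task needs an agent")
        for target in step.get("next", []):
            if target not in ids:
                problems.append(f"{sid}: next step {target!r} does not exist")
    return problems
```

An empty list means the sketch found nothing wrong; it does not replace whatever validation the engine performs on publish.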

Processes can call other processes (sub-processes). Parent-child linking is tracked with breadcrumbs in the UI. Bundled templates for common patterns are provided out of the box.
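A breadcrumb for a nested sub-process can be derived by walking parent links up to the root execution. In this Python sketch, the parent_of mapping is a hypothetical stand-in for however the engine actually stores parent-child links.

```python
def breadcrumb(execution_id, parent_of):
    """Return the chain of execution ids from the root down to execution_id.

    parent_of maps an execution id to its parent id (None for a root).
    """
    trail, seen = [], set()
    current = execution_id
    while current is not None:
        if current in seen:  # guard against corrupted, cyclic links
            raise ValueError(f"cycle detected at {current!r}")
        seen.add(current)
        trail.append(current)
        current = parent_of.get(current)
    return list(reversed(trail))
```

Joining the result with " > " gives a display string such as "root > child > grandchild".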

Process API

Endpoint	Method	Description
/api/processes	GET/POST	List or create process definitions
/api/processes/{id}	GET/PUT/DELETE	CRUD operations
/api/processes/{id}/publish	POST	Publish a process definition
/api/processes/{id}/execute	POST	Start a new execution
/api/executions	GET	List all executions
/api/processes/{id}/analytics	GET	Process analytics and metrics
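The publish-then-execute order described under "Using the Process Engine" maps onto two POST calls. This Python sketch only builds the requests with the standard library's urllib.request; the base URL, the process id, and the {"inputs": ...} execute body are hypothetical, and authentication is not shown.

```python
import json
import urllib.request


def build_request(base_url, method, path, body=None):
    """Build (but do not send) a JSON request against the platform API."""
    data = json.dumps(body).encode() if body is not None else None
    req = urllib.request.Request(base_url.rstrip("/") + path, data=data, method=method)
    if data is not None:
        req.add_header("Content-Type", "application/json")
    return req


def publish_then_execute(base_url, process_id, inputs=None):
    """Return the publish request followed by the execute request.

    The execute body shape {"inputs": ...} is a hypothetical example.
    """
    return [
        build_request(base_url, "POST", f"/api/processes/{process_id}/publish"),
        build_request(base_url, "POST", f"/api/processes/{process_id}/execute",
                      {"inputs": inputs or {}}),
    ]
```

Each request can then be sent in order with urllib.request.urlopen(req), stopping if the publish call fails.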

Dynamic Dashboards

Agent-defined dashboards via dashboard.yaml with 11 widget types, historical tracking, and sparkline charts.

Widget Types

11 supported types: metric, status, progress, table, list, chart, text, badge, countdown, link, image.

How It Works

The agent writes a dashboard.yaml file to its workspace.

The file defines widgets with type, title, value, and optional configuration.

Open the agent detail page and select the Dashboard tab to see the widgets.

Auto-refresh updates values as the agent modifies the YAML file.
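From the agent's side, updating the dashboard is just rewriting a file. The Python sketch below renders widgets with type, title, and value and writes dashboard.yaml into a workspace directory. The top-level widgets: key and the workspace path are assumptions about the file layout; values are emitted with json.dumps because JSON scalars are also valid YAML scalars. Writing to a temporary file and then calling os.replace avoids a reader seeing a half-written file, which matters because the file is read on each dashboard request.

```python
import json
import os
from pathlib import Path

# The 11 widget types listed in this guide.
WIDGET_TYPES = {"metric", "status", "progress", "table", "list", "chart",
                "text", "badge", "countdown", "link", "image"}


def render_dashboard_yaml(widgets):
    """Render widgets (type, title, value) as YAML text under an assumed "widgets:" key."""
    lines = ["widgets:"]
    for w in widgets:
        if w["type"] not in WIDGET_TYPES:
            raise ValueError(f"unsupported widget type: {w['type']!r}")
        lines.append(f"  - type: {w['type']}")
        lines.append(f"    title: {json.dumps(w['title'])}")
        lines.append(f"    value: {json.dumps(w['value'])}")
    return "\n".join(lines) + "\n"


def write_dashboard(workspace, widgets):
    """Write dashboard.yaml into the (hypothetical) workspace directory."""
    target = Path(workspace) / "dashboard.yaml"
    tmp = target.with_name("dashboard.yaml.tmp")
    tmp.write_text(render_dashboard_yaml(widgets), encoding="utf-8")
    os.replace(tmp, target)  # replace in one step so readers never see a partial file
    return target
```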

Historical values are tracked automatically — sparklines appear for metrics with enough data points. Trend indicators show up/down/stable arrows with percentage change.
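The guide does not say how sparklines or trend arrows are computed. As an illustration only, this Python sketch renders a text sparkline and classifies the latest change as up, down, or stable; the minimum point count and the ±1% stable band are hypothetical thresholds, and the real dashboard draws charts rather than Unicode blocks.

```python
SPARK_CHARS = "▁▂▃▄▅▆▇█"
MIN_POINTS = 3         # hypothetical "enough data points" threshold
STABLE_BAND_PCT = 1.0  # hypothetical: changes within ±1% count as stable


def sparkline(values):
    """Map a numeric series onto eight block characters; empty if too few points."""
    if len(values) < MIN_POINTS:
        return ""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        return SPARK_CHARS[0] * len(values)
    return "".join(SPARK_CHARS[int((v - lo) / span * (len(SPARK_CHARS) - 1))] for v in values)


def trend(previous, current):
    """Return (direction, percent change) from previous to current."""
    if previous == 0:
        # Percentage change from zero is undefined; report the direction only.
        direction = "stable" if current == 0 else ("up" if current > 0 else "down")
        return direction, None
    pct = (current - previous) / abs(previous) * 100
    if abs(pct) < STABLE_BAND_PCT:
        return "stable", pct
    return ("up" if pct > 0 else "down"), pct
```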

A Platform Metrics section appears at the bottom of every dashboard, auto-injected with Tasks 24h, Success Rate, Cost, and Health. This section is not controlled by the YAML file.

Agents control their dashboard entirely by writing to dashboard.yaml. No API call is needed — the file is read on each dashboard request. API: GET /api/agents/{name}/dashboard.

Mobile Admin Voice Chat