Agent Guardrails

Deterministic safety enforcement for autonomous agent execution. Prevents destructive commands, credential leaks, and runaway loops through infrastructure-level controls that agents cannot bypass.

Concepts

•Baseline — Platform-wide safety rules baked into the agent base image. All agents inherit these rules.

•Hooks — Claude Code PreToolUse and PostToolUse hooks that intercept tool calls before and after execution.

•Per-Agent Overrides — Optional configuration that tightens (never loosens) the baseline for specific agents.

•Fail-Closed — If a hook encounters an error, the tool call is blocked by default.
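The fail-closed behavior can be sketched as a wrapper around a hook's check function. This is a minimal illustration, not the platform's hook code: `run_hook`, the `ALLOW` and `BLOCK` constants, and the message wording are hypothetical, and the block signal is assumed here to be exit code 2.

```python
import json
from typing import Callable, Optional, Tuple

ALLOW = 0
BLOCK = 2  # assumed exit code for "block this tool call"


def run_hook(check: Callable[[dict], Optional[str]], raw_payload: str) -> Tuple[int, str]:
    """Run a guardrail check and return (exit_code, message).

    `check` returns a denial reason, or None to allow the call.
    Any error while parsing or checking results in a block.
    """
    try:
        payload = json.loads(raw_payload)
        reason = check(payload)
    except Exception as exc:
        # Fail closed: an unexpected error never lets the call through.
        return BLOCK, f"guardrail error, blocking by default: {type(exc).__name__}"
    if reason is not None:
        return BLOCK, f"blocked: {reason}"
    return ALLOW, ""
```

The key design choice is that the `except` branch returns the same block code as an explicit denial, so a crashing hook and a matching deny rule look identical to the caller.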

How It Works

Guardrails operate at four layers:

1. Bash Command Blocking

The PreToolUse hook on Bash matches commands against a deny-list of dangerous patterns:

Pattern	Example	Reason
rm -rf / or ~	rm -rf /home	Recursive deletion
chmod 777	chmod -R 777 /var	World-writable permissions
curl | sh	curl example.com | bash	Piping remote content to shell
git push --force	git push -f origin main	Force push to remote
mkfs.*	mkfs.ext4 /dev/sda1	Formatting filesystems
Fork bombs	:(){ :|:& };:	Process explosion
shutdown, reboot	shutdown -h now	Host shutdown

When a command is blocked, the agent sees a clear denial message with the reason. The event is logged to /logs/guardrails.jsonl.
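As a rough illustration of the deny-list check, the sketch below matches a command against a handful of regular expressions modeled on the table above. The regexes, the `DENY_PATTERNS` name, and the `check_command` helper are assumptions made for this example; the platform's actual patterns are not shown in this guide.

```python
import re
from typing import Optional

# Illustrative patterns only, paired with the reasons from the table.
DENY_PATTERNS = [
    (re.compile(r"\brm\s+-\w*r\w*f\w*\s+[/~]"), "Recursive deletion"),
    (re.compile(r"\bchmod\s+(-\w+\s+)*777\b"), "World-writable permissions"),
    (re.compile(r"\bcurl\b[^|]*\|\s*(ba)?sh\b"), "Piping remote content to shell"),
    (re.compile(r"\bgit\s+push\b.*\s(--force|-f)\b"), "Force push to remote"),
    (re.compile(r"\bmkfs(\.\w+)?\b"), "Formatting filesystems"),
    (re.compile(r":\(\)\s*\{.*\}\s*;\s*:"), "Process explosion"),
    (re.compile(r"\b(shutdown|reboot)\b"), "Host shutdown"),
]


def check_command(command: str) -> Optional[str]:
    """Return the denial reason for the first matching pattern, or None."""
    for pattern, reason in DENY_PATTERNS:
        if pattern.search(command):
            return reason
    return None
```

For example, `check_command("git push -f origin main")` returns the force-push reason, while `check_command("git push origin main")` returns `None`.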

2. Credential File Protection

The PreToolUse hook on Edit, Write, and NotebookEdit blocks modifications to sensitive paths:

•.env, .env.* — Environment files with secrets

•.mcp.json — MCP server configuration

•~/.ssh/*, ~/.aws/*, ~/.gcp/* — Cloud and SSH credentials

•~/.claude/settings.json — Claude Code settings (hook configuration)

•/opt/trinity/* — Platform guardrail files
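A path check along these lines could be sketched with glob matching. The helper names, the `/home/agent` home directory, and the `/workspace` working directory are hypothetical; only the protected patterns come from the list above.

```python
import fnmatch
import posixpath

HOME = "/home/agent"  # hypothetical home directory inside the container

# Matched against the file name only, wherever the file lives.
BASENAME_PATTERNS = [".env", ".env.*", ".mcp.json"]
# Matched against the full normalized path.
PATH_PATTERNS = [
    "~/.ssh/*",
    "~/.aws/*",
    "~/.gcp/*",
    "~/.claude/settings.json",
    "/opt/trinity/*",
]


def _expand(path: str, home: str) -> str:
    """Replace a leading ~ with the home directory."""
    if path == "~" or path.startswith("~/"):
        return home + path[1:]
    return path


def is_protected(path: str, home: str = HOME, cwd: str = "/workspace") -> bool:
    """Return True if writing to `path` should be blocked."""
    # Normalize first so "../" segments cannot sidestep the patterns.
    full = posixpath.normpath(posixpath.join(cwd, _expand(path, home)))
    name = posixpath.basename(full)
    if any(fnmatch.fnmatchcase(name, p) for p in BASENAME_PATTERNS):
        return True
    return any(fnmatch.fnmatchcase(full, _expand(p, home)) for p in PATH_PATTERNS)
```

This sketch normalizes `..` segments but does not resolve symlinks, which a real implementation would likely need to handle.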

3. Credential Leak Detection

The PostToolUse hook on Bash scans command output for leaked credentials:

Pattern	Example Prefix
Anthropic API keys	sk-ant-...
OpenAI API keys	sk-proj-...
GitHub PATs	ghp_..., github_pat_...
AWS access keys	AKIA...
Slack tokens	xoxb-..., xoxp-...
Google API keys	AIza...

Matches are logged (pattern name only, not the actual value) for security review.
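The scan can be pictured as a set of named regular expressions applied to command output, returning only the names of the patterns that matched. The regexes below are assumptions built from the prefixes in the table (the character classes and lengths are illustrative), and `scan_output` is a hypothetical helper.

```python
import re
from typing import List

# Illustrative patterns keyed by the name that would be logged.
LEAK_PATTERNS = {
    "anthropic_api_key": re.compile(r"sk-ant-[A-Za-z0-9_-]{8,}"),
    "openai_api_key": re.compile(r"sk-proj-[A-Za-z0-9_-]{8,}"),
    "github_pat": re.compile(r"(?:ghp_|github_pat_)[A-Za-z0-9_]{8,}"),
    "aws_access_key": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "slack_token": re.compile(r"xox[bp]-[A-Za-z0-9-]{8,}"),
    "google_api_key": re.compile(r"\bAIza[0-9A-Za-z_-]{20,}"),
}


def scan_output(output: str) -> List[str]:
    """Return the sorted names of matching patterns, never the matched text."""
    return sorted(
        name for name, pattern in LEAK_PATTERNS.items() if pattern.search(output)
    )
```

Returning names rather than matches keeps the secret itself out of the log, which mirrors the logging behavior described above.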

4. Turn Limits

Every Claude Code invocation enforces a maximum turn count via --max-turns:

Mode	Default	Range
Chat	50 turns	1-500
Task/Headless	20 turns	1-500

This prevents runaway loops that burn through API credits.
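One way to picture how the limit is resolved: pick the default for the mode, accept an override only if it falls inside the documented range, and pass the result as `--max-turns`. The function name and the choice to reject out-of-range values (rather than clamp them) are assumptions for this sketch.

```python
from typing import List, Optional

DEFAULT_MAX_TURNS = {"chat": 50, "task": 20}
MIN_TURNS, MAX_TURNS = 1, 500


def max_turns_args(mode: str, override: Optional[int] = None) -> List[str]:
    """Build the --max-turns arguments for a given mode."""
    if mode not in DEFAULT_MAX_TURNS:
        raise ValueError(f"unknown mode: {mode!r}")
    turns = DEFAULT_MAX_TURNS[mode] if override is None else override
    if not MIN_TURNS <= turns <= MAX_TURNS:
        raise ValueError(f"max turns must be {MIN_TURNS}-{MAX_TURNS}, got {turns}")
    return ["--max-turns", str(turns)]
```

With no override, `max_turns_args("task")` yields `["--max-turns", "20"]`.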

Per-Agent Configuration

Owners can tighten guardrails for specific agents. Overrides are additive — you can add more restrictions but cannot remove baseline protections.

Available Overrides

Field	Type	Description
max_turns_chat	int (1-500)	Max turns for chat mode
max_turns_task	int (1-500)	Max turns for headless tasks
execution_timeout_sec	int (60-7200)	Execution time limit
extra_bash_deny	list (max 50)	Additional bash patterns to block
extra_path_deny	list (max 50)	Additional paths to protect
disallowed_tools	list (max 50)	Claude Code tools to disable
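The constraints in the table can be checked before an override is saved. The following sketch validates a plain dictionary against those ranges and shows an additive merge in which baseline entries are always kept. `validate_overrides` and `merge_deny_list` are hypothetical helpers, and treating list items as strings is an assumption.

```python
from typing import Any, Dict, List

INT_RANGES = {
    "max_turns_chat": (1, 500),
    "max_turns_task": (1, 500),
    "execution_timeout_sec": (60, 7200),
}
LIST_FIELDS = ("extra_bash_deny", "extra_path_deny", "disallowed_tools")
MAX_LIST_ITEMS = 50


def validate_overrides(overrides: Dict[str, Any]) -> List[str]:
    """Return a list of validation errors (empty when the overrides are valid)."""
    errors = []
    for key, value in overrides.items():
        if key in INT_RANGES:
            low, high = INT_RANGES[key]
            # bool is a subclass of int in Python, so reject it explicitly.
            if isinstance(value, bool) or not isinstance(value, int):
                errors.append(f"{key}: must be an integer")
            elif not low <= value <= high:
                errors.append(f"{key}: must be between {low} and {high}")
        elif key in LIST_FIELDS:
            if not isinstance(value, list) or not all(isinstance(v, str) for v in value):
                errors.append(f"{key}: must be a list of strings")
            elif len(value) > MAX_LIST_ITEMS:
                errors.append(f"{key}: at most {MAX_LIST_ITEMS} items")
        else:
            errors.append(f"{key}: unknown field")
    return errors


def merge_deny_list(baseline: List[str], extra: List[str]) -> List[str]:
    """Additive merge: every baseline entry is kept, extras are appended once."""
    return baseline + [item for item in extra if item not in baseline]
```

Because the merge only appends, no override value can shrink the baseline list.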

Configure via UI

1. Open the agent detail page

2. Go to the Config tab

3. Expand the Guardrails section

4. Adjust settings and save

5. Restart the agent to apply changes

Guardrails API

Endpoint	Method	Description
/api/agents/{name}/guardrails	GET	Get per-agent guardrails config
/api/agents/{name}/guardrails	PUT	Set per-agent guardrails overrides

After updating guardrails, stop and start the agent to apply changes. The container is recreated with the new configuration.
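A PUT request to this endpoint might be built as follows. This is a sketch under several assumptions: the base URL is a placeholder, the request body is assumed to be a JSON object whose keys are the override fields listed earlier, and authentication is omitted because this guide does not describe it.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Dict


def build_guardrails_request(
    base_url: str, agent_name: str, overrides: Dict[str, Any]
) -> urllib.request.Request:
    """Build (but do not send) a PUT request for per-agent guardrails."""
    # Percent-encode the agent name so it is safe as a single path segment.
    name = urllib.parse.quote(agent_name, safe="")
    url = f"{base_url.rstrip('/')}/api/agents/{name}/guardrails"
    body = json.dumps(overrides).encode("utf-8")  # assumed body shape
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Placeholder host; replace with the address of your deployment.
request = build_guardrails_request(
    "https://trinity.example.com",
    "my-agent",
    {"max_turns_task": 10, "disallowed_tools": ["WebFetch"]},
)
```

The request can then be sent with `urllib.request.urlopen(request)`, followed by stopping and starting the agent so the container picks up the new configuration.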

For Agents

Guardrails are enforced at the infrastructure layer. Agents cannot:

•Modify hook scripts (/opt/trinity/hooks/ is root-owned)

•Edit ~/.claude/settings.json (protected path)

•Bypass --max-turns limits

•Use --dangerously-skip-permissions to bypass guardrails (hooks still fire)

When a tool call is blocked, the agent receives a structured error, so it can acknowledge the denial and try an alternative approach.

Limitations

•Baseline cannot be relaxed — Per-agent overrides only add restrictions, never remove them.

•Restart required — Guardrail changes require stopping and starting the agent.

•Pattern matching — Bash deny-list uses regex patterns; creative command reformulation may evade detection.
