Agent Guardrails

Deterministic safety enforcement for autonomous agent execution. Prevents destructive commands, credential leaks, and runaway loops through infrastructure-level controls that agents cannot bypass.

Concepts

Baseline — Platform-wide safety rules baked into the agent base image. All agents inherit these rules.
Hooks — Claude Code PreToolUse and PostToolUse hooks that intercept tool calls before and after execution.
Per-Agent Overrides — Optional configuration that tightens (never loosens) the baseline for specific agents.
Fail-Closed — If a hook encounters an error, the tool call is blocked by default.

How It Works

Guardrails operate at four layers:

1. Bash Command Blocking

The PreToolUse hook on Bash matches commands against a deny-list of dangerous patterns:

Pattern           Example                  Reason
rm -rf / or ~     rm -rf /home             Recursive deletion
chmod 777         chmod -R 777 /var        World-writable permissions
curl | sh         curl example.com | bash  Piping remote content to shell
git push --force  git push -f origin main  Force push to remote
mkfs.*            mkfs.ext4 /dev/sda1      Formatting filesystems
Fork bombs        :(){ :|:& };:            Process explosion
shutdown, reboot  shutdown -h now          Host shutdown

When a command is blocked, the agent sees a clear denial message with the reason. The event is logged to /logs/guardrails.jsonl.
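The matching step described above can be sketched in Python. This is a minimal illustration, not Trinity's actual hook: the regexes, the `DENY_RULES` table, and the `check_command` and `handle_event` helpers are hypothetical. It assumes the Claude Code hook contract in which the hook receives a JSON event carrying `tool_name` and `tool_input.command`, and in which exit code 2 blocks the tool call.

```python
import json
import re
from typing import Optional, Tuple

# Illustrative deny-list only; the platform's real patterns may differ.
DENY_RULES = [
    (re.compile(r"\brm\s+-(?=\w*r)(?=\w*f)\w+\s+[/~]"), "Recursive deletion"),
    (re.compile(r"\bchmod\s+(-\w+\s+)*777\b"), "World-writable permissions"),
    (re.compile(r"\bcurl\b[^|]*\|\s*(ba)?sh\b"), "Piping remote content to shell"),
    (re.compile(r"\bgit\s+push\b.*\s(--force|-f)\b"), "Force push to remote"),
    (re.compile(r"\bmkfs(\.\w+)?\b"), "Formatting filesystems"),
    (re.compile(r":\(\)\s*\{.*\};\s*:"), "Process explosion"),
    (re.compile(r"\b(shutdown|reboot)\b"), "Host shutdown"),
]


def check_command(command: str) -> Optional[str]:
    """Return the denial reason for a command, or None if no rule matches."""
    for pattern, reason in DENY_RULES:
        if pattern.search(command):
            return reason
    return None


def handle_event(raw_event: str) -> Tuple[int, str]:
    """Return (exit_code, message): 2 blocks the call, 0 allows it."""
    try:
        event = json.loads(raw_event)
        if event.get("tool_name") != "Bash":
            return 0, ""
        reason = check_command(event["tool_input"]["command"])
    except Exception as exc:
        # Fail closed: any parsing or lookup error blocks the tool call.
        return 2, f"Blocked: guardrail hook error ({type(exc).__name__})"
    if reason:
        return 2, f"Blocked by guardrails: {reason}"
    return 0, ""
```

A real hook script would read the event from standard input, write the message to standard error, and exit with the returned code. Note that malformed input is blocked, which matches the fail-closed behavior described under Concepts.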

2. Credential File Protection

The PreToolUse hook on Edit, Write, and NotebookEdit blocks modifications to sensitive paths:

.env, .env.* — Environment files with secrets
.mcp.json — MCP server configuration
~/.ssh/*, ~/.aws/*, ~/.gcp/* — Cloud and SSH credentials
~/.claude/settings.json — Claude Code settings (hook configuration)
/opt/trinity/* — Platform guardrail files
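A path check along these lines could look like the following sketch. The helper names, the default home directory `/home/agent`, and the split into name globs, exact files, and directory prefixes are assumptions made for illustration; the real hook may match paths differently. The sketch assumes POSIX paths and does not resolve symlinks.

```python
import fnmatch
import os

# Hypothetical encoding of the protected paths listed above.
PROTECTED_NAMES = [".env", ".env.*", ".mcp.json"]   # matched on the file name
PROTECTED_FILES = ["~/.claude/settings.json"]       # matched exactly
PROTECTED_DIRS = ["~/.ssh", "~/.aws", "~/.gcp", "/opt/trinity"]


def _expand(path: str, home: str) -> str:
    """Expand a leading ~ and collapse '..' segments."""
    if path == "~" or path.startswith("~/"):
        path = home + path[1:]
    return os.path.normpath(path)


def is_protected(path: str, home: str = "/home/agent",
                 cwd: str = "/home/agent") -> bool:
    """Return True if a write to the given path should be blocked."""
    full = _expand(path, home)
    if not os.path.isabs(full):
        full = os.path.normpath(os.path.join(cwd, full))
    name = os.path.basename(full)
    if any(fnmatch.fnmatchcase(name, glob) for glob in PROTECTED_NAMES):
        return True
    if full in {_expand(f, home) for f in PROTECTED_FILES}:
        return True
    return any(full.startswith(_expand(d, home) + "/") for d in PROTECTED_DIRS)
```

Normalizing the path before matching means a request such as `/opt/trinity/../trinity/hooks/x` is still recognized as protected.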

3. Credential Leak Detection

The PostToolUse hook on Bash scans command output for leaked credentials:

Pattern             Example Prefix
Anthropic API keys  sk-ant-...
OpenAI API keys     sk-proj-...
GitHub PATs         ghp_..., github_pat_...
AWS access keys     AKIA...
Slack tokens        xoxb-..., xoxp-...
Google API keys     AIza...

Matches are logged (pattern name only, not the actual value) for security review.
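As a rough sketch of this scan, the code below searches output for the prefixes in the table and builds a log record that carries only the pattern names. The regexes are approximations (token lengths and character sets are guesses beyond the documented prefixes), and `scan_output`, `leak_log_record`, and the record fields are hypothetical names.

```python
import json
import re
from typing import List

# Approximate patterns keyed by the names used in the table above.
LEAK_PATTERNS = {
    "Anthropic API keys": re.compile(r"sk-ant-[A-Za-z0-9_-]{10,}"),
    "OpenAI API keys": re.compile(r"sk-proj-[A-Za-z0-9_-]{10,}"),
    "GitHub PATs": re.compile(r"(ghp_[A-Za-z0-9]{20,}|github_pat_[A-Za-z0-9_]{20,})"),
    "AWS access keys": re.compile(r"AKIA[0-9A-Z]{16}"),
    "Slack tokens": re.compile(r"xox[bp]-[A-Za-z0-9-]{10,}"),
    "Google API keys": re.compile(r"AIza[0-9A-Za-z_-]{35}"),
}


def scan_output(text: str) -> List[str]:
    """Return the names of all patterns found in the command output."""
    return [name for name, rx in LEAK_PATTERNS.items() if rx.search(text)]


def leak_log_record(text: str) -> str:
    """Build a JSON log line that names the patterns but never the values."""
    return json.dumps({"event": "credential_leak", "patterns": scan_output(text)})
```

Keeping the matched value out of the record means the log itself does not become a second copy of the secret.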

4. Turn Limits

Every Claude Code invocation enforces a maximum turn count via --max-turns:

Mode           Default   Range
Chat           50 turns  1-500
Task/Headless  20 turns  1-500

This prevents runaway loops that burn through API credits.
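One way to picture how the defaults and range in the table turn into a command-line argument is the sketch below. The `max_turns_args` helper and the `"chat"` and `"task"` mode keys are hypothetical; only the `--max-turns` flag and the numbers come from this page.

```python
from typing import List, Optional

DEFAULT_TURNS = {"chat": 50, "task": 20}   # defaults from the table above
MIN_TURNS, MAX_TURNS = 1, 500              # allowed range from the table above


def max_turns_args(mode: str, override: Optional[int] = None) -> List[str]:
    """Return the --max-turns argument pair for an invocation."""
    turns = DEFAULT_TURNS[mode] if override is None else override
    if not MIN_TURNS <= turns <= MAX_TURNS:
        raise ValueError(f"max turns must be between {MIN_TURNS} and {MAX_TURNS}")
    return ["--max-turns", str(turns)]
```

For example, `max_turns_args("task")` yields `["--max-turns", "20"]`, which the caller would append to the rest of the Claude Code command line.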

Per-Agent Configuration

Owners can tighten guardrails for specific agents. Overrides are additive — you can add more restrictions but cannot remove baseline protections.

Available Overrides

Field                  Type           Description
max_turns_chat         int (1-500)    Max turns for chat mode
max_turns_task         int (1-500)    Max turns for headless tasks
execution_timeout_sec  int (60-7200)  Execution time limit
extra_bash_deny        list (max 50)  Additional bash patterns to block
extra_path_deny        list (max 50)  Additional paths to protect
disallowed_tools       list (max 50)  Claude Code tools to disable
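The constraints in this table, and the additive nature of overrides, can be sketched as a small validation model. Everything here is illustrative: the `GuardrailOverrides` class and `effective_bash_deny` helper are hypothetical, and the two list fields are read as `extra_bash_deny` and `extra_path_deny`, each of type list, which is an interpretation of the table rather than a confirmed schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional

INT_RANGES = {
    "max_turns_chat": (1, 500),
    "max_turns_task": (1, 500),
    "execution_timeout_sec": (60, 7200),
}
LIST_FIELDS = ("extra_bash_deny", "extra_path_deny", "disallowed_tools")
MAX_LIST_ITEMS = 50


@dataclass
class GuardrailOverrides:
    max_turns_chat: Optional[int] = None
    max_turns_task: Optional[int] = None
    execution_timeout_sec: Optional[int] = None
    extra_bash_deny: List[str] = field(default_factory=list)
    extra_path_deny: List[str] = field(default_factory=list)
    disallowed_tools: List[str] = field(default_factory=list)

    def validate(self) -> None:
        """Raise ValueError if any field is outside its documented limits."""
        for name, (low, high) in INT_RANGES.items():
            value = getattr(self, name)
            if value is not None and not low <= value <= high:
                raise ValueError(f"{name} must be between {low} and {high}")
        for name in LIST_FIELDS:
            if len(getattr(self, name)) > MAX_LIST_ITEMS:
                raise ValueError(f"{name} allows at most {MAX_LIST_ITEMS} entries")


def effective_bash_deny(baseline: List[str], overrides: GuardrailOverrides) -> List[str]:
    """Baseline patterns always survive; overrides can only append to them."""
    extras = [p for p in overrides.extra_bash_deny if p not in baseline]
    return list(baseline) + extras
```

Because the merge only appends, there is no input that removes a baseline pattern, which is the property the section describes.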

Configure via UI

1. Open the agent detail page
2. Go to the Config tab
3. Expand the Guardrails section
4. Adjust settings and save
5. Restart the agent to apply changes

Guardrails API

Endpoint                       Method  Description
/api/agents/{name}/guardrails  GET     Get per-agent guardrails config
/api/agents/{name}/guardrails  PUT     Set per-agent guardrails overrides

After updating guardrails, stop and start the agent to apply changes. The container is recreated with the new configuration.
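A request to the PUT endpoint might be built as in the sketch below, using only the Python standard library. The base URL, the bearer-token header, and the JSON body shape are assumptions; this page documents only the path and method, so check the platform's API reference for the actual authentication scheme and payload.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional


def build_guardrails_request(base_url: str, agent_name: str, overrides: dict,
                             token: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) a PUT request for per-agent overrides."""
    name = urllib.parse.quote(agent_name, safe="")
    url = f"{base_url.rstrip('/')}/api/agents/{name}/guardrails"
    headers = {"Content-Type": "application/json"}
    if token:
        # Assumed auth scheme; the real API may differ.
        headers["Authorization"] = f"Bearer {token}"
    body = json.dumps(overrides).encode("utf-8")
    return urllib.request.Request(url, data=body, headers=headers, method="PUT")


# Hypothetical usage: tighten the headless turn limit for one agent.
request = build_guardrails_request(
    "https://trinity.example.com", "my-agent", {"max_turns_task": 10}
)
```

Passing the request to `urllib.request.urlopen` would send it. The agent still has to be stopped and started afterwards for the new configuration to take effect.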

For Agents

Guardrails are enforced at the infrastructure layer. Agents cannot:

Modify hook scripts (/opt/trinity/hooks/ is root-owned)
Edit ~/.claude/settings.json (protected path)
Bypass --max-turns limits
Use --dangerously-skip-permissions to bypass guardrails (hooks still fire)

When a tool call is blocked, the agent receives a structured error and can acknowledge the denial and try an alternative approach.

Limitations

Baseline cannot be relaxed — Per-agent overrides only add restrictions, never remove them.
Restart required — Guardrail changes require stopping and starting the agent.
Pattern matching — Bash deny-list uses regex patterns; creative command reformulation may evade detection.
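The pattern-matching limitation is easy to see with a deliberately naive rule. The regex and the `is_blocked` helper below are hypothetical and far simpler than a production deny-list; they only show why an equivalent command written differently can slip past a regex.

```python
import re

# A naive rule that expects the flags to be written exactly as "-rf".
NAIVE_RULE = re.compile(r"\brm\s+-rf\s+/")


def is_blocked(command: str) -> bool:
    """Return True if the naive rule matches the command."""
    return NAIVE_RULE.search(command) is not None


blocked = is_blocked("rm -rf /")     # matches the rule
missed = is_blocked("rm -r -f /")    # same effect, different spelling
```

Here `blocked` is `True` and `missed` is `False`. This is why the deny-list is best treated as one layer alongside path protection, turn limits, and container isolation rather than as a complete defense.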