HQ Safety + Capabilities Plan: Per-Agent Tool Enforcement

Run ID: run_548665587a96
Agent: agt_shipwright (Forge)
Date: 2026-02-26
Status: Proposal - SHIP READY

Executive Summary

HQ already stores per-agent capability data (riskTier, allowedTools), but nothing enforces it at runtime: tool calls are never checked against these fields. This plan proposes small, additive code changes that enforce per-agent capabilities without breaking existing functionality.

Current State Assessment

✅ What's Already Built

Agent Schema: riskTier (SAFE/BUILDER/OPERATOR) + allowedTools[] in agents.json
UI Management: AgentsBoard.tsx has full CRUD interface for capabilities
API Support: /api/agents and /api/gmc/agents/upsert handle capability updates
Dispatch Infrastructure: dispatch_config.json routes agents to OpenClaw instances
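
To make the schema concrete, an agent record could be typed and normalized roughly as follows. This is a sketch, not the existing HQ types: only id, riskTier, and allowedTools come from this plan, while the AgentRecord name and the normalizeAgent helper are assumptions introduced for illustration.

```typescript
// Hypothetical types: only `id`, `riskTier`, and `allowedTools` appear in the plan.
type RiskTier = 'SAFE' | 'BUILDER' | 'OPERATOR';

interface AgentRecord {
  id: string;
  riskTier?: RiskTier;
  allowedTools?: string[];
}

const RISK_TIERS: readonly RiskTier[] = ['SAFE', 'BUILDER', 'OPERATOR'];

// Apply the fail-safe defaults the plan describes: SAFE tier, empty allowlist.
function normalizeAgent(raw: {
  id: string;
  riskTier?: string;
  allowedTools?: unknown;
}): Required<AgentRecord> {
  // Any tier value outside the three known tiers is treated as SAFE.
  const riskTier = RISK_TIERS.find(t => t === raw.riskTier) ?? 'SAFE';
  // Non-array or non-string entries are dropped rather than trusted.
  const allowedTools = Array.isArray(raw.allowedTools)
    ? raw.allowedTools.filter((t): t is string => typeof t === 'string')
    : [];
  return { id: raw.id, riskTier, allowedTools };
}
```

Normalizing at read time keeps the later enforcement code free of undefined checks.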

❌ Missing: Runtime Enforcement

No validation of tool calls against allowedTools at execution time
No enforcement of capability restrictions during agent runs
UI comment confirms: "Enforcement will be applied by the dispatcher (next step)"

Proposed Solution: 3-Layer Defense

Layer 1: Dispatch-Time Tool Filtering

File: src/app/api/ops/enqueue-run/route.ts

// Add before agent execution.
// Returns the subset of requestedTools that this agent is allowed to use.
function enforceAgentCapabilities(agentId: string, requestedTools: string[]): string[] {
  const agents = readAgentsConfig();
  const agent = agents.agents?.find(a => a.id === agentId);

  if (!agent) return []; // Fail-safe: no tools if agent not found

  const allowedTools: string[] = agent.allowedTools || [];
  const riskTier: string = agent.riskTier || 'SAFE';

  // Tools denied at each tier, regardless of the agent's allowlist
  const tierBlacklist: Record<string, string[]> = {
    SAFE: ['exec', 'browser', 'github', 'clawhub'],
    BUILDER: ['github', 'clawhub'],
    OPERATOR: [], // No tier-level blocks; the allowlist still applies
  };

  // Unrecognized tiers fall back to the SAFE restrictions
  const blocked = tierBlacklist[riskTier] || tierBlacklist['SAFE'];

  // A tool must be on the allowlist AND not blocked by the tier
  return requestedTools.filter(tool =>
    allowedTools.includes(tool) && !blocked.includes(tool)
  );
}
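
As a usage illustration, the filter can be written as a pure function that takes the agent record directly, which makes the behavior easy to inspect. This is a sketch under assumptions: filterTools, AgentCaps, and the agt_example record are hypothetical names, and returning the denied tools alongside the granted ones is an addition not present in the plan.

```typescript
// Sketch only: a pure variant of the Layer 1 filter. Names here are hypothetical.
type RiskTier = 'SAFE' | 'BUILDER' | 'OPERATOR';

interface AgentCaps {
  id: string;
  riskTier?: RiskTier;
  allowedTools?: string[];
}

const TIER_BLOCKED: Record<RiskTier, readonly string[]> = {
  SAFE: ['exec', 'browser', 'github', 'clawhub'],
  BUILDER: ['github', 'clawhub'],
  OPERATOR: [],
};

function filterTools(
  agent: AgentCaps | undefined,
  requested: string[],
): { granted: string[]; denied: string[] } {
  const allowed = agent?.allowedTools ?? []; // unknown agent -> empty allowlist
  const blocked = TIER_BLOCKED[agent?.riskTier ?? 'SAFE'];
  const granted: string[] = [];
  const denied: string[] = [];
  for (const tool of requested) {
    const ok = allowed.includes(tool) && !blocked.includes(tool);
    (ok ? granted : denied).push(tool);
  }
  return { granted, denied };
}

const builder: AgentCaps = {
  id: 'agt_example',
  riskTier: 'BUILDER',
  allowedTools: ['exec', 'github', 'web_fetch'],
};

const outcome = filterTools(builder, ['exec', 'github', 'browser', 'web_fetch']);
console.log(outcome.granted); // ['exec', 'web_fetch']
console.log(outcome.denied);  // ['github', 'browser']
```

Here github is denied by the BUILDER tier even though it is allowlisted, and browser is denied because it is not on the allowlist at all.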

Layer 2: OpenClaw Agent Config Integration

File: src/app/api/ops/dispatch-config/route.ts

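// Extend agent model updates to include tool restrictions.
// Note: enforceAgentCapabilities is defined in enqueue-run/route.ts in Layer 1;
// move it to a shared module so both routes can import it.
if (agentId && (model || toolRestrictions)) {
  cfg.agentMap = cfg.agentMap || {};
  const entry = cfg.agentMap[agentId] || {};

  if (model) entry.model = model;
  // Store only the tools that survive the allowlist and tier checks
  if (toolRestrictions) entry.allowedTools = enforceAgentCapabilities(agentId, toolRestrictions);

  cfg.agentMap[agentId] = entry;
  // Write to OpenClaw config as well for runtime enforcement.
  // updateOpenClawAgentConfig is the HQ→OpenClaw sync helper to be built in Phase 2.
  updateOpenClawAgentConfig(agentId, entry);
}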

Layer 3: UI Safety Indicators

File: src/components/AgentsBoard.tsx

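// Add capability validation warnings.
// Module scope: mirrors the dispatcher's tier restrictions so the warning
// matches what enforcement will actually drop.
const TIER_BLOCKED: Record<string, string[]> = {
  SAFE: ['exec', 'browser', 'github', 'clawhub'],
  BUILDER: ['github', 'clawhub'],
  OPERATOR: [],
};

// Inside the component: flag allowlisted tools that the selected tier blocks
const capabilityRisk = useMemo(() => {
  const tier = riskTier || 'SAFE';
  const blocked = TIER_BLOCKED[tier] ?? TIER_BLOCKED.SAFE;
  const conflicting = allowedTools.filter(t => blocked.includes(t));
  if (conflicting.length > 0) {
    return `RISK: ${conflicting.join(', ')} blocked at ${tier} tier; raise the tier or remove these tools`;
  }
  return null;
}, [allowedTools, riskTier]);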

// Display warning in UI near save button
{capabilityRisk && (
  <div className="text-xs text-yellow-400 mt-1">⚠️ {capabilityRisk}</div>
)}

Implementation Roadmap

Phase 1: Foundation (1-2 hours)

Add validation helper functions to existing agent APIs
Update dispatch-config to include tool filtering
Test capability enforcement with sample agent configurations

Phase 2: Integration (2-3 hours)

Connect HQ→OpenClaw agent config sync
Add runtime enforcement in enqueue-run pipeline
Validate tool filtering works end-to-end

Phase 3: UI Polish (1 hour)

Add capability warnings in AgentsBoard
Improve tier descriptions with specific tool examples
Test edge cases (invalid configs, missing data)

Risk Mitigation

Backward Compatibility

All changes are additive - no existing schemas modified
Default fallbacks ensure agents without explicit capabilities get SAFE defaults
Gradual rollout - can enable per-agent without affecting others

Safety Measures

Fail-safe defaults: Unknown agents → SAFE tier, empty allowlist
HQ override: agt_hq always gets OPERATOR tier regardless of config (the Layer 1 snippet does not include this yet; add it to enforceAgentCapabilities)
Rollback plan: Capability filtering can be disabled via feature flag
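
The three safety measures above could be combined into a single policy-resolution step, sketched below. Assumptions are flagged in the comments: resolvePolicy, the Policy type, and the enforcementEnabled parameter (standing in for the feature flag) are hypothetical, and the sketch assumes the HQ override raises only the tier while the configured allowlist still applies.

```typescript
// Sketch only: resolvePolicy and its types are hypothetical names.
type RiskTier = 'SAFE' | 'BUILDER' | 'OPERATOR';

interface AgentCaps {
  id: string;
  riskTier?: RiskTier;
  allowedTools?: string[];
}

type Policy =
  | { enforce: false } // feature flag off: rollback path, no filtering
  | { enforce: true; tier: RiskTier; allowlist: string[] };

const HQ_AGENT_ID = 'agt_hq';

function resolvePolicy(
  agentId: string,
  agent: AgentCaps | undefined,
  enforcementEnabled: boolean, // assumed to come from a feature flag
): Policy {
  if (!enforcementEnabled) return { enforce: false };

  // HQ override: always OPERATOR. Assumption: its allowlist is still read from config.
  const tier: RiskTier =
    agentId === HQ_AGENT_ID ? 'OPERATOR' : (agent?.riskTier ?? 'SAFE');

  // Fail-safe default: unknown agents get an empty allowlist.
  return { enforce: true, tier, allowlist: agent?.allowedTools ?? [] };
}
```

Keeping the flag check first means a rollback bypasses every later decision, including the HQ override.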

Testing Strategy

Unit tests for capability filtering logic
Integration tests with sample agent configurations
Manual verification via AgentsBoard before production use
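
The unit tests for the filtering logic might start from a table of edge cases like the one below. This is a sketch, not the project's test suite: it uses a stand-in filterTools with the same rules as the Layer 1 snippet, plain comparisons instead of a test framework, and hypothetical case data.

```typescript
// Sketch only: stand-in filter plus table-driven cases; all names are hypothetical.
type Agent = { id: string; riskTier?: string; allowedTools?: string[] };

const TIER_BLOCKED: Record<string, string[]> = {
  SAFE: ['exec', 'browser', 'github', 'clawhub'],
  BUILDER: ['github', 'clawhub'],
  OPERATOR: [],
};

function filterTools(agent: Agent | undefined, requested: string[]): string[] {
  if (!agent) return [];
  const allowed = agent.allowedTools ?? [];
  const blocked = TIER_BLOCKED[agent.riskTier ?? 'SAFE'] ?? TIER_BLOCKED.SAFE;
  return requested.filter(t => allowed.includes(t) && !blocked.includes(t));
}

interface Case {
  label: string;
  agent?: Agent;
  requested: string[];
  expected: string[];
}

const cases: Case[] = [
  { label: 'unknown agent gets nothing', requested: ['web_fetch'], expected: [] },
  { label: 'missing allowlist gets nothing',
    agent: { id: 'a', riskTier: 'OPERATOR' }, requested: ['exec'], expected: [] },
  { label: 'SAFE blocks exec even when allowlisted',
    agent: { id: 'a', riskTier: 'SAFE', allowedTools: ['exec', 'web_fetch'] },
    requested: ['exec', 'web_fetch'], expected: ['web_fetch'] },
  { label: 'unrecognized tier falls back to SAFE',
    agent: { id: 'a', riskTier: 'ROOT', allowedTools: ['exec'] },
    requested: ['exec'], expected: [] },
  { label: 'OPERATOR still needs the allowlist',
    agent: { id: 'a', riskTier: 'OPERATOR', allowedTools: ['github'] },
    requested: ['github', 'exec'], expected: ['github'] },
];

// Returns the labels of failing cases; an empty array means every case passed.
function runCases(): string[] {
  return cases
    .filter(c => JSON.stringify(filterTools(c.agent, c.requested)) !== JSON.stringify(c.expected))
    .map(c => c.label);
}

console.log(runCases());
```

The same table can be ported to whichever test runner the repository already uses.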

Expected Benefits

Defense in Depth: Dispatch-time filtering, OpenClaw config sync, and UI warnings each guard against capability escalation
Granular Control: Per-tool, per-agent restrictions
Audit Trail: All capability changes logged in ops feed
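Low Breakage Risk: Agents with a populated allowedTools list keep every tool their tier permits; agents with no list resolve to an empty allowlist once enforcement is on, so populate lists before enabling it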
Gradual Adoption: Can enable strict enforcement agent-by-agent
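
For the audit trail, one option is to compute a small change record whenever an agent's capabilities are saved, then hand it to whatever writes the ops feed. The sketch below covers only the diff: the CapabilityChange shape and the diffCapabilities helper are assumptions, since this plan does not define the ops feed format.

```typescript
// Sketch only: the event shape is hypothetical; the ops feed format is not defined here.
interface Caps {
  riskTier: string;
  allowedTools: string[];
}

interface CapabilityChange {
  agentId: string;
  tierChange: { from: string; to: string } | null;
  toolsAdded: string[];
  toolsRemoved: string[];
}

// Returns null when nothing changed, so callers can skip logging no-op saves.
function diffCapabilities(
  agentId: string,
  before: Caps,
  after: Caps,
): CapabilityChange | null {
  const toolsAdded = after.allowedTools.filter(t => !before.allowedTools.includes(t));
  const toolsRemoved = before.allowedTools.filter(t => !after.allowedTools.includes(t));
  const tierChange =
    before.riskTier !== after.riskTier
      ? { from: before.riskTier, to: after.riskTier }
      : null;

  if (!tierChange && toolsAdded.length === 0 && toolsRemoved.length === 0) {
    return null;
  }
  return { agentId, tierChange, toolsAdded, toolsRemoved };
}
```

Recording added and removed tools separately makes escalations (new tools or a higher tier) easy to spot in review.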

BLOCKED

None - All required tools (exec, web_fetch) are available for this implementation.

Next Actions

Review & approve this implementation plan
Assign Phase 1 development work (validation helpers and dispatch-config tool filtering)
Test capability enforcement with non-critical agent (e.g., agt_research)
Gradually enable for frontline agents after validation

SHIP-READY: This plan provides a clear, safe path to activate HQ's existing capability infrastructure with minimal code changes and maximum backward compatibility.
