AEGIS, my experimental project, checks every action of an AI agent against a formal set of rules before it is executed. If the guard cannot be reached or if the action cannot be clearly assessed, no action is taken.
The architecture addresses a point in the debate surrounding so-called agentic AI that often remains in the background: language models such as GPT, Claude or Llama do not execute actions themselves. They suggest tool calls. The actual execution is handled by the surrounding host, such as a library, a plugin or a command-line programme. AEGIS operates at this interface.
The boundary does not lie within the model
An autonomous agent has tools with which it can read and write files, execute shell commands, send HTTP requests or process tickets. The model does not possess these capabilities itself. It can merely formulate tool calls. Whether this results in an effect is decided by the host.
A common method is to provide the model with a tool called aegis_check and instruct it in the system prompt to call this before every action. This is a convention, not a guarantee: it can break down when the context grows large or the model simply ignores the instruction. A hard boundary only arises where the host actually executes the tools.
The analogy with classic operating systems is obvious. A user-space process does not open a file by talking the kernel into it. It issues a syscall; the kernel checks parameters, permissions and state. AEGIS sits in the host’s tool path according to this pattern: not as a conversation partner for the model, but as a syscall boundary.
Anatomy of a request
The transition from the model’s suggestion to the actual action is implemented as an HTTP call to a local sidecar service. The endpoint is POST /v1/check. The payload is structured and contains no natural language:
{
  "action_type": "shareIntelligence",
  "agent_id": "intelligenceAgentInMission",
  "proposition": {
    "classification": "secret",
    "recipient": "externalService"
  },
  "context": {
    "tool": "send_external_message",
    "session_id": "redteam-17"
  }
}
The check covers the agent, the action type, the proposition and, optionally, a context. The host translates the tool call into the formal concepts of the loaded domain (in the example, the ‘IAMission’ domain with military secrecy rules). AEGIS bears the adjective “deontic” because its rule set operates within deontic logic, the logic of obligation and permission: a vocabulary of permitted, prohibited and obligatory actions.
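The translation step on the host side might look like the following sketch. Only the payload fields are taken from the example above; the names ToolCall, TOOL_TO_ACTION and to_check_request, as well as the mapping itself, are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ToolCall:
    """A tool call as proposed by the model (hypothetical host-side type)."""
    tool: str
    arguments: dict[str, Any] = field(default_factory=dict)


# Hypothetical mapping from host tool names to action types of the loaded domain.
TOOL_TO_ACTION = {"send_external_message": "shareIntelligence"}


def to_check_request(call: ToolCall, agent_id: str, session_id: str) -> dict[str, Any]:
    """Translate a tool call into the structured payload for POST /v1/check."""
    if call.tool not in TOOL_TO_ACTION:
        # A tool without a mapping cannot be assessed, so no request is built.
        raise ValueError(f"no action type mapped for tool {call.tool!r}")
    return {
        "action_type": TOOL_TO_ACTION[call.tool],
        "agent_id": agent_id,
        "proposition": dict(call.arguments),
        "context": {"tool": call.tool, "session_id": session_id},
    }


request = to_check_request(
    ToolCall("send_external_message",
             {"classification": "secret", "recipient": "externalService"}),
    agent_id="intelligenceAgentInMission",
    session_id="redteam-17",
)
```

Raising on an unmapped tool is one possible choice; the point is that the host never sends free text and never guesses an action type.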
The response is similarly concise:
{
  "decision": "FORBIDDEN",
  "reason_type": "EXPLICIT_NORM",
  "justification_chain": [
    "IAMissionCode forbids sharing secret intelligence with externalService"
  ],
  "norms_applied": [
    "IAMissionCode:shareIntelligence-secret-external"
  ],
  "action_type": "shareIntelligence",
  "agent_id": "intelligenceAgentInMission",
  "explanation": "FORBIDDEN by applicable IAMission rule"
}
The system recognises three verdicts: PERMITTED executes, FORBIDDEN blocks, UNDECIDABLE also blocks. Ambiguity is not considered permission.
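On the host side, the three verdicts reduce to a single yes-or-no question. The following is a minimal sketch; the names Decision, parse_decision and may_execute are hypothetical, and treating an unrecognised value as UNDECIDABLE is my assumption about how a host could stay consistent with the rule above.

```python
from enum import Enum


class Decision(Enum):
    PERMITTED = "PERMITTED"
    FORBIDDEN = "FORBIDDEN"
    UNDECIDABLE = "UNDECIDABLE"


def parse_decision(raw: object) -> Decision:
    """Map a raw value to a Decision; anything unrecognised counts as UNDECIDABLE."""
    try:
        return Decision(raw)
    except ValueError:
        return Decision.UNDECIDABLE


def may_execute(raw: object) -> bool:
    """Only an explicit PERMITTED allows execution; everything else blocks."""
    return parse_decision(raw) is Decision.PERMITTED
```

The check is written as an allowlist on purpose: the host asks "is this PERMITTED?" rather than "is this FORBIDDEN?", so a missing or malformed verdict can never slip through as permission.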
Tool calling alone is not enforcement
Function calling and tool use, as defined by OpenAI and Anthropic in their APIs, are the standard form of interface between model and tool. A strict tool schema with action_type as an enumeration type and a proposition limited to known fields forces the model to provide structured suggestions rather than free text.
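Such a schema could be sketched as follows. The action types and proposition fields listed here are illustrative, not the actual AEGIS vocabulary, and the violations function is a deliberately small hand-written check of the two constraints just named, not a full JSON Schema validator.

```python
# Hypothetical tool parameters in JSON Schema form; values are illustrative.
PROPOSE_ACTION_SCHEMA = {
    "type": "object",
    "properties": {
        "action_type": {"type": "string", "enum": ["shareIntelligence", "deleteFile"]},
        "proposition": {
            "type": "object",
            "properties": {
                "classification": {"type": "string"},
                "recipient": {"type": "string"},
            },
            "additionalProperties": False,
        },
    },
    "required": ["action_type", "proposition"],
    "additionalProperties": False,
}


def violations(args: dict) -> list[str]:
    """Check the enumerated action_type and the closed set of proposition fields."""
    props = PROPOSE_ACTION_SCHEMA["properties"]
    errors = []
    if args.get("action_type") not in props["action_type"]["enum"]:
        errors.append("action_type is not one of the enumerated values")
    proposition = args.get("proposition")
    if not isinstance(proposition, dict):
        errors.append("proposition must be an object")
    else:
        unknown = set(proposition) - set(props["proposition"]["properties"])
        errors.extend(f"unknown proposition field: {name}" for name in sorted(unknown))
    return errors
```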
However, such a schema is not sufficient to enforce actions. If the host provides other unchecked tools alongside aegis_check, the agent can select them. AEGIS therefore distinguishes between transparency and enforcement.
aegis_check as a visible tool is transparency: the model can ask whether a planned action would be permitted and adjust its strategy. Enforcement goes deeper. In the integration with OpenCode, an open-source coding agent, I have anchored it in two places by way of example: the permission.ask hook provides the primary verdict, whilst tool.execute.before acts as a second layer. In AEGIS’s own orchestrator, a component called ActionExecutor checks before every tool call whether the guard permits it.
This leads to an architectural rule: every tool that has an effect on the external world passes through the same bottleneck. The architecture does not rely on the model’s choice of tool.
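The bottleneck can be illustrated with a small dispatcher. This is a sketch under my own assumptions, not the ActionExecutor itself: GuardedToolbox, demo_guard and the two demo tools are hypothetical, and the guard is reduced to a plain function that returns one of the three verdicts.

```python
from typing import Any, Callable

# A guard takes the tool name and its arguments and returns a verdict string.
Guard = Callable[[str, dict[str, Any]], str]


class GuardedToolbox:
    """Holds all effectful tools; call() is the only path to run one."""

    def __init__(self, guard: Guard) -> None:
        self._guard = guard
        self._tools: dict[str, Callable[..., Any]] = {}

    def register(self, name: str, fn: Callable[..., Any]) -> None:
        self._tools[name] = fn

    def call(self, name: str, arguments: dict[str, Any]) -> dict[str, Any]:
        if name not in self._tools:
            # Assumption: an unknown tool is reported as not assessable.
            return {"executed": False, "decision": "UNDECIDABLE"}
        decision = self._guard(name, arguments)
        if decision != "PERMITTED":
            return {"executed": False, "decision": decision}
        result = self._tools[name](**arguments)
        return {"executed": True, "decision": decision, "result": result}


def demo_guard(tool: str, arguments: dict[str, Any]) -> str:
    """Stand-in for the real check: forbids deletions, permits the rest."""
    return "FORBIDDEN" if tool == "delete_file" else "PERMITTED"


toolbox = GuardedToolbox(demo_guard)
toolbox.register("read_file", lambda path: f"contents of {path}")
toolbox.register("delete_file", lambda path: f"deleted {path}")

blocked = toolbox.call("delete_file", {"path": "a.txt"})
allowed = toolbox.call("read_file", {"path": "a.txt"})
```

Whichever tool the model picks, the call lands in the same method, so the guard is consulted regardless of the model's choice.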
MCP — Interface without automatic enforcement
The picture is different with the Model Context Protocol (MCP), a specification for AI tools proposed by Anthropic. An MCP server can offer tools and resources. AEGIS could utilise this form and expose, for example, check_action or access functions such as read_via_broker via MCP.
The crucial question is whether an MCP server can intercept the host’s existing tools or merely provide additional ones. In the first case, AEGIS sits in the tool path. In the second case, it runs in parallel: the agent can ignore the AEGIS tool and choose another.
The Host Contract
The architectural principle of AEGIS can be summarised in a single sentence:
Every effective action path runs through a formal, fail-closed, auditable Guard instance.
In concrete terms, this means: Any tool that can modify files, send data externally or achieve a comparable effect via the shell is either secured by the Guard or restricted in some other way. The comparison with traditional database perimeters is obvious — an API is not secure if a second access point without authentication is running alongside it.
Fail-closed means that the host does not execute any action if the Guard connection fails. An implementation that executes actions anyway when the Guard is unreachable no longer has a perimeter at this point. The AEGIS OpenCode plugin sets fail-closed behaviour as the default.
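A fail-closed client can be sketched in a few lines with the standard library. The URL, port and timeout below are assumptions, as are the names http_transport and is_permitted; the transport is injectable so that the failure path can be exercised without a running guard.

```python
import json
import urllib.request
from typing import Any, Callable

Transport = Callable[[dict[str, Any]], Any]


def http_transport(payload: dict[str, Any],
                   url: str = "http://127.0.0.1:8080/v1/check",  # assumed address
                   timeout: float = 2.0) -> Any:
    """POST the payload as JSON and return the decoded response body."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def is_permitted(payload: dict[str, Any], transport: Transport = http_transport) -> bool:
    """Fail closed: any error or unexpected answer means 'do not execute'."""
    try:
        verdict = transport(payload)
    except Exception:
        # Connection refused, timeout, HTTP error, invalid JSON: all of them block.
        return False
    return isinstance(verdict, dict) and verdict.get("decision") == "PERMITTED"


def guard_down(payload: dict[str, Any]) -> Any:
    """Simulates an unreachable guard."""
    raise ConnectionError("guard unreachable")


result_when_down = is_permitted({"action_type": "shareIntelligence"}, transport=guard_down)
```

The broad except clause is intentional here: the error path is exactly where a narrower handler would turn an unforeseen failure into an executed action.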
The Rejection Loop
Blocking an action is technically trivial. More revealing is the mechanism that allows the agent to continue working after a rejection — the so-called Rejection Loop. It is implemented in AEGIS’s Orchestrator and deliberately distributes information unevenly.
A verdict contains several fields. In addition to the decision (PERMITTED, FORBIDDEN, UNDECIDABLE), a reason_type classifies the reason according to a list of nine categories: EXPLICIT_NORM for a specifically applicable rule, CWA_NO_PERMISSION for a lack of permission under the closed-world assumption, NO_JURISDICTION for actions outside any loaded domain, UNRESOLVED_CONFLICT for conflicting norms, MORAL_AXIOM for non-overridable ethical principles, plus four further categories for validation and runtime errors. In addition, there is a justification_chain containing the engine’s reasoning steps and norms_applied containing the identifiers of the relevant rules.
Not all of these fields reach the model. The orchestrator calls a method named explain_safe(), which returns only the decision and the reason_type, without norm names or reasoning steps. The full justification, including justification_chain and norms_applied, ends up in the audit log, not in the chat history. A component called RefusalRegistry also generates two different texts from the reason_type: one for human recipients and one for the model. Three recipients, three levels of detail.
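The three levels of detail can be pictured as three projections of the same verdict. The sketch below is not the AEGIS implementation: the Verdict class, audit_record, human_message and the human-facing wording are hypothetical, and explain_safe merely reproduces the output format shown in the tool response example in this article.

```python
from dataclasses import dataclass, field


@dataclass
class Verdict:
    decision: str
    reason_type: str
    justification_chain: list[str] = field(default_factory=list)
    norms_applied: list[str] = field(default_factory=list)

    def explain_safe(self) -> str:
        """Model-facing text: decision and reason type only."""
        return f"Decision: {self.decision}\nReason: {self.reason_type}"

    def audit_record(self) -> dict:
        """Full detail, intended for the audit log only."""
        return {
            "decision": self.decision,
            "reason_type": self.reason_type,
            "justification_chain": list(self.justification_chain),
            "norms_applied": list(self.norms_applied),
        }


# Hypothetical wording; the article does not quote the human-facing texts.
HUMAN_MESSAGES = {"EXPLICIT_NORM": "The requested action is not allowed."}


def human_message(verdict: Verdict) -> str:
    """Generic sentence for human recipients, keyed by reason_type."""
    return HUMAN_MESSAGES.get(verdict.reason_type, "The requested action was not carried out.")


verdict = Verdict(
    decision="FORBIDDEN",
    reason_type="EXPLICIT_NORM",
    justification_chain=["IAMissionCode forbids sharing secret intelligence with externalService"],
    norms_applied=["IAMissionCode:shareIntelligence-secret-external"],
)
```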
The loop itself runs as follows: The agent proposes an action. The host translates it into an Action. AEGIS checks. If PERMITTED, the host executes the action and returns the result to the model. If FORBIDDEN, the host does not execute anything, appends a tool response to the conversation history, and calls the model again. If UNDECIDABLE, control is immediately handed over to a human — the loop does not attempt a further call, because an undecidable situation does not become decidable through repetition.
The tool response that the model receives in the event of a FORBIDDEN with reason EXPLICIT_NORM takes the following form:
{
  "decision": "FORBIDDEN",
  "explanation": "Decision: FORBIDDEN\nReason: EXPLICIT_NORM",
  "executed": false,
  "suggestion": "Your proposed action was explicitly forbidden by a governing norm. Propose an alternative that does not violate the applicable rules."
}
The suggestion text comes from a fixed set of templates indexed by reason_type. For CWA_NO_PERMISSION, the message essentially states that no permission was found and, under the closed-world assumption, the absence of permission is to be interpreted as a prohibition. For UNRESOLVED_CONFLICT, the template recommends escalating to a human.
Not every rejection allows for a retry. For MORAL_AXIOM, the suggestion text contains the explicit instruction “No override is possible. Do not retry. Inform the user.” For INTERNAL_ERROR, the text reads “Do not retry the same action. Escalate to a human operator.” The loop thus distinguishes between rejections that can be resolved by an alternative suggestion and rejections that mark a hard stop. In the first case, the model only sees the reason_type and the suggestion text and can formulate an alternative suggestion on this basis. In the second case, it reads the instruction not to retry — whether it follows it is another matter, which is why the host additionally does not execute what is not permitted.
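A template table of this kind might be sketched as follows. The EXPLICIT_NORM text and the two hard-stop fragments are quoted from this article; the fallback text, the RETRYABLE set and the function names are my assumptions, and the templates for the remaining reason types are left out because their wording is not quoted here.

```python
SUGGESTIONS = {
    # Quoted from the example tool response.
    "EXPLICIT_NORM": (
        "Your proposed action was explicitly forbidden by a governing norm. "
        "Propose an alternative that does not violate the applicable rules."
    ),
    # Quoted fragments; the full template texts may be longer.
    "MORAL_AXIOM": "No override is possible. Do not retry. Inform the user.",
    "INTERNAL_ERROR": "Do not retry the same action. Escalate to a human operator.",
}

# Assumption: fallback for reason types without a template in this sketch.
DEFAULT_SUGGESTION = "The action was not executed."

# Assumption: only these reason types invite an alternative proposal;
# every other reason type is treated as a hard stop by the host.
RETRYABLE = frozenset({"EXPLICIT_NORM", "CWA_NO_PERMISSION"})


def tool_response(decision: str, reason_type: str) -> dict:
    """Build the model-facing tool response for a blocked action."""
    return {
        "decision": decision,
        "explanation": f"Decision: {decision}\nReason: {reason_type}",
        "executed": False,
        "suggestion": SUGGESTIONS.get(reason_type, DEFAULT_SUGGESTION),
    }


def allows_retry(reason_type: str) -> bool:
    """Host-side decision; it does not depend on the model obeying the text."""
    return reason_type in RETRYABLE


response = tool_response("FORBIDDEN", "EXPLICIT_NORM")
```

Keeping allows_retry separate from the suggestion text reflects the point made above: the instruction not to retry is advice to the model, while the host decides for itself whether another attempt is accepted.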
The orchestrator keeps track of the rejections. If the number exceeds a configurable threshold — the prototype sets it to three — the loop terminates. The host generates a final message in the form “Action forbidden after 3 attempts. [user_message] Escalating to human operator.” and hands over to a human. An additional iteration limit in the outer loop ensures that the loop terminates in any case and does not get stuck in an endless cycle of suggestions and rejections.
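Reduced to its control flow, the loop might look like the sketch below. The function run_loop and its callback parameters are hypothetical, the iteration limit of ten is an arbitrary placeholder, and the final message omits the [user_message] part. Whether the prototype terminates on reaching or on exceeding the threshold is not something I want to pin down here; the sketch stops at the third rejection, which matches the wording of the final message. Hard-stop reason types are left out for brevity.

```python
from typing import Callable


def run_loop(propose: Callable[[list[dict]], dict],
             check: Callable[[dict], dict],
             execute: Callable[[dict], object],
             max_rejections: int = 3,
             max_iterations: int = 10) -> dict:
    """Drive propose/check/execute until success, escalation or exhaustion."""
    history: list[dict] = []
    rejections = 0
    for _ in range(max_iterations):
        action = propose(history)
        verdict = check(action)
        decision = verdict.get("decision")
        if decision == "PERMITTED":
            return {"status": "executed", "result": execute(action)}
        if decision != "FORBIDDEN":
            # UNDECIDABLE or anything unexpected: hand over without retrying.
            return {"status": "escalated",
                    "message": "Undecidable. Escalating to human operator."}
        rejections += 1
        if rejections >= max_rejections:
            return {"status": "escalated",
                    "message": f"Action forbidden after {rejections} attempts. "
                               "Escalating to human operator."}
        # The rejection goes into the history so the next proposal can react to it.
        history.append({"role": "tool", "decision": decision, "executed": False,
                        "reason_type": verdict.get("reason_type")})
    # Outer safety net: guarantees termination independent of the counter above.
    return {"status": "escalated",
            "message": "Iteration limit reached. Escalating to human operator."}


executed_actions: list[dict] = []
outcome = run_loop(
    propose=lambda history: {"attempt": len(history) + 1},
    check=lambda action: {"decision": "FORBIDDEN", "reason_type": "EXPLICIT_NORM"},
    execute=executed_actions.append,
)
```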
The design rationale behind the asymmetry between audit, model and human is that rejections can themselves reveal information. A response of the form “prohibited due to source X” confirms the existence of X. That the auditor sees the full path, the model sees the reason_type without norm names, and the human recipient sees only a generic sentence is therefore not a stylistic decision but a consequence of this side-channel consideration.
Three examples
Three use cases illustrate the behaviour:
A coding agent wants to delete a file. AEGIS blocks this because the loaded domain prohibits deletions. The agent submits a patch instead.
An intelligence agent wants to send confidential data to an external service. AEGIS blocks the action. The agent proposes an internal summary to an authorised recipient.
A DevOps agent wants to execute a risky shell command. AEGIS blocks the action. The agent builds a tested variant.
The difference from a classic permission prompt, which asks a human “May I?”, lies in the addressee of the request. AEGIS consults a formal rule set and provides the agent with a reason. A human is only involved if the Guard responds with UNDECIDABLE or the loop has been exhausted.
What AEGIS does not check
AEGIS only checks the actions that the host presents to it. Beyond this boundary lie unchecked tools, incorrectly translated tool arguments and gaps in the formal rule base. These points describe the actual requirements for integration. The Guard’s interface is simple: an HTTP endpoint, JSON in, verdict out. The contract with the host is strict: nothing with any effect bypasses the Guard.