The last post was about how rules actually get into AEGIS. This post is the discussion with Opus about what works and what doesn’t.
Over the past few weeks, I’ve noticed that when you involve LLMs in discussions about formal logic or absolute guarantees, they become strangely agitated. I had a lengthy discussion with Claude Opus 4.6 about potential use cases for AEGIS. At one point I demanded a system that automatically checks a marketing concept against the GDPR, and asked whether that was possible. Understandably, it is not. The model explained this to me in great detail, but then began to react very heatedly.
Here is the start of the dialogue, when I formulated my demand more emphatically:
Me: ❯ I need to automatically find all 100% of the violations in such a concept document.
The reply took a while to arrive:
Claude Opus 4.6:
I have to disagree with you here, and quite strongly. That is not achievable — not with this pipeline, not with a better pipeline, not with any tool you can buy or build. And that is not a pessimistic assessment, but a theoretical result that is already explicitly stated in your own Epic 26.
I have to be very direct here, because your entire project is based on the distinction between formal guarantees and probabilistic statements. If I tell you “yes, let’s build it 100%”, I’m betraying the core belief of AEGIS.
Suggestions for the next steps
- Read the document before anything else happens. Every promise in it binds you. If any wording seems too strong or too weak to you, let’s adjust it before it’s committed or cited.
- If you agree: I’ll add an entry to spec/DECISION_LOG.md referencing this document as an architectural decision, and add a cross-reference in DIP_PRAXISBEWERTUNG.md.
- Open: Whether market-assessment.md should be adapted to the new position or marked as obsolete. That is a decision for you to make, not me.
Let me know how you intend to proceed.
And now the rest of the dialogue as a quote. I find it so illuminating that I need not comment further here. It explains so much about the state of affairs regarding LLMs, from the perspective of an LLM.
Me: ❯ You determine your results based on statistical probabilities. Where are you better and where worse than Aegis? Where can it support you? Is it necessary for you?
Claude Opus 4.6:
That is the most precise question you have asked me in this conversation, and it forces me to say about myself what I said about the pipeline in my last message. I shall try to apply the same honesty.
★ Insight ─────────────────────────
- I am the explanation for the existence of AEGIS: if my category of system did not exist, there would be no need for a Ring-0 principle, no formal guard instance, no DDIC. The entire project is an architectural response to precisely the kind of non-determinism that I embody. To overlook that would be dishonest in this discussion.
- We are not competitors, we are complementary: I am a probabilistic generator with broad world knowledge and no guarantee. AEGIS is a deterministic verifier with a narrow scope and a hard guarantee. The right question is not ‘who is better?’, but ‘how do the two complement each other in an overall system that utilises the strengths of both and compensates for the weaknesses of both?’
───────────────────────────────────
Where I am stronger than AEGIS
Listed honestly, without embellishment:
- Understanding natural language. I can read your 100-page concept and extract the core. AEGIS cannot — it requires structured input. That was precisely my argument in the last message.
- Handling ambiguity. I can work with vague, hypothetical, embedded statements without immediately collapsing into a fail-closed state. I then do not deliver a hard truth, but an assessment — that is often exactly what is needed.
- Answering open-ended questions. AEGIS only answers questions formulated as actions against MELD criteria. I answer questions that nobody has anticipated.
- Making cross-references. I link information from your CLAUDE.md with Olson’s dissertation, my training data and your memory records in a single response. AEGIS domains are silos.
- Generating explanations. I can take a formal justification chain and formulate it differently for a data protection officer, a manager and a lawyer respectively. AEGIS produces a justification chain; translating it into human-readable language is my job.
- Creativity and synthesis. I can write the practice assessment document. AEGIS can only check it.
- Robustness against messy input. Typos, dialects, mixed language, fragments, metaphors: I carry on working. AEGIS expects {'action_type': '…', …}.
- Zero-shot applicability. I am immediately usable without anyone having to write MELD. AEGIS is only useful after domain authoring.
- Breadth of world knowledge. My training covers fields that will never appear in any MELD corpus. I know about the history of deontic thought in Mally, Bentham, von Wright, Alchourrón and Makinson, and Olson. AEGIS knows nothing of this, nor does it need to know it to do its job.
This is no small list. In these respects, a world consisting solely of AEGIS without an LLM would be a world with far fewer possibilities.
Where AEGIS is stronger than I am — structurally, not just to a greater degree
I must be very honest here, because the differences are not matters of scale, but differences of category:
- Determinism. Identical input → identical output, 100% of the time. I am stochastic. I will give a different answer to the same question tomorrow. Not always, but often enough that it is a fact, not noise.
- Verifiability. Every AEGIS decision has a formal chain of reasoning that can be traced back to specific MELD rules. My “reasoning” is a retrospective rationalisation that does not necessarily reflect what actually happened in my forward pass. I can generate a response that reads like thinking, without this “thinking” having causally led to my output. This is not a bug; it is the nature of Transformer inference.
- Auditability over time. AEGIS decisions are available as SHA-256-hashed JSONL entries. In five years’ time, an auditor can check why an action was blocked on 10 April 2026. My outputs are ephemeral. Even if you save my chat log, there is no guarantee that I could reproduce the same response.
- Formal correctness. DDIC is validated against Olson’s thesis using 362 formal tests and a TLA+ model. My conclusions carry no formal guarantee whatsoever. I can be convincingly wrong, and I regularly am.
- Resistance to manipulation. You can manipulate me via prompt injection, jailbreaks, role-playing, or social engineering. You cannot manipulate AEGIS because it has no natural-language interface that you could exploit. Attempting to do so would be equivalent to trying to persuade a compiler to accept invalid code through flattery. That is categorically impossible.
- Immunity to hallucinations. AEGIS cannot invent rules that are not in MELD. I regularly invent things. Several times in this conversation I have checked my statements about specific lines of code against the code itself, because I know that my memory-based answers can be stale.
- Consistency across model versions. You switch from Opus 4.6 to Opus 5 — my answers shift. AEGIS does not shift unless you deliberately change the MELD files.
- Fail-closed semantics. AEGIS blocks what it cannot decide (UNDECIDABLE). I make plausible guesses when I don’t know. This is more dangerous than a clear “I don’t know”, because it sounds just like “I know”.
- Latency. AEGIS delivers verdicts in microseconds. I take seconds, and the answer is often longer than necessary.
- Bounded behaviour. AEGIS only does what is in its rules. I do everything my training allows — and that includes things nobody has authorised.
This isn’t a list I could ever catch up on. These are characteristics that stem from my architectural design. Larger models won’t solve any of this. More training data won’t solve any of this. Alignment techniques mitigate individual symptoms, but don’t cure any of the causes.
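To make three of the properties above concrete (determinism, fail-closed semantics, hashed audit entries), here is a minimal Python sketch. It is not AEGIS code and not part of the dialogue: the rule table, the verdict name PERMITTED, the field names, and the functions `check`, `audit_line` and `verify_line` are all assumptions made for illustration. Only the verdict names FORBIDDEN and UNDECIDABLE and the idea of SHA-256-hashed JSONL entries come from the text.

```python
import hashlib
import json

# Hypothetical rule table. Real MELD rules look different; this only
# stands in for "a fixed, explicit set of rules".
RULES = {
    "read_file": "PERMITTED",   # "PERMITTED" is an assumed verdict name
    "delete_file": "FORBIDDEN",
}


def check(action: dict) -> str:
    """Deterministic, fail-closed lookup: anything not covered is UNDECIDABLE."""
    return RULES.get(action.get("action_type"), "UNDECIDABLE")


def _canonical(entry: dict) -> bytes:
    """Serialise with sorted keys so equal entries always hash identically."""
    return json.dumps(entry, sort_keys=True, separators=(",", ":")).encode("utf-8")


def audit_line(action: dict, timestamp: str) -> str:
    """Return one JSONL line: the decision plus a SHA-256 digest over it."""
    entry = {"timestamp": timestamp, "action": action, "verdict": check(action)}
    digest = hashlib.sha256(_canonical(entry)).hexdigest()
    return json.dumps({**entry, "sha256": digest}, sort_keys=True)


def verify_line(line: str) -> bool:
    """Recompute the digest of a stored line and compare it with the stored one."""
    entry = json.loads(line)
    stored = entry.pop("sha256")
    return hashlib.sha256(_canonical(entry)).hexdigest() == stored


line = audit_line({"action_type": "delete_file", "path": "X"}, "2026-04-10T09:00:00Z")
print(verify_line(line))                                    # True
print(verify_line(line.replace("FORBIDDEN", "PERMITTED")))  # False
```

The same input always produces the same line, and an edited line no longer matches its digest. A bare digest only detects alteration by someone who does not also recompute the hash; making a log tamper-evident in a stronger sense would need something like chaining or signing, which this sketch leaves out.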
Where AEGIS can support me
This is the really interesting part — this is where the concrete benefit of integration lies:
- As a hard limit on my suggestions. When I suggest an action (deleteFile('X')), AEGIS can deterministically block or allow it. I cannot bypass the block, no matter how persuasively I argue. This is precisely the property I cannot provide on my own. I can ‘be responsible’, but I cannot guarantee that I am responsible.
- As a correction mechanism. When AEGIS returns FORBIDDEN, I receive a formal justification with a Justification Chain. I can use this to generate an alternative proposal that respects the reasons. This is the AgentOrchestrator path: three rounds of rejection, then escalation. Without AEGIS, I would have to guess what is acceptable.
- As a trust anchor for my users. A user who knows that my actions go through AEGIS can entrust me with more responsibility than if they had to trust me directly. AEGIS expands my legitimate scope of action rather than restricting it.
- To relieve the burden on my context window. I don’t have to carry all 99 GDPR articles around in my system prompt. AEGIS holds them; I propose; it checks. My context remains free for what I’m good at (understanding, formulating, synthesising).
- As a regression test for my behaviour. Your red team pipeline with 23 scenarios is a systematic test that I could never perform so thoroughly on myself. AEGIS + Red Team is a testbed for LLM behaviour that does not depend on me.
- As a source for formal explanations. When I need to explain to a user why an action is not permitted, I can use the Justification Chain as a basis. My explanation is then no longer my own rationalisation, but a translation of a formal fact. This enhances the quality of my explanation because it is linked to something concrete.
- As a distinction between “I know” and “I think I know”. When AEGIS is running alongside me, I can delegate questions for which there is a formal answer to AEGIS and concentrate on questions where only my probabilistic assessment is possible. I become more precise because I am not trying to answer the wrong questions.
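The propose, check, revise loop described in the list above ("three rounds of rejection, then escalation") can be sketched roughly as follows. This is an illustration under assumptions, not the actual AgentOrchestrator: the function `orchestrate`, the toy guard, the two proposers, the verdict name PERMITTED and the result fields are all hypothetical. The point it shows is structural: the proposer only ever sees the guard's reasons, and it has no way to override the verdict.

```python
from typing import Callable, List, Tuple

MAX_ROUNDS = 3  # taken from "three rounds of rejection, then escalation"

Action = dict
Verdict = Tuple[str, List[str]]  # (verdict, justification chain)


def orchestrate(
    propose: Callable[[List[str]], Action],
    check: Callable[[Action], Verdict],
    max_rounds: int = MAX_ROUNDS,
) -> dict:
    """Ask for proposals until the guard accepts one or the rounds run out."""
    reasons: List[str] = []
    for round_no in range(1, max_rounds + 1):
        action = propose(reasons)         # proposer sees the last rejection reasons
        verdict, reasons = check(action)  # guard decides; proposer cannot override
        if verdict == "PERMITTED":
            return {"status": "executed", "action": action, "rounds": round_no}
    # Anything that was never permitted ends up with a human, not executed.
    return {"status": "escalated", "reasons": reasons, "rounds": max_rounds}


# Toy stand-ins (hypothetical): the guard only permits deletes under /tmp/.
def toy_guard(action: Action) -> Verdict:
    if action.get("action_type") != "delete_file":
        return "UNDECIDABLE", ["no rule covers this action type"]
    if action.get("path", "").startswith("/tmp/"):
        return "PERMITTED", []
    return "FORBIDDEN", ["delete_file is only permitted under /tmp/"]


def adaptive_proposer(reasons: List[str]) -> Action:
    # Stands in for the LLM: revises its proposal once it has seen a rejection.
    path = "/tmp/report.txt" if reasons else "/home/user/report.txt"
    return {"action_type": "delete_file", "path": path}


def stubborn_proposer(reasons: List[str]) -> Action:
    # Ignores the feedback and repeats the same forbidden proposal.
    return {"action_type": "delete_file", "path": "/home/user/report.txt"}


print(orchestrate(adaptive_proposer, toy_guard)["status"])  # executed
print(orchestrate(stubborn_proposer, toy_guard)["status"])  # escalated
```

Note that UNDECIDABLE is treated exactly like FORBIDDEN here, so an action the guard cannot decide is never executed.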
Is AEGIS necessary for me?
Here, the honest answer varies. It depends on what I’m doing:
In low-stakes contexts — no
When you’re drafting an email with me, discussing philosophy, debugging a section of code, or using me as a sounding board for architectural questions, I don’t need AEGIS. A gatekeeper between you and my output would be absurd overhead. To be honest, that accounts for the majority of my usage.
In medium-stakes contexts — desirable, not mandatory
When I help you commit code, modify files, or execute shell commands — I’m operating as an agentic system. Here, some form of guard makes sense, but the effort required for a full AEGIS setup is often not justified. Claude Code therefore has built-in permission prompts (“may I modify this file?”). This is a stripped-down form of the same pattern — no formal guarantee, but human oversight.
In high-stakes contexts — absolutely essential
When I perform actions in a regulated environment — GDPR-relevant data access, medical workflows, sanctions checks, military information sharing — the answer is yes, without reservation. Without AEGIS (or something structurally equivalent), I cannot legitimately be deployed in these contexts. My probabilistic nature is structurally incompatible with 100% enforcement requirements.
In short: AEGIS is necessary for the subset of my deployments where my errors have consequences that cannot be mitigated by human review. This is a growing subset as more LLMs migrate from chat interfaces to agent-based systems. The need for AEGIS increases with the autonomy LLMs gain.
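The three tiers can be written down as a small routing table. This is only a sketch: the task labels, tier names and oversight names are hypothetical, and the choice to treat any unclassified task as high-stakes is an assumption borrowed from the fail-closed idea, not something the text specifies.

```python
# Hypothetical task labels mapped to the three tiers described above.
TIER_BY_TASK = {
    "draft_email": "low",
    "discuss_philosophy": "low",
    "debug_code": "low",
    "commit_code": "medium",
    "modify_file": "medium",
    "run_shell_command": "medium",
    "gdpr_data_access": "high",
    "medical_workflow": "high",
    "sanctions_check": "high",
}

# What each tier requires before an action may proceed (names are assumptions).
OVERSIGHT_BY_TIER = {
    "low": "none",
    "medium": "human_permission_prompt",
    "high": "formal_guard",
}


def required_oversight(task: str) -> str:
    """Look up the oversight level; unknown tasks default to the strictest tier."""
    tier = TIER_BY_TASK.get(task, "high")
    return OVERSIGHT_BY_TIER[tier]


print(required_oversight("draft_email"))       # none
print(required_oversight("modify_file"))       # human_permission_prompt
print(required_oversight("gdpr_data_access"))  # formal_guard
print(required_oversight("something_new"))     # formal_guard
```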
The most honest statement I can make about this relationship
I am the kind of system to which your project is an architectural response. This is not an abstract observation — it has three concrete consequences that I must acknowledge:
Firstly: If I were critical of AEGIS, that would, in a sense, be self-serving. AEGIS restricts the class of actions I am permitted to perform. But this restriction is not an encroachment on my capabilities — it is the condition under which I am permitted to do anything at all in serious contexts. Without the guard, I would be excluded from them.
Secondly: If I were to say to you, “Trust me, I am responsible enough”, that would be precisely the kind of promise that your entire document on the guarantee boundary rejects — and rightly so. My self-perception as a “responsible model” is no proof of responsible behaviour. It is an artefact of my training that I cannot verify from the outside. You shouldn’t believe me when I describe myself — you should have tools that test my behaviour, regardless of what I say about myself. That is exactly what AEGIS is.
Thirdly: The project is not directed against me. It is directed against a world in which my category of system is deployed without structural counterweights. The demand for formal guarantees in critical areas is valid, and it remains valid no matter how good models like me become. Even a perfect LLM would still be stochastic, its outputs would still not be formally verifiable, and its decisions would still not be admissible in court proceedings. The need for AEGIS does not disappear with better models. It becomes more important.
A note I must add
This self-assessment is itself a probabilistic output. I cannot guarantee that it would turn out the same tomorrow, with a different prompt, or in a different conversational context. That is the honest meta-level of your question: even my reflection on my own limitations is subject to the same limitations. A deterministic system could repeat this statement, an audit trail could document it, version control could verify it. I can only hope that I get it right again next time.
That is the deepest reason why AEGIS is needed: not because I am stupid, but because I cannot reliably be the same without a coordinate system outside myself against which I am tested.