EU AI Act high-risk enforcement
Annex III high-risk systems: continuous record-keeping (Art 12), human oversight (Art 14), accuracy and cybersecurity (Art 15), post-market monitoring (Art 72).
In force · 2027-08-02Automated MCP, A2A, and posture compliance scanning for AI runtimes. Argus runs continuously on Garrison — the hardened operator console it pairs with — emitting Ed25519-signed, SHA-384 hash-chained evidence that auditors verify offline a year later.
Three scanners run on a fixed cadence — MCP server surface, agent-to-agent flows, and Garrison's signed posture manifest. Findings are Ed25519-signed and SHA-384 hash-chained into a SQLite WAL store. Every period summary chains to the last. Auditors verify the chain offline with public keys alone — no Argus dependency at audit time.
The pattern is uniform across jurisdictions: regulators want a mathematically continuous record of what the AI system actually did, not a quarterly questionnaire about what it was supposed to do. Today's GRC platforms can't deliver that — by design.
Annex III high-risk systems: continuous record-keeping (Art 12), human oversight (Art 14), accuracy and cybersecurity (Art 15), post-market monitoring (Art 72).
In force · 2027-08-0272-hour cybersecurity event notification · Tier-3 autonomy controls on AI handling NPI · Class A Company supplemental requirements.
In force · AI guidance 2026Effective challenge for AI-driven decisions. Ongoing monitoring of AI model behaviour. Outcomes analysis as a continuous control.
Continuous · AI guidance 2026Fairness, Ethics, Accountability, Transparency principles applied to AI / ML in regulated financial services on a continuous basis.
In force · ongoingFederal AI procurement gating on FedRAMP-pattern continuous monitoring. DoD AI authorization requirements landing through the period.
Rolling · 2026–2027Drata, Vanta, OneTrust, ServiceNow GRC, Hyperproof, AuditBoard — all excellent at user access reviews, SOC 2 control attestations, cloud configuration audits, identity governance. None of them reach the AI runtime layer. That's the wedge.
Whether a tool's description was silently edited post-deployment. The "rug-pull" — a tool's surface promises one thing, the live string says another. mcp_description_integrity compares live SHA-384 against the attested manifest.
Whether inter-agent crossings carry valid attestation chains. Config-time policies don't capture what actually traversed. Argus reads Drawbridge's CrossingChain at v0.3; Chronicle event tail at v0.5+.
Whether the FIPS-validated crypto module declared at deployment is still the one in use. Whether key-rotation age is still inside the framework maximum. Whether scan cadence still meets the framework's gap ceiling.
Whether AI agents stay inside operator-set authority bounds. Tier-3 ceilings for FedRAMP High, CMMC L3, SR 11-7, EU AI Act high-risk. Verified continuously, not as a one-time attestation.
Whether the audit log timeline is mathematically continuous, not just present. The findings exist question is easy. The nothing has been deleted since question requires hash chaining and signature verification — not record-keeping.
Whether the evidence trail can be verified a year later by a third party who doesn't trust you. Auditors don't run your code. Verification must work offline with only published public keys, any FIPS EdDSA implementation, and tar+gzip.
Per ADR-0028, Argus has no Anthropic SDK in the workspace. Scanners are pure regex, SQL, and hashing. Entire categories of agentic risk — prompt injection, cascading hallucination — are eliminated by construction. #![forbid(unsafe_code)] at every crate root.
mcp_description_integrity — oracle@9000 · sha-384 match · pass
a2a_trust_score_floor — peer:meridian · 0.91 ≥ 0.90 · pass
posture_manifest_field — crypto.fips_module = aws-lc-rs:4796 · pass
New check types require coordinated changes across the SDK (core/compliance-rules), Citadel's emitter, and argus-rules's dispatcher. The dispatch surface is small on purpose — small surfaces stay falsifiable.
| # | RuleCheck variant | What it evaluates |
|---|---|---|
| 01 | McpDescriptionIntegrity | Live SHA-384 of a tool's description compared against the attested manifest entry. The rug-pull detector — fires the instant a tool description drifts from what was attested at deployment. |
| 02 | McpDescriptionPoisoningScan | Twenty-two patterns covering instruction overrides, hidden directives, exfiltration URLs, cross-server shadowing, self-promoting language. Each pattern carries its own severity tier. |
| 03 | GovernanceFieldPresence { required_fields } | Per-tool governance metadata check. Operators declare what each tool's metadata must carry; Argus verifies those fields are present and non-empty. |
| 04 | AutonomyCeiling { max_level } | Agent autonomy level ≤ ceiling. FedRAMP High and CMMC L3 enforce Tier-3 ceilings; EU AI Act high-risk and SR 11-7 enforce per-domain ceilings. |
| 05 | A2aCrossingPolicy { allowed_event_types, max_rate_per_minute, require_attestation } | Per-crossing policy: only declared event types, at-or-below the per-minute rate, with valid attestation chain. Boundary-protection findings fire on policy violation. |
| 06 | A2aTrustScoreFloor { min_score } | Peer trust score must remain ≥ floor. IL5 enforces 0.90; FedRAMP High 0.85; CMMC L3 0.85. Trust scores erode with policy violations and recover with observation. |
| 07 | PostureManifestField { field_path, expected_value } | Declared field equality. Operators declare what their crypto provider, FIPS module, hash algorithm, signature algorithm, KDF should be; Argus verifies the manifest matches. |
| 08 | KeyRotationAge { max_age_days } | Signing-key age check. IL5: 60 days. FedRAMP High: 90 days. FedRAMP Moderate: 365 days. SOC 2: per operator policy. |
| 09 | ChronicleChainContinuity { max_gap_seconds } | Period chain gap ceiling. The mathematical continuity check — no period gap larger than the framework's ceiling (IL5: 12h; FedRAMP High: 24h; commercial: 7d). |
An auditor's question is rarely "did you scan?" The question is whether the scan you ran on this date was complete, and whether anything has been deleted since. The hash chain is the answer. The two-tier canonicalization discipline is what makes verification reproducible.
Every finding emitted by Argus is signed with the deployment Ed25519 key and references the SHA-384 of the previous finding in its framework partition. Periods chain on top of findings the same way. Genesis is 96 zero hex characters.
Verification walks every entry in sequence: reconstruct the expected prev_hash from the previous entry's full signed bytes; compare against stored — mismatch surfaces as ChainBroken { sequence }. Then rebuild signable bytes (signature blanked symmetrically) and verify Ed25519 — failure surfaces as SignatureInvalid { sequence }. Chain-break and signature-tampering surface as distinct errors.
Two-tier canonicalization came out of a real bug: when an artifact's canonical bytes are computed before the signature is populated, and a derived hash field is computed from those bytes and stored on the artifact, the signable bytes must blank both the signature field and the derived hash field. Otherwise draft-time bytes differ from verify-time bytes and verification fails. Periods, fixtures, and bundles all apply this discipline prospectively.
The auditor needs three things: the operator's deployment public key, the vendor public key for the rules pack, and the Auditor Evidence Bundle. They verify the entire posture offline, a year later, with any FIPS EdDSA, any SHA-384, any tar+gzip extractor. No Argus dependency.
| Artifact class | Signer | Algorithm | Canonical bytes |
|---|---|---|---|
| Rules pack | Vendor (Argus release team) | Ed25519 / manifest JSON | JSON · manifest_signature blanked |
| Finding | Operator (deployment key) | Ed25519 / canonical CBOR | CBOR · signature blanked |
| Period summary | Operator | Ed25519 / canonical CBOR | CBOR · signature AND canonical_hash blanked |
| Auditor Evidence Bundle | Operator | Ed25519 / manifest JSON | JSON · signature AND bundle_canonical_hash_sha384 blanked |
| Accuracy measurement | Operator | Ed25519 / JSON | JSON · measurement_signature blanked |
| Gold-set fixture | Fixture release team (separate key) | Ed25519 / manifest JSON | JSON · manifest signature AND bundle hash blanked |
Per-framework scan cadence is set in the posture manifest. Tier-3 autonomy ceilings, 60-day key rotation, 30-second cadence — the DoD floor — is shipped to every customer, regardless of SKU. The Cadillac frame at compact-car price.
Argus must never wake an operator at 3am for a finding that turns out to be incorrect. The critical-FP gate is the operator-protection invariant. The methodology is mechanical: SME-authored oracle labels, signed fixture, reproducible measurement, signed accuracy report.
oracle.yaml entries. Per ADR-0028, oracle labels cannot be LLM-generated. Each label signed with the SME team's Ed25519 key.argus accuracy run --rules-pack <pack> --fixture <tar.gz>. Output: signed measurement JSON. CI gates release on --fail-on-fp-exceeds 0.05 and --fail-on-critical-fp-exceeds 0.The gold-set fixture targets ~400 entries across 3 substrates × 17 frameworks. MCP server fixtures (~200 clean + ~150 known-bad + ~50 boundary); A2A flow fixtures (~100 clean + ~80 known-bad + ~30 boundary); posture manifest fixtures (~50 compliant + ~80 non-compliant + ~20 hybrid-mode v1.1 PQ entries).
Per-framework subsets are calibrated for each framework's specific controls — not "one universal fleet measured against all 17." The v1.0 seed scaffold ships with 10 entries spanning 8 frameworks; the GA fixture requires SME-authored oracle labels.
The measurement is the formal definition: for each fixture entry × framework × rule, evaluate the rule against the entry's instantiated InMemoryFacts, then classify against the oracle — true positive, false positive, true negative, false negative. fp_rate = FP / (FP + TN). critical_fp_rate = critical_FP / (critical_FP + critical_TN).
No other compliance product in this category publishes a structurally falsifiable accuracy claim. The methodology is the moat.
The reporting engine is Argus's fourth major subsystem, not a feature. ADR-0019: 80% of compliance evidence value is delivered through reports humans read. All HTML reports are single-file — CSS, JS, images inline as data URIs. Auditors archive one .html. PDF lands at v1.1 via native Rust generation (printpdf / pdf-writer / lopdf) — no Chromium dependency.
Five-minute watermarked posture snapshot for prospect evaluation. "TRIAL — not for audit submission" overlay. Designed to fit inside an evaluation call and surface real findings before procurement.
Cross-framework posture grid for CISOs, CFOs, audit committees, boards. Period coverage, top findings, posture change log. Written for the boardroom, not the engineer.
Per-control verdict breakdown for one framework with anchor navigation. The artifact 3PAOs, DPOs, MRM officers actually read during authorization review.
Signed tarball: manifest + vendor pubkey + findings.json + periods.json + OSCAL Assessment Results + rendered reports. Auditor verifies offline against the framework-specific dossier.
Period-over-period change: new findings, resolved findings, verdict shifts, net posture movement. The diff view for compliance.
Chronicle attestation chain renderer with chain-status indicator. The artifact the GRC platform polls on its own cadence.
Single-finding forensic detail: detection card, evidence-chain timeline, remediation steps, related findings, HITL disposition log. Pulled during incident response.
A typical v1.0 deployment running 10 frameworks at 300-second cadence accumulates ~18 MB of findings per year and ~2 MB of periods. Seven-year retention default: ~140 MB total. Memory: 30–60 MB resident. CPU: single-digit percent during scan cycles, idle otherwise.
| Operation | p95 target | Measured (v0.1.0-dev) |
|---|---|---|
| Scan-cycle latency | ||
| ≤10 MCP servers | ≤ 5s | ~1s |
| ≤50 MCP servers (v1.0 GA acceptance · AC-012) | ≤ 30s | ~6s |
| ≤200 MCP servers | ≤ 2min | TBD |
| Report generation | ||
| Trial-Mode Scan (cold) | ≤ 5min | ~30s |
| Executive Briefing (AC-015) | ≤ 10s | ~2s |
| Per-Framework Detailed | ≤ 15s | ~3s |
| Auditor Evidence Bundle · 30-day period (AC-016) | ≤ 30s | ~8s |
| External /external/v1/* API latency | ||
| GET /health | ≤ 50ms | measured |
| GET /posture | ≤ 200ms | measured |
| GET /findings (paginated) | ≤ 400ms | measured |
| GET /reports/{id}/download (303 redirect) | ≤ 100ms | measured |
Argus passes Garrison's closed-allowlist policy end-to-end. No special case. No soft-fail. No warn-mode. No override. The deterministic Castellan envelope (zero LLM, empty prompts) makes Argus the easiest possible binary to admit.
Argus's MCP surface declared up front — tool catalog, knowledge bundles, Herald C2-protected fields.
specs/argus-mcp-server.yamlprovider: deterministic · empty prompts, tools, mcp_servers. ASI01 and ASI06 eliminated by construction.
specs/argus-agent.yamlOWASP-Agentic ASI01–ASI10 scorecard · SHIP verdict · 0 FAIL · 2 WARN · 0 critical.
specs/aegis-self-audit.mdKeyless OIDC release signature + per-customer KeyServer entitlement. No license · no boot.
cosign verify · KeyServer tokenGarrison is the hardened operator console Argus runs inside — closed allowlist, single-operator authority, sandboxed execution, KeyServer-gated. Sister workloads include Oracle (signed domain expertise) and Meridian (governed time intelligence). Argus is the fourth.
If you already operate a different AI runtime, Argus can still observe it — provided your runtime emits an attested posture manifest and reachable MCP endpoints. The evidence chain works. The closed-allowlist invariant does not transfer.
Same Argus binary in every Garrison SKU. Same crypto chokepoint. Same supply-chain attestation. The IL5 deployment runs the same engineering artifact as the SOC 2 deployment.
The closest analog is the original GRC tools — Drata, Vanta, OneTrust — applied to a runtime layer they don't reach. Argus composes with them via the read-only /external/v1/* API. The pitch is additive, not replacement. Per ADR-0030, the platform-neutral positioning is deliberate.
| Adjacent player | Category | Overlap with Argus |
|---|---|---|
| Drata · Vanta · OneTrust | GRC platforms | Argus integrates via /external/v1/* — doesn't replace |
| Hyperproof · AuditBoard · ServiceNow GRC | Enterprise audit automation | Same — Argus feeds AI-runtime evidence in |
| Lakera · Robust Intelligence · CalypsoAI | AI security / red-teaming | Different problem. Argus measures continuous compliance; they measure model attack surface |
| CredoAI · Holistic AI · Fairly AI | AI model governance / fairness | Different problem. Argus is runtime evidence; they are model-level governance |
| Anthropic Trust Center · OpenAI Compliance | Foundation model vendor compliance | Different problem. Vendor-side compliance attestations |
| OWASP-LLM project · MITRE ATLAS | Open frameworks | Argus checks against frameworks; these define them |
Pricing levers under consideration: per-framework licensing vs all-inclusive · per-deployment vs per-component metering · annual vs multi-year discount structure · GSA / SEWP federal price-list eligibility · reseller and system-integrator margins.
| Tier | Customer profile | Annual range |
|---|---|---|
| Trial | Prospects evaluating Argus · watermarked Trial-Mode reports only · no license required | $0 |
| Commercial Starter | Single SOC 2 / HIPAA / GDPR deployment | $25K – $50K |
| Commercial Standard | Up to 5 commercial frameworks · standard support | $75K – $150K |
| Enterprise | Up to 17 frameworks · priority support · design-partner program | $200K – $400K |
| Federal / FedRAMP | FedRAMP Low / Moderate / High deployment | $300K – $600K |
| DoD | IL4 / IL5 deployment · Sovereign SKU · air-gap | $500K – $1M+ |
Reach out via the channels in SECURITY.md for design-partner pricing.
Regulators codifying continuous AI evidence requirements. AI tool surface area exploding through MCP servers, A2A flows, autonomous agents. Enterprise procurement requiring AI compliance evidence to close deals — especially in financial services, healthcare, federal, defense.