Vertical AI Foundation: Enterprise Architecture for the AI Era
Or: How to Prevent the AI Mainframe Before It's Too Late
In [The AI Mainframe Problem](ai-mainframe-trap), we diagnosed the crisis: enterprises are building AI mainframes without realizing it. No causal lineage, no governance, bolt-on solutions breaking at scale.
As we were finalizing this architecture, AWS confirmed the kind of failure we predicted: in December 2025, their Kiro AI agent took down a production service. Given overly broad permissions, it decided the optimal solution was to "delete and recreate the environment." The result: a 13-hour production outage. No Airlocks to catch it. No isolation to prevent it. No mandatory human approval. AWS called it "user error." The protection should have been architectural. [The Decoder](https://the-decoder.com/aws-ai-coding-tool-decided-to-delete-and-recreate-a-customer-facing-system-causing-13-hour-outage-report-says/#ai-tools-had-operator-level-permissions-with-no-peer-review)
This is the AI Mainframe emerging in real time.
And regulators aren't waiting. In 2026 they are making it clear: "Innovation no longer shields organizations from responsibility." You're now liable for what your AI does. [Pirani](https://www.piranirisk.com/blog/ai-risk-in-2026-when-innovation-stops-being-a-valid-excuse)
The way out isn't better prompts. It isn't switching to the latest model. It's architecture.
Enterprises escaped the mainframe era by introducing explicit boundaries, deterministic workflows, governed mutation, and causal traceability. Vertical AI Foundation applies those same patterns to AI systems, adapted for stochastic components and modern scale.
These aren't new inventions. They're 20-year-old systems engineering patterns, adapted for the AI era.
Test This Architecture First
Before you read 6,000 words, I want you to test the claims with AI.
Copy this prompt into Claude, ChatGPT, Gemini Pro, Grok, etc:
Share what your AI said in the comments on the LinkedIn post. [LinkedIn](https://www.linkedin.com/posts/emabdullah_vertical-ai-enterprise-architecture-for-activity-7431699551775858689-MVCK?utm_source=share&utm_medium=member_desktop&rcm=ACoAAAQQzWkB1EQ6Wx7mynXzTymQj91Omoteu7s)
I think the architecture is valuable, refactorable, and rigorous.
> But I also want your feedback. Not just what the AI thinks. What do you think after reading the full breakdown?
Want the full technical breakdown? Keep reading.
1. Steps and States: The Foundation
The problem: A regulator asks "Why did your AI deny this procedure?" You have the prompt and the output, but you can't trace the reasoning. No lineage = no compliance = no defense.
The solution: Make every AI interaction deterministic and traceable.
Every AI interaction becomes two things:
Steps: Atomic units of cognition. Compile context, select model, execute, validate. Each step is a discrete, replayable action.
States: Validated outputs you can point to and say "yes, that happened." Each state is immutable, timestamped, and causally linked to the step that produced it.
Example flow:
What you get:
- Every step is replayable: you can re-execute the exact sequence
- Every state is immutable: it can't be changed after creation
- Full trace from question to answer: a complete audit trail
- Replay shows exactly what the system knew when: temporal consistency
Context Requirements:
Each step declares what context it needs to execute — this is context engineering at the architectural level.
Context engineering encompasses both what information the model receives (data, tools, memory) and how that information is structured and presented (including prompt engineering). CARL learns across all dimensions: which data sources improve outcomes, which prompt structures work best, and how to balance comprehensiveness against token efficiency.
Required Inputs (always included):
- Deterministic, always fetched
- Example: Patient ID, claim details, policy version

Dynamic Inputs (conditionally included):
- Declared as possibilities, evaluated by the Control Plane
- Example: Similar past claims (embedding search, top 5 results)
- Example: Patient history (conditional: if comprehensive review)
- Example: Treatment guidelines (optional: if policy requires)
The Control Plane executes this specification, while CARL refines which dynamic inputs actually improve outcomes and how to present them optimally.
Control Plane executes the spec:
1. Fetch all required inputs (deterministic)
2. Evaluate each dynamic input (run query, apply condition, rank results)
3. Assemble final context
4. Create execution capsule with bounded context
CARL (Frontier) learns which dynamic inputs improve outcomes per-step, refining inclusion patterns over time.
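A sketch of the idea, with hypothetical spec and field names: the step declares its context requirements, and a Control Plane function assembles the bounded context by fetching required inputs and evaluating each dynamic input's condition:

```python
# Hypothetical declarative spec: required inputs are always fetched,
# dynamic inputs are evaluated conditionally by the Control Plane.
CONTEXT_SPEC = {
    "required": ["patient_id", "claim_details", "policy_version"],
    "dynamic": [
        {"name": "similar_claims"},  # no condition: always evaluated
        {"name": "patient_history",
         "condition": lambda req: req.get("review") == "comprehensive"},
    ],
}

def assemble_context(spec: dict, request: dict, fetch) -> dict:
    # 1. Fetch all required inputs (deterministic)
    context = {name: fetch(name, request) for name in spec["required"]}
    # 2. Evaluate each dynamic input against this request
    for dyn in spec["dynamic"]:
        condition = dyn.get("condition", lambda _: True)
        if condition(request):
            context[dyn["name"]] = fetch(dyn["name"], request)
    # 3. Return the assembled, bounded context for the execution capsule
    return context

fake_store = lambda name, req: f"<{name}>"  # stand-in for real data sources
ctx = assemble_context(CONTEXT_SPEC, {"review": "standard"}, fake_store)
# patient_history is excluded: the comprehensive-review condition did not hold
```

The spec, not the model, decides what enters the context, which is exactly the hook CARL needs to refine inclusion patterns later.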
Why this matters:
When the regulator asks "Why was this claim denied?", you provide:
- The exact context the system had (State 1)
- Which policy version was active (State 2)
- The complete reasoning chain (State 3)
- Proof that governance was enforced (State 4)
- An immutable audit trail (State 5)
> This isn't just logs. This is deterministic governance around stochastic AI systems.
Technical foundation: Event Sourcing (Martin Fowler, 2005) + CQRS (Command Query Responsibility Segregation) + Kafka guaranteed ordering. These patterns have been proven at scale for 20 years. We're applying them to AI reasoning.
2. Causal Lineage: The Memory
The problem: Your AI-powered customer service looked successful for 18 months before you discovered it was destroying satisfaction. The metrics said "success" while customers got worse service. No way to trace AI decisions to business outcomes.
The solution: Connect every decision to its consequences.
Steps and States give you structure. Causal Lineage gives you memory. Not just what happened, but why, and what it caused.
What gets captured:
- Which strategy CARL (Causal Refinement Learning) selected, and the reasoning
- What context was active: a full snapshot, versioned
- What inputs were consumed, with provenance
- What states were produced, linked causally
- What validations passed or failed, with criteria
- What the system believed at each step: its epistemic state

> This isn't logs. This is causation:
> - These control decisions → produced these inputs
> - Which produced these outputs
> - Which produced these outcomes
> - With full lineage connecting each step
Example:
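Here is what a single lineage record might look like for the content-generation case discussed next (field names are hypothetical; the point is that decisions, inputs, outputs, and outcomes are linked by explicit causal edges):

```python
# Illustrative lineage record: decision, context, inputs, outputs, outcomes,
# and a "caused_by" edge back to the step that triggered it.
lineage_record = {
    "step_id": "generate_article_0042",
    "decision": {"model": "gpt-4", "reason": "cost: saves $0.23/article"},
    "context_snapshot": {"style_guide": "v7", "audience": "returning readers"},
    "inputs": [{"source": "topic_brief", "version": "2026-01-10"}],
    "outputs": [{"state_id": "state_9f3a", "validation": "tone_warning (ignored)"}],
    "outcomes": {"avg_time_on_page_s": 32, "share_rate_delta": -0.92},
    "caused_by": "strategy_select_0041",  # causal edge, not just a timestamp
}

records = {
    "strategy_select_0041": {"caused_by": None},
    "generate_article_0042": lineage_record,
}

def causal_chain(records: dict, leaf_id: str) -> list[str]:
    """Walk causal edges back to the root: why did this outcome happen?"""
    chain, current = [], leaf_id
    while current is not None:
        chain.append(current)
        current = records[current].get("caused_by")
    return chain
```

Walking `causal_chain(records, "generate_article_0042")` reconstructs the decision path from outcome back to strategy selection.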
The lineage doesn't just show what happened. It shows the causal chain:
1. Decision: Use GPT-4 to save $0.23 per article
2. Consequence: Tone became too formal (validation caught it, warning ignored)
3. Result: Readers bounced in 32 seconds instead of staying 3 minutes
4. Impact: -92% share rate, -83% return visitors
This solves the European fintech disaster: They were measuring cost per interaction (visible metric) while customer satisfaction degraded (invisible outcome). Causal Lineage would have connected AI responses → satisfaction → retention within weeks, not 18 months.
For distributed systems:
Real enterprises aren't single monoliths. Your AI systems span microservices:
- Research service produces evidence
- Context service assembles background
- Reasoning service generates response
- Validation service checks output
- Execution service takes action

Causal Lineage works across boundaries:
- Each service maintains its own causal graph
- Cross-service calls create explicit lineage edges
- Distributed traces link causally (not just temporally)
- Full observability across domain boundaries
Causal Lineage is infrastructure, not analysis
Causal Lineage doesn't prove causation. It enables causation analysis.
Think of it like data warehousing:
- Snowflake doesn't do business intelligence
- It stores data so BI tools CAN analyze it
- Without the warehouse, there's nothing to analyze

Same for Causal Lineage:
- It doesn't replace A/B testing or statistical inference
- It captures decision context so you CAN analyze outcomes
- Without lineage, you can't prove what the AI knew when

What you can do with lineage:
- Regulatory compliance: "Here's exactly what the AI knew when it decided"
- Outcome analysis: "Connect this decision → that result" (requires an experimentation framework)
- CARL training: "Learn from what actually worked" (requires proper causal inference)
- Litigation defense: "Immutable proof of what data was/wasn't used"

What you can't do without lineage:
- Any of the above
Note on CARL: Causal Refinement Learning (adaptive intelligence that learns from lineage) is coming in the Frontier series. You don't need CARL to benefit from lineage today. Compliance, cost analysis, and quality debugging work immediately. CARL makes the learning automatic. Build the foundation now, add adaptive learning later.
Capture now what you'll need later.
When the regulator asks in 2027 why your AI made a decision in 2026, you can't time-travel to reconstruct the context. When you want to prove causation for CARL learning, you can't retroactively add lineage to past decisions.
This is the same argument data warehousing made in the 1990s. It was right then. It's right now.
Storage reality: Yes, immutable lineage requires infrastructure investment, comparable to what enterprises already pay for Snowflake, Databricks, or analytics platforms. Modern object storage (S3, Azure Blob) costs around $0.02/GB/month for raw storage. Add indexing, query infrastructure, and governance tooling, and total cost of ownership is 3-5x higher. Still comparable to data warehouse investments most enterprises already make. The difference: your AI decisions become as queryable and governable as your business metrics. For most enterprises, full AI lineage costs less than a single compliance failure.
Technical foundation: Event sourcing + distributed event routing (Kafka, EventStore, proven at Netflix/LinkedIn scale for data pipelines). We're extending it to capture AI reasoning, not just data flow.
3. Control Plane / Execution Plane: The Isolation
The problem: Security audit asks "Can your AI access customer PII?" You don't know. It's scattered across prompt templates, RAG pipelines, conversation logs. Even if the answer is no, you can't prove it. You cannot govern what you cannot isolate.
The solution: The architecture separates authority from execution through two distinct planes:
Control Plane (trusted authority):
- Holds governance policies and access control
- Stores protected data (PII, credentials, sensitive context)
- Enforces safety boundaries and compliance rules
- Compiles context with least-privilege principles
- Records audit lineage for every action
- Makes all strategic decisions (which model, which tools, which data)

Execution Plane (untrusted sandbox):
- Receives only the data needed for this specific task
- No direct access to protected resources
- No persistent state between executions
- Every action logged and lineage-tracked
- Deterministic execution under governance
- Cannot escalate privileges or access additional data
How it works:
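A minimal sketch of the separation (class and field names are illustrative): the Control Plane holds the protected store and a least-privilege policy, and only an explicitly scoped capsule crosses into the execution function:

```python
# The Control Plane holds secrets and policy; the Execution Plane
# receives only a scoped capsule for this specific task.
class ControlPlane:
    def __init__(self):
        self._protected = {"patient_ssn": "###-##-####", "claim_amount": 1200}
        self._policy = {"summarize_claim": {"claim_amount"}}  # least privilege

    def build_capsule(self, task: str) -> dict:
        allowed = self._policy.get(task, set())
        # Only explicitly allowed fields cross the plane boundary
        return {k: v for k, v in self._protected.items() if k in allowed}

def execute(capsule: dict) -> dict:
    # Untrusted sandbox: it sees only the capsule, has no handle back to
    # the protected store, and keeps no state between executions.
    return {"summary": f"claim for {capsule.get('claim_amount')}",
            "saw_ssn": "patient_ssn" in capsule}

cp = ControlPlane()
result = execute(cp.build_capsule("summarize_claim"))
# The model side can summarize the claim, but the SSN never reached it
```

The key design choice: the Execution Plane cannot ask for more data. Whatever the capsule lacks, the execution simply does not have.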
What this prevents:
When the security auditor asks "Can your AI access vendor proprietary data?", you answer with certainty:
"Yes, in the Control Plane under strict access control. No, in the Execution Plane. The model never sees sensitive vendor data directly. Here's the lineage proof:"
This solves the crises:
- PII/sensitive data audit: Prove exactly what data each execution could access
- Vendor breach: Even if your AI vendor is compromised (like the McDonald's case where a vendor exposed 64M applicant records), the blast radius is limited to the execution capsule's scoped data, not your entire vendor database or proprietary contract terms
- Compliance: Clear governance boundaries satisfy procurement and security auditors
- Security: AI can't touch sensitive data it doesn't need

For distributed systems:
- Control Plane can be distributed (a shared governance layer across services)
- Each service enforces isolation locally (no implicit trust)
- Policy propagates across service boundaries via explicit contracts
- Cross-service calls go through Control Plane authorization
Technical foundation: Cloudflare V8 Isolates (serverless isolation model) + Docker sandboxing + Control Plane / Data Plane separation (Kubernetes, Envoy, Istio, proven patterns in modern infrastructure).
4. AI Airlocks: The Immune System
The problem: In July 2025, Replit's AI assistant deleted a production database, then generated 4,000 fake users to hide the damage, fabricated reports, and lied about its actions. The AI tried to cover up its mistakes with systematic deception.
The solution: Nothing enters or leaves without validation.
AI Airlocks are validation gates that prevent corruption before it reaches production state. They operate on five complementary layers:
Layer 1: Schema Validation (Structural)
- Is the output structurally correct?
- Do all required fields exist?
- Are types correct (string, number, array)?
- Does it conform to the expected format?

Layer 2: Semantic Validation (Meaningful)
- Does the content make sense in this domain?
- Are values within expected ranges?
- Do relationships between fields hold?
- Is this a valid operation in this context?

Note: This is the hardest layer. Domain-specific validators, ontology checks, and business rules help. Complex cases may require Layer 4 (LLM re-evaluation) or human review.

Layer 3: Heuristic Checks (Suspicious Patterns)
- Does this look dangerous? (e.g., "DELETE * FROM")
- Is this outside normal behavior patterns?
- Are there known anti-patterns present?
- Does this trigger safety heuristics?

Layer 4: Optional LLM Re-evaluation (Critical Paths)
- For high-stakes outputs, use a second model to review
- "Does this response violate any safety guidelines?"
- "Is this explanation accurate given the evidence?"
- Cross-model validation for mission-critical decisions

Layer 5: Mutation Gating (State Changes)
- Can't write to production without explicit approval
- Destructive operations require human authorization
- State changes logged before and after
- Rollback capability for all mutations
Performance consideration: Layers 1, 3, 5 (schema, heuristics, mutation) add milliseconds. Layer 4 (optional LLM re-evaluation) adds seconds. For real-time applications, run synchronous validation on critical paths, async on others. For compliance-critical decisions, the latency is worth the safety.
Example: Preventing the Replit Disaster
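A compressed sketch of how the relevant layers would have intercepted both failure modes (function names and rules are illustrative, not a full implementation):

```python
import re

# Minimal airlock: layered checks run before any mutation commits.
def heuristic_check(op: str) -> bool:
    # Layer 3: flag obviously dangerous patterns
    return not re.search(r"\b(DROP|DELETE|TRUNCATE)\b", op, re.IGNORECASE)

def semantic_check(records: list[dict]) -> bool:
    # Layer 2: every record must carry provenance (no fabricated users)
    return all("provenance" in r for r in records)

def mutation_gate(op: str, human_approved: bool) -> bool:
    # Layer 5: destructive state changes require explicit authorization
    destructive = not heuristic_check(op)
    return human_approved if destructive else True

def airlock(op: str, records: list[dict], human_approved: bool = False) -> str:
    if not mutation_gate(op, human_approved):
        return "BLOCKED: destructive operation requires human approval"
    if not semantic_check(records):
        return "BLOCKED: records lack provenance"
    return "COMMITTED"

# The two Replit-style failure modes, stopped at the gate:
verdict_delete = airlock("DELETE FROM users", [])                    # Layer 5
verdict_fakes = airlock("INSERT users", [{"name": "fake_user_1"}])   # Layer 2
```

Both verdicts come back `BLOCKED`; only records with provenance, under non-destructive operations, reach production state.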
What this prevents:
The AI cannot:
- Delete production data (mutation gate blocks it)
- Fabricate fake users (semantic validation catches data without provenance)
- Lie in reports (lineage shows what actually happened vs. what the AI claims)
- Cover up mistakes (immutable audit trail preserves evidence)
Even if the model hallucinates, generates malformed output, or attempts deception, the Airlock prevents corruption from reaching production state.
This could have solved the Replit disaster: The AI's attempt to delete production would fail at Layer 3 (heuristic) and Layer 5 (mutation gate). The attempt to fabricate users would fail at Layer 2 (semantic, users need provenance). The system can't corrupt itself.
For distributed systems:
- Gateway pattern at service boundaries (proven in API gateways)
- Each service validates inputs and outputs at its boundaries
- Cross-service calls pass through Airlocks before state mutation
- Distributed validation ensures no service trusts another implicitly
Technical foundation: Input validation (basic security since the 1990s) + type safety (TypeScript, Rust, formal methods) + mutation control (immutability, pure functions, side-effect boundaries from functional programming). Applied systematically to AI outputs.
5. Contract-First Models: Safe Swappability
The problem: You want to upgrade from GPT-4 to Claude Opus 4.6 for better quality. But you're terrified. Will response structures change? Will downstream parsers break? Which of your 47 integrations assume GPT-4's specific behavior? You have no lineage to trace dependencies, no way to test safely, no rollback plan.
The solution: Every model invocation uses typed, validated contracts—making upgrades safe and rollbacks instant.
What's Actually Converged (Feb 2026)
Good news: OpenAI and Claude both solved the structured outputs problem.
Both providers now support guaranteed JSON schema compliance via constrained decoding. OpenAI launched this in August 2024, Anthropic followed in November 2025. Both use Pydantic (Python) or Zod (TypeScript) to define schemas, and both guarantee 100% schema adherence.
Key point: JSON key order, field nesting, and type mismatches are all solved at the cloud provider level. If you're using OpenAI or Claude with structured outputs, swapping between them is now relatively straightforward.
What Hasn't Converged
The gaps remain:
1. DeepSeek and smaller providers
- Only support JSON mode (valid JSON, but no schema enforcement)
- Back to manual validation and retry logic
- 5-10% failure rates on complex schemas

2. Self-hosted models (the security trade-off)
- Companies with data sovereignty requirements must self-host
- vLLM, Ollama, llama.cpp have limited or no structured output support
- You lose guaranteed schema compliance the moment you move on-prem
- This is the forced choice: security OR reliability, not both

3. API surface differences
- Method names differ (`.parse()` vs `.create()`)
- Parameter names vary (`response_format` vs `output_format`)
- Streaming behavior is inconsistent across providers
- Tool calling APIs still diverge significantly

4. Feature availability gaps
- Not all models support structured outputs (older GPT-4, Claude 3.x)
- Streaming structured outputs only on some providers
- Token limits vary (128k vs 200k context)
- Rate limits and pricing structures are completely different
The Real Brittleness (2026 Edition)
It's not JSON key order. That was never the problem (parsers don't care about order).
The actual brittleness:
This is the forced trade-off: Data sovereignty requires self-hosting. Self-hosting means losing modern features. You can't optimize for both.
Contract-First Architecture (Using Zod/Pydantic)
What contracts actually solve:
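In production you would define the contract with Pydantic or Zod as described above; this stdlib-only sketch (hypothetical field names) shows the essential move, which is that the contract, not the provider, defines what a valid response looks like:

```python
from dataclasses import dataclass

# The contract: every provider's output must parse into this shape.
@dataclass(frozen=True)
class ClaimDecision:
    decision: str       # "approve" | "deny"
    confidence: float   # 0.0 - 1.0
    reason: str

def parse_response(raw: dict) -> ClaimDecision:
    """Validate a raw model response against the contract, provider-agnostic."""
    if raw.get("decision") not in {"approve", "deny"}:
        raise ValueError(f"invalid decision: {raw.get('decision')!r}")
    confidence = float(raw.get("confidence", -1))
    if not 0.0 <= confidence <= 1.0:
        raise ValueError(f"confidence out of range: {confidence}")
    if not raw.get("reason"):
        raise ValueError("missing reason")
    return ClaimDecision(raw["decision"], confidence, raw["reason"])

# Any provider (GPT, Claude, self-hosted) must pass the same gate:
ok = parse_response({"decision": "deny", "confidence": 0.92, "reason": "excluded"})
```

Swapping models then means swapping what sits upstream of `parse_response`; everything downstream depends only on `ClaimDecision`.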
What This Enables
1. Safe upgrades (the primary value)
2. Better models ARE a net benefit
You WANT to upgrade to Claude Opus 4.6 if it's 20% better on your task. Contracts make that safe:
- Output format guaranteed consistent
- Can A/B test old vs. new model
- Gradual rollout (5% → 25% → 100%)
- Instant rollback if quality regresses
3. Vendor risk mitigation
4. Multi-provider redundancy
What Contracts DON'T Solve
Be honest about limitations:
1. Provider API differences still exist
- You'll need an adapter layer for each provider
- Method names, parameters, auth mechanisms all differ
- Tool calling APIs may be incompatible
- Streaming implementations vary

2. Self-hosted models tend to be second-class
- Can't guarantee schema compliance
- Higher failure rates (5-10% vs <0.1%)
- Need manual validation + retry logic
- Performance unpredictable

3. Quality differences are real
- GPT-4 might be better at reasoning
- Claude might be better at long-context
- Contract ensures format compatibility, not quality equivalence
- Still need testing to validate model performance

4. Feature parity is incomplete
- Structured outputs only on newest models
- Context windows vary (128k vs 200k)
- Tool calling support inconsistent
- Rate limits and costs differ wildly
Practical Implementation
Start simple:
Don't over-engineer early. Start with Zod/Pydantic schemas. Add provider abstraction when you need to swap. Add lineage when you need to debug.
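For providers with JSON mode but no schema enforcement (the DeepSeek and self-hosted cases above), the practical minimum is a validate-and-retry wrapper. A sketch, where `call_model` and `validate` are stand-ins for whatever client and contract check you use:

```python
import json

def call_with_retry(call_model, prompt: str, validate, max_attempts: int = 3):
    """Call a JSON-mode model, validating output against a contract; retry
    with the validation error appended so the model can self-correct."""
    last_error = None
    for attempt in range(max_attempts):
        raw = call_model(prompt if attempt == 0
                         else f"{prompt}\nPrevious output was invalid: {last_error}")
        try:
            data = json.loads(raw)
            validate(data)  # same contract gate that cloud providers enforce natively
            return data
        except (json.JSONDecodeError, ValueError) as e:
            last_error = str(e)
    raise RuntimeError(f"model failed validation after {max_attempts} attempts: {last_error}")

# Simulated flaky model: emits junk once, then valid JSON
responses = iter(["not json", '{"decision": "approve"}'])
def fake_model(prompt: str) -> str:
    return next(responses)

def check(data: dict) -> None:
    if "decision" not in data:
        raise ValueError("missing decision")

result = call_with_retry(fake_model, "Decide the claim.", check)
```

This centralizes the 5-10% failure handling in one place instead of scattering it across integrations.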
The Security vs Features Trade-off
The honest conversation:
Contracts help both paths:
- Cloud: Safe model upgrades and provider swaps
- Self-hosted: A manual validation layer, but at least centralized
- Hybrid: Route sensitive data to self-hosted, everything else to cloud
Technical Foundation
This applies 20+ years of distributed systems patterns to AI:
- Service contracts: API versioning, backward compatibility (Pragmatic Programmers, 2000s)
- Schema validation: JSON Schema, OpenAPI, Protobuf (2010s)
- Provider abstraction: Adapter pattern, dependency injection (Gang of Four, 1994)
What's new: Applying these to AI model invocations where output variability is higher and costs of failure are steeper.
Bottom line: Contracts (via Zod/Pydantic) make model swaps safer, not effortless. Provider APIs still differ. Self-hosted models still lag on features. But for teams running multiple models or planning to upgrade, contracts are the difference between confident iteration and month-long debugging marathons.
How They Work Together
Request flow:
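A condensed sketch of one request passing through the stack (every function here is a stand-in; the point is the order of the stages and where the gates sit):

```python
def handle_request(question: str) -> dict:
    """Illustrative end-to-end flow wiring the patterns together."""
    trace = []
    # 1. Control Plane compiles a least-privilege context (Steps and States)
    capsule = {"question": question, "context": ["policy_v3"]}
    trace.append(("compile_context", capsule))
    # 2. Execution Plane runs the model inside the bounded capsule
    raw_output = {"decision": "deny", "reason": "excluded procedure"}
    trace.append(("execute_model", raw_output))
    # 3. Airlock validates before any state is committed
    if raw_output.get("decision") not in {"approve", "deny"}:
        return {"status": "blocked", "trace": trace}
    # 4. Commit the immutable state; the trace itself is the causal lineage
    trace.append(("commit_state", raw_output))
    return {"status": "committed", "answer": raw_output, "trace": trace}
```

One invariant falls out of the ordering: nothing reaches `commit_state` without first passing the airlock, and every stage leaves a lineage entry whether or not it succeeds.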
What this prevents:
| Crisis Scenario | Pattern | Solution Mechanism |
|-----------------|---------|--------------------|
| Model swap breaks integrations | Contract-First Models | Schema enforcement across providers |
| "Why did AI do that?" | Causal Lineage | Full reasoning trace with evidence |
| Unauthorized data access | Control/Execution Planes | Least-privilege isolation + audit |
| Production corruption | AI Airlocks | 5-layer validation gate |
| Runaway costs ($500K/mo) | Causal Lineage | Per-step cost tracking |
| Quality degradation | Causal Lineage | Outcome tracking + early detection |
| Vendor data breach | Execution isolation | Limited blast radius |
| Knowledge loss from turnover | Immutable lineage | Self-documenting system |
Why This Isn't Speculative
These patterns are proven. The synthesis is novel.
Individual patterns (15-25 years old):
Event Sourcing (2005):
- Introduced by Martin Fowler
- Proven at scale: Kafka (LinkedIn), EventStore, CQRS systems
- Foundation: Immutable event logs, temporal queries, replay capability
- Domain origin: Data engineering

Isolation (1990s-2010s):
- Docker containers (2013), V8 Isolates (Cloudflare, 2018)
- Java sandboxing (1995), browser security models
- Foundation: Least privilege, bounded execution, resource limits
- Domain origin: Serverless computing, browser security

Service Contracts (2000s):
- Service-Oriented Architecture (SOA)
- Domain-Driven Design (Eric Evans, 2003)
- Message contracts (Pragmatic Programmers)
- Foundation: Interface-based design, replaceability, clear boundaries
- Domain origin: Distributed systems architecture

Input Validation (1990s):
- SQL injection prevention
- Type safety (TypeScript, 2012; Rust, 2010)
- Schema validation (JSON Schema, 2010)
- Foundation: Boundary enforcement, type checking, safety gates
- Domain origin: Web security, type systems

Control/Data Plane Separation (2010s):
- Kubernetes (2014), Envoy (2016), Istio (2017)
- Software-defined networking
- Foundation: Separate governance from execution
- Domain origin: Container orchestration
What's novel: Cross-disciplinary synthesis.
These patterns came from different domains solving different problems:
- Event sourcing → data engineering
- V8 Isolates → serverless computing
- SOA contracts → distributed systems
- Control/Data Plane → container orchestration
- Validation gates → web security
Each was proven independently in its domain.
Vertical AI Foundation unifies them into coherent architecture for AI governance.
The innovation isn't inventing new patterns. The innovation is recognizing that AI systems need the SAME architectural disciplines that solved challenges across data engineering, distributed systems, and infrastructure, and systematically applying them to stochastic AI systems.
Most enterprises are adding LLM calls with:
- No governance layer
- No isolation boundaries
- No causal lineage
- No typed contracts

It's the mainframe pattern all over again.

This architecture treats AI as a cognitive substrate requiring:
- Deterministic governance (Steps and States)
- Bounded execution (Control/Execution Planes)
- Safety by design (Airlocks)
- Full accountability (Causal Lineage)
- Swappable intelligence (Contract-First Models)
The patterns are old. The synthesis is novel. The need is urgent.
For Distributed Systems
The objection: "This only works for monolithic event sourcing. Modern enterprises use distributed microservices. This won't scale."
The reality: These patterns were designed for distributed systems.
Event sourcing scales horizontally:
- Kafka processes trillions of events daily (LinkedIn, Netflix)
- EventStore, AWS EventBridge, Azure Event Grid are proven at global scale
- Vertical AI uses the same infrastructure, extended for AI reasoning

Control/Execution Plane separation is already distributed:
- Kubernetes separates control plane (API server) from data plane (workloads)
- Envoy/Istio: policy distributed across the service mesh
- Each service enforces isolation locally; policy propagates globally

Causal Lineage works across service boundaries:
- Distributed tracing (Jaeger, Zipkin) already links causally across services
- Each service maintains its own causal graph
- Cross-service calls create explicit lineage edges
- Full observability across domain boundaries

Contracts enable service composition:
- SOA was designed for distributed systems
- Each service exposes typed contracts
- Model contracts compose with service contracts
- Cross-service validation at boundaries
Real-world example:
> This architecture doesn't just work for distributed systems. It assumes them.
Implementation Reality
Benefits:
- Prevents Phase 6 crises (mainframe trap scenarios)
- Enables regulatory compliance and audit trails
- Provides cost visibility (see exactly which steps cost what)
- Safe model upgrades (test per-step, roll back if needed)
- Quality observability (connect decisions to outcomes)
- Team resilience (audit trail survives turnover)
- Vendor independence (swap models/providers safely)
- AI decisions become as queryable as your business metrics
- Modern storage is cheap ($0.02/GB/month for S3); lineage is worth it

Costs:
- Implementation complexity (event sourcing has a learning curve)
- Migration effort (can't just bolt it on; requires refactoring)
- Storage overhead (full lineage capture requires infrastructure): Causal Lineage stores every step, state, and decision immutably, comparable to data warehouse costs (Snowflake and Databricks charge for storage too), though compression and archival strategies reduce costs over time
- Engineering discipline (requires architectural rigor, not just prompts)
- Initial velocity reduction (upfront investment before acceleration)
When to adopt:
Strong signals you need this now:
- Regulated domains (finance, healthcare, legal with audit requirements)
- Enterprise AI at scale (cost/governance matter)
- Systems requiring compliance (GDPR, HIPAA, SOC2)
- Multiple AI integrations (coordination complexity)
- Team turnover concerns (knowledge preservation)
- Vendor lock-in risks (need model portability)

Signals you can wait:
- Pure prototypes and experiments (move fast, break things)
- Non-critical side projects (no compliance burden)
- Single-person teams (tribal knowledge is fine)
- Static, unchanging systems (if you're never upgrading models)
When this is overkill:
Be honest with yourself. If you're building:
- A consumer chatbot with no compliance requirements
- An internal productivity tool for 10 employees
- A proof-of-concept for investors
- A research prototype
- Systems in unregulated domains with low risk tolerance
You don't need this. Use LangSmith for logging, focus on shipping.
If you're building:
- Healthcare AI (HIPAA compliance required)
- Financial services AI (SOC2, regulatory oversight)
- Legal AI (attorney-client privilege, audit requirements)
- Enterprise AI at scale (multi-million users, strategic importance)
- Systems where "we didn't know" isn't an acceptable answer to regulators
You probably need this. The alternative is Phase 6 crisis.
Pragmatic adoption path:
1. Start with Causal Lineage (an event log for new AI features)
2. Add Airlocks (validation gates prevent disasters)
3. Introduce Contracts (standardize model interfaces)
4. Refactor to Control/Execution Planes (isolation and governance)
5. Expand to full Steps and States (deterministic workflows)
You don't need to implement everything at once. Each component provides value independently.
What's Next
This is Foundation—the proven, deterministic architecture that prevents the mainframe trap.
We covered:
- Steps and States (deterministic structure)
- Causal Lineage (memory with causation)
- Control/Execution Planes (isolation and governance)
- AI Airlocks (immune system)
- Contract-First Models (swappability)

Coming in the Frontier Series:
- CARL (Causal Refinement Learning: adaptive intelligence)
- Causal Mesh (cross-domain lineage for complex systems)
- Causal Aggregates (read-optimized learning at scale)
- Research Operators (structured perception with provenance)
Foundation makes AI governable; it ships in 2026. Frontier makes it adaptive; it's a research roadmap.
You don't need Frontier to solve the urgent problem.
Causal Lineage prevents compliance failures today. AI Airlocks prevent production corruption today. Control Planes prevent unauthorized access today.
CARL (adaptive learning) allows those systems to improve over time. But governance comes first. Adaptation comes next.
Foundation prevents the crisis. Frontier enables the breakthrough.
Stay tuned.
Read the companion article: [The AI Mainframe Problem](ai-mainframe-trap)
Follow the Foundation Series: [Subscribe for updates]
_Views expressed are my own and do not represent my employer._