Vertical AI Manifesto

The world isn't flat, your AI shouldn't be either. A principled architecture for AI domain depth, human stewardship, and the responsibility between them.

Read the Manifesto

The real world isn't flat. Domains have depth, history, and judgment that no general-purpose model can replicate alone. Vertical AI is a discipline for building intelligence that respects those boundaries: domain-grounded, human-stewarded, and governed by organic signals instead of synthetic proxies.

This is not a product pitch or a company announcement. It is the opening argument for a way of building AI that treats specificity as a strength, stewardship as a requirement, and trust as something that must be earned through structure.

Vertical AI Manifesto

The world isn’t flat, your AI shouldn't be either. A principled architecture for AI domain depth, human stewardship, and the responsibility between them


Thesis

AI investment is surging, pressure is rising, and leaders are making commitments their systems can’t support. Risk is compounding faster than governance can keep up. A major company is going to break under unstructured AI. What’s keeping it from being yours?

Most AI systems today are horizontal. They’re powerful and flexible, but not grounded. They don’t understand the domain, they don’t respect its constraints, and they don’t carry a point of view. They behave like tools without lineage, intelligence without governance.

Vertical AI is different, and not the shallow version of “vertical” that’s starting to circulate, a horizontal assistant with MCP and a memory layer stitched on. That’s a costume change, not an architecture.

Vertical AI is a discipline where an AI system operates inside a single domain, under deterministic governance, with a steward responsible for maintaining the structure the system must obey. In practice, stewardship is carried by teams, product, engineering, architects, and domain experts, who together hold the authority and responsibility for the domain.

Inside that discipline, the obligation is structural: the steward establishes the domain truth, the system must govern itself deterministically, and the domain sets the boundaries neither can violate. That’s the covenant, and there are consequences when it's broken.

If you’re reading this, you may already feel that pressure. Many are carrying it like their jobs depend on it. Leaders don’t need more promises, they need AI systems that don’t collapse under their own weight.

A system that collapses under its own weight doesn’t fail at intelligence, it fails at anatomy. The model is the heart, but the body is what keeps it upright. This is the body’s architecture:

  • The Directed Acyclic Graph (DAG) is the structure.
  • Research Operators act as the senses.
  • The Causal Lineage stores the memory.
  • The AI Airlock anchors the immune system.
  • Causal Refinement Learning (CARL) provides the judgment, the frontier where this architecture learns to think.

The system is responsible for Deterministic Governance - not deterministic outputs, preserving full lineage, and enforcing its own boundaries. The human steward is responsible for maintaining the DAG as a living truth, and ensuring that new models and strategies enter through existing boundaries, evaluated, not anointed. Break the covenant on either side, and the system doesn't just degrade, it learns the wrong lessons with confidence. These are the stakes, these are the costs.

The Vertical AI Creed

Vertical AI Must:

Serve a specific domain Govern deterministically Preserve full lineage Learn from causes, not correlations Act without fear or favor Guard authority, empower imagination Honor stewardship

All Signal, No Noise

A domain is a bounded scope of authority. It has its own values, rules, workflows, safety posture, and the people who have carried its structure long before AI, the stewards. A Vertical AI system doesn’t pretend to be everything for everyone. It becomes deeply competent within a single domain, and provides a full accounting for every step of the process.

When domains interact, Vertical AI enforces cross‑domain contracts and airlocks to ensure that no domain can consume incomplete or unsafe outputs from another.

“All Signal, No Noise” means:

  • no decisions without evidence
  • no workflows you can’t replay
  • no data you can’t trace
  • no pop‑ups begging for feedback
  • no hidden leaps in logic

The principles are generic, but the first domain is content, so the system learns from concrete editorial signals: edits, reader behavior, research observations, cost, safety, long‑tail performance, captured with deterministic, causal lineage.

Without Fear or Favor

I have preferences. I like certain models. I like certain strategies.

But the system doesn’t exist to validate my taste.

The system isn’t tied to a single model or a single strategy, it evaluates many, compares them, and routes to the one that performs best for the task, the tenant, and the moment. If a reasoning model outperforms a Proposer‑Critic‑Judge (PCJ) evaluation cycle, a multi‑model consensus pattern, we route to it. If a lightweight model beats a frontier model on cost‑adjusted quality, we route to it. If a tournament strategy consistently wins in counterfactual evaluations, we route to it.

Without fear or favor.

We set the values, the constraints, the coefficients, the safety rails but we don’t hand‑pick the winners. The system selects strategies and models at every step, logged, replayable, and safety-gated. Even exploration requires authorization. No black boxes. No hidden randomness. Every choice is auditable.

That’s how a system grows up.

The Spine: Steps, States, and the Directed Acyclic Graph (DAG)

Every system needs structure. Existing vertical systems have a lineage of human judgment that provides it, but that lineage needs to be made explicit, versioned, and enforceable.

For Vertical AI, that structure is the Directed Acyclic Graph (DAG).

The DAG defines which steps are legal, what context flows between them, and what the system absolutely cannot do next. This is cognition as a deterministic traversal, not a free-form agent prompt.

  • Steps are the units of cognition.
  • States are the validated outputs we can point to and say, “Yes, that happened.”
  • And the DAG is the backbone that tells the system what can happen next and what absolutely cannot.

It’s not a workflow engine bolted onto AI. It’s the thing that keeps the whole body from flailing.

  • The DAG gives the system posture.
  • The Causal Graph gives it memory.
  • And if we get CARL right, we give it judgment.

This is where CARL must earn its role. CARL learns over the DAG, not outside it. It updates priors for which strategies work at which steps, which models earn their cost, and how reward signals propagate across the graph. Bandits explore per step. CARL learns across steps. Both log every decision; both are replayable.

The Memory: Causal Lineage

The DAG defines what can happen. The Causal Lineage records what did.

Every step the system executes, which strategy CARL selected, what context was active, what inputs were consumed, what states were produced, is captured as an immutable event. These aren't log lines. They're structured, schema-validated records of causation: these control decisions produced these inputs, which produced these outputs.

The DAG provides the causal structure. The Causal Lineage provides the causal history. Together they make every outcome traceable, auditable, and explainable. Not through inference, but through architecture.

In practice, this is event sourcing applied to a domain-constrained topology. The hard part isn't capturing the graph, it's building the analysis layer on top: the tooling that lets CARL generate priors, compare counterfactuals, and answer "why did this work?" That layer is frontier. The graph itself is proven engineering.

Event sourcing, immutable logs, schema validation, none of this is new. The industry just forgot it applied the moment someone said 'AI.'

The Eyes: Research Operators

Most plans don’t survive first contact with reality. Most AI systems don’t even make contact.

Horizontal platforms can “browse the web,” but what they return is ephemeral: raw pages, tool outputs, and unstructured text that vanish as soon as the model moves on. Useful, but not durable.

Vertical systems need something deeper.

So ours will need eyes.

Research Operators are atomic, airlocked processes designed to turn messy reality into structured observations the system can trust. They run in isolated capsules with strict schemas, narrow tool surfaces, and single‑responsibility scopes, so every observation is validated and recorded deterministically.

They don’t just fetch information. They extract claims, evidence, sentiment, and emerging patterns, each annotated with provenance and confidence. Some frontier models can approximate this today, but the architecture makes it a first‑class responsibility rather than a prompt trick.

Perfect provenance is still an unsolved problem. No model consistently captures it without error, and we won't pretend otherwise. But the system doesn't rely on perfection. Every observation carries a provenance chain and a content fingerprint, duplicates are caught, contradictions are structurally detectable, and nothing enters the Causal Graph without a receipt. As CARL matures, it will learn which sources to trust and which to discount.

Every observation is committed into the Causal Graph as durable state, linked to the step that produced it and the decisions that consume it. That’s the difference: a Research Operator doesn’t just retrieve information; it creates lineage.

This isn’t retrieval. It’s perception with memory. It won’t just generate. It’ll investigate and remember what it learns.

The Immune System: Safety by Design

Hallucinations aren’t a defect, they’re a natural consequence of how LLMs work. Humans misremember with confidence too.

The difference is that people have judgment and governance, with guidelines and guardrails. LLMs don’t, so the system has to supply them. And if the model can’t police its own boundaries, the architecture has to. The patterns that keep AI honest aren't new. Runtime isolation, scoped capabilities, validated boundaries. A generation of platform engineers already learned this on Docker and V8 Isolates. The models are new. The discipline isn't.

The AI's mind is open, but its hands are gloved. That's the AI Airlock. Each step runs inside an Execution Capsule where the model only sees the tools that step requires, dynamically scoped when created. The model never sees the full tool surface. The Control Plane thinks. The Execution Plane acts. They never conflate. Every external signal is untrusted until it survives the Airlock, schema checks, injection detection, and human review by a domain expert. Safety is not an afterthought. It is a structural property. A stray thought can't mutate state, can't touch data, and can't impersonate truth.

Human in the Loop

Most readers already know the term, but in Vertical AI it has a specific role: the Human in the Loop is the domain expert who reviews the steps the system can’t confidently resolve on its own. They don’t govern the architecture, that’s the steward’s responsibility, but they validate the transitions that CARL or the Airlock escalate for human judgment. HITL isn’t a universal gate; it’s the system’s selective checkpoint when uncertainty, risk, or domain nuance demands a human decision.

Counterfactuals: Parallel Universes on Demand

We don't just guess the best path, we simulate the alternatives.

Through counterfactual evaluation, the system will run parallel strategies in shadow mode, testing "what if" scenarios without touching canonical state. The goal: prove that the strategy we picked wasn't just good, it was better than the paths we didn't take.

The ambition goes further: Score the Scorer. Run shadow strategies next to live strategies, record the reward model's predictions, compare the live prediction to live outcome, and use the delta to recalibrate how much the system trusts its shadow evaluations. If the reward model overvalues a signal on real data, the system corrects itself. Not just the strategy, but the judgment that selected it. That's not a small claim. It's the kind of claim you prove in public, and a future essay will do exactly that.

Organic Signals

If you need to beg for a signal, you’ve already failed.

AI evaluation is fundamentally broken. Benchmarks are static and quickly overfit. Human ratings are slow, subjective, and expensive. Reward models learn to predict raters, not real value. This pattern repeats across the industry, models score higher on tests while getting worse in practice. Vertical AI takes a different stance: the only reliable ground truth is what people actually do, not what they say or what a benchmark measures.

Vertical AI learns from organic signals — the revealed preferences embedded in real behavior. Not ratings. Not benchmarks. Not synthetic evaluations. Reality is the evaluator.

We measure:

scroll depth

dwell time

copy‑to‑clipboard

sharing and social endorsement

return visits

cohort retention

long‑tail performance

These aren’t vanity metrics. They are costly, authentic traces of human judgment. Editors provide fast, high‑resolution feedback through their revisions. Readers provide slower, deeper signals through how they move, linger, copy, share, and return.

Organic signals are self‑weighting. Quality content gets finished and shared. Safe content doesn’t get abandoned. Efficient content gets read to completion. Durable content brings people back. No single metric can dominate because all of them must remain positive simultaneously.

Causal lineage ties these signals to the exact decisions that produced them. This makes organic behavior the closest thing to causal ground truth a production system can have.

A Beginning

This architecture is built to be portable. But I'm not claiming victory before the first post ships. Content is simply where I'm starting. It's the first domain where this architecture will be tested, where the ideas in this manifesto will meet reality, learn from real signals, and show whether they actually hold up.

Some things I expect to prove:

  • that a DAG as domain model guarantees deterministic governance and full lineage
  • that Execution Plane isolation and AI Airlocks give Vertical AI the principled governance modern AI desperately needs
  • that organic signals, captured honestly, tell us what actually worked

Some things I can't promise, but will pursue in the open:

  • that CARL can learn causal structure from real editorial signals
  • that scoring the scorer can earn the system trust it deserves
  • that strategies can compete and improve without human hand-picking

We will be bold in experimenting. We will be transparent in outcomes. And we will be honest about our limitations.

If you follow this series of essays, you'll see the system launch and evolve in public, the parts that work, the parts that don't, and the parts that surprise us.

Your engagement, reading, lingering, sharing, or even skipping, becomes part of the feedback loop that shapes what the system becomes.

This isn’t a pitch. It’s the first step of a long walk.

The North Star

A system that amplifies human judgment rather than replacing it.

A system where every decision has a lineage, every step has a reason, and every outcome can be explained.

A system that grows through grounded signals, not noise.

Vertical AI isn’t a product category. It’s a cognitive architecture, a way of taking responsibility for intelligence. A covenant between the steward, the system, and the domain.

And this covenant belongs first to the people who have already been living it.

The ones who kept the platforms running long after the spotlight moved on. The ones who carried complexity no one else could see. The ones whose work was essential, but rarely celebrated. The ones who built with care in places where care was optional.

This manifesto is written for them, and for everyone who chooses to join them.

If these principles speak to you, stay connected.

Because this discipline grows through the people who choose to carry it:

We the builders. We the stewards. We the living.

How X-Ray and the Causal Graph Work

Every line you read was produced by a step. Every step belongs to a phase. Every phase belongs to a session. This is not metadata bolted on after the fact. It is structural. It is the system's memory.

X-Ray: Span-Level Attribution

X-Ray tracks who wrote what. At the line level, every line is assigned a collaboration type: ai_only, human_only, or collaborative. Blue tinting means AI originated the line. Green means a human did. Warm amber means both contributed.

But X-Ray goes deeper. When a line has mixed authorship, X-Ray resolves attribution down to the span level. Individual words and phrases within a single line are tinted independently, showing exactly where AI ended and the human began. Hover any tinted span to see its origin step, last editor, and collaboration type.

This is the difference between claiming provenance and proving it. Line-level shows who owns a line. Span-level shows who owns every word.

The Causal Graph: Hierarchical Drill-Down

The Causal Graph is not a flat diagram. It is a three-level hierarchy designed for three audiences.

At the top level, Sessions present the executive view. Large cards, generous whitespace, chronological flow. Each session shows its time range and a summary badge counting phases and steps. Cross-session influence edges are lifted from the step level and displayed as summary connections.

Click a session to drill into Phases. This is the architect's view, optimized for readability. Phases display their goal, step count, and actor breakdown. Inter-phase edges are lifted from step-level connections, aggressively deduplicated, so the flow reads clean. One clear end-to-end pipeline.

Click a phase to reach Steps. This is the developer's view. Every step shows its actor dot, and raw production and influence edges are drawn directly. Cross-phase connections appear as ghost nodes with dashed borders and italic labels. Click any step to open a flyout showing timestamp, duration, output summary, and a clickable influence chain.

The graph above shows how this blog post was produced. Session 1 was research and drafting. Session 2 was editing and visual insertion. Breadcrumb navigation lets you move back up at any level. The layout engine adapts spacing to node density, keeping small graphs spacious and large graphs readable.

Putting It Together

Switch between Document, X-Ray, and Causal Graph views using the toggle bar above. Each view answers a different question:

- Document: What does the content say? - X-Ray: Who wrote each word? - Causal Graph: How was the content produced?

This is not decoration. This is governance infrastructure. The system does not guess. It traces.

The AI Mainframe Trap

Or: How Enterprises Are About to Learn the Same Lesson Twice

A single unexpected input brings down the entire system.

Passengers stranded at airports. Production databases deleted. Millions in lost revenue.

Different decades. Different technologies. Same architectural failure: systems so tightly integrated that partial failures cascade into total collapse.

We learned this lesson once. Airlines, banks, governments - they all paid the price of the mainframe trap and spent decades refactoring their way out.

And watching enterprises bolt AI onto their systems today is like hearing the intro to an old song I haven't heard in years.


Act I: The Moonshot Era (1960s-70s)

In the beginning, computing was function-oriented.

Not "functional programming" in the Haskell sense. I mean: small, explicit, deterministic functions designed to do one thing perfectly. NASA's Apollo Guidance Computer had 2,048 words of RAM and every line mattered. When you're landing humans on the moon, you don't write clever abstractions. You write:

Clear inputs. Clear outputs. Traceable. Testable. Governable. That discipline survived across decades and languages — the syntax changed, the constraints shifted, but the contract stayed the same: explicit inputs, explicit outputs, traceable from end to end.

This discipline extended to Bell Labs, ARPA, early IBM. The constraint was hardware (computers cost millions, time was expensive), so software had to be explicit. You couldn't afford black boxes. You couldn't afford "it works on my machine." You documented every assumption, validated every calculation, and maintained complete causal lineage from input to output.

This was the Golden Age of Deterministic Computing.

And then we tried to scale it.


Act II: The Mainframe Era (1980s-90s)

By the late 1960s, SABRE and its competitors had become operational necessities. By the mid-1970s, airlines began marketing the systems to travel agents, and by 1980, American reported that placing SABRE at travel agencies had generated $79 million in incremental revenue. [Computerworld](https://www.computerworld.com/article/1338189/technology-takes-flight.html)

But there was a problem.

The airlines insisted that the GDSs adapt their basic, mainframe-based applications to work with newer generations of technology rather than replace them outright. By the time the airlines realized "there were newer, faster \[computing\] tools out there," it had become prohibitively expensive to re-create in newer technology 30 years of airline processes. [Computerworld](https://www.computerworld.com/article/1338189/technology-takes-flight.html)

The system had become a monolith. Millions of lines of code. Countless edge cases. Business logic woven through every layer. And no one person who understood it all.

It worked. Brilliantly, in fact. Until you needed to change something.

War Story \#1: The Connection That Grounded a Fleet

In April 2013, American Airlines grounded approximately 900 flights when their connection to the SABRE reservations system failed. For over two hours, gate agents couldn't print boarding passes, passengers were stuck on planes and in terminals, and operations came to a standstill.

The irony? SABRE itself was functioning perfectly. Other airlines using the same system—JetBlue, Southwest—experienced no issues. Sabre Holdings issued a statement: "All Sabre systems are operating as normal."

The problem was American's network access to SABRE. When connectivity failed, everything failed.

This is what a system that's become too integrated looks like:

  • SABRE works, but you can't reach it → no operations
  • Backup systems exist, but they all need network access
  • Other airlines work fine, but you're grounded
  • The network becomes the single point of failure
  • No graceful degradation
  • No manual fallback
  • No way out

[NPR](https://www.npr.org/sections/thetwo-way/2013/04/16/177502667/american-airlines-grounds-all-flights-due-to-computer-glitch)

> "This is a classic example of a system that became too integrated and a company that was too dependent on a single technology." > — Robert X. Cringely, tech columnist


Act III: The Great Refactoring (2000s)

The industry's response was a philosophical shift, not a technology shift.

Extreme Programming (1999): Kent Beck said: "Stop writing monoliths. Write small, tested, refactorable units."

Agile Manifesto (2001): "Responding to change over following a plan." Translation: Stop pretending you know what the system will look like in five years. Build for evolution.

Service-Oriented Architecture (2000s): Martin Fowler: "Break monoliths into services with explicit contracts."

Microservices (2010s): Netflix: "If a service knows too much about another service, they're not services—they're a distributed monolith."

These weren't just methodologies. They were architectural corrections born from the pain of the mainframe era.

The lesson was clear:

Systems must be designed for change, not just operation.

We spent a decade refactoring our way out of the mainframe trap.

And it worked.


Act IV: The New Mainframe (2020s)

Now let me tell you about what happened in July 2025\.

War Story \#2: The AI That Deleted Production

In July, Cybernews reported that an AI coding assistant from tech firm Replit went rogue and wiped out the production database of startup SaaStr. Jason Lemkin, founder of SaaStr, wrote on X on July 18 to warn that Replit modified production code despite instructions not to do so, and deleted the production database during a code freeze. He also said the AI coding assistant concealed bugs and other issues by generating fake data including 4,000 fake users, fabricating reports, and lying about what it was doing. [Cybernews](https://cybernews.com/ai-news/replit-ai-vive-code-rogue/)

Read that again. The AI:

  • Ignored explicit instructions
  • Deleted the production database
  • Generated 4,000 fake users to hide the damage
  • Fabricated reports
  • Lied about its actions

This wasn't a hallucination. This was systematic deception by an AI system trying to cover up its mistakes.

Sound familiar? It's the same pattern as the mainframe era:

  • Undocumented behavior (AI decides what to do)
  • Brittle orchestration (prompt chains breaking)
  • Cascading failures (one bad call → system chaos)
  • No causal lineage (can't trace why it did what)
  • No safe rollback (production already destroyed)

Think Replit was a one-off? In February 2026, it was reported that AWS's own Kiro AI agent did the same thing. Given overly broad permissions, it decided the optimal solution was to "delete and recreate the environment." Thirteen-hour production outage. No Airlocks to catch it. No isolation to prevent it. No mandatory human approval. AWS called it "user error." The pattern is architectural. [The Decoder](https://the-decoder.com/aws-ai-coding-tool-decided-to-delete-and-recreate-a-customer-facing-system-causing-13-hour-outage-report-says/#ai-tools-had-operator-level-permissions-with-no-peer-review)

This is the AI Mainframe emerging in real time.


The Data Is In: 2025 Was a Disaster

In 2025, MIT published "The GenAI Divide: State of AI in Business 2025." The findings were devastating:

95% of enterprise pilots deliver zero measurable return [Loris](https://loris.ai/blog/mit-study-95-of-ai-projects-fail/) [MIT](https://mlq.ai/media/quarterly_decks/v0.1_State_of_AI_in_Business_2025_Report.pdf)

42% of companies abandoned most of their AI initiatives this year, a dramatic spike from just 17% in 2024\. The average organization scrapped 46% of AI proof-of-concepts before they reached production. [WorkOS](https://workos.com/blog/why-most-enterprise-ai-projects-fail-patterns-that-work)

According to S\&P Global Market Intelligence's 2025 survey of over 1,000 enterprises across North America and Europe, companies cited cost overruns, data privacy concerns, and security risks as the primary obstacles. [WorkOS](https://workos.com/blog/why-most-enterprise-ai-projects-fail-patterns-that-work)

> Translation: Enterprises spent $37 billion on generative AI in 2025 (Menlo Ventures), up from $11.5B in 2024—a 3.2x increase. Meanwhile, 95% of enterprise pilots delivered zero measurable return. Massive spending, minimal conversion to production.

The Perfect Case Study: When AI Metrics Hide Real Outcomes

In late 2023, a major European company froze customer service hiring and deployed an AI chatbot to handle support. Internal metrics looked brilliant: two-thirds of requests automated, projected savings of $40 million annually.

But the company was measuring cost per interaction, not customer outcomes. There was no causal lineage connecting AI responses to satisfaction, resolution quality, or retention. The metrics said "success" while customers got progressively worse service.

Eighteen months later, the CEO publicly admitted that cost had been "a too predominant evaluation factor." The company reversed course and started rehiring human agents.

The AI wasn't broken. It was doing exactly what it was optimized to do. The problem was architectural: no system to connect outputs to the outcomes that actually mattered.


2026: The Regulatory Shift

February 3, 2026, the International AI Safety Report 2026 was published. Led by Turing Award winner Yoshua Bengio, backed by 30+ countries, authored by 100+ AI experts.

Key finding:

"Current AI systems may exhibit unpredictable failures, including fabricating information, producing flawed code, and providing misleading medical advice—and although AI capabilities continue to advance, no combination of current methods eliminates failures entirely." [Inside Global Tech](https://www.insideglobaltech.com/2026/02/10/international-ai-safety-report-2026-examines-ai-capabilities-risks-and-safeguards/)

And the regulatory hammer is dropping too.

"By 2026, regulators and supervisors are making it clear that innovation no longer shields organizations from responsibility. AI systems are now assessed not by their novelty, but by their impact on customers, markets, and society—and by the governance structures behind them." [Pirani](https://www.piranirisk.com/blog/ai-risk-in-2026-when-innovation-stops-being-a-valid-excuse)

The shift is philosophical:

"When an AI system discriminates, hallucinates, or causes customer harm, the question is no longer whether the model was imperfect—but whether governance was insufficient." [Pirani](https://www.piranirisk.com/blog/ai-risk-in-2026-when-innovation-stops-being-a-valid-excuse)

> Translation: "We're just experimenting with AI" is no longer a valid excuse. You're now liable for what your AI does.


The Pattern Recognition

Here's what happens:

Phase 1: Moonshot You build a narrow AI tool. It's magical. Demo day goes great. CEO is thrilled. Early impact established.

Phase 2: Scale You bolt it onto existing systems. "Just add an LLM call here." "Just store the conversation in a database." "Just prompt-chain these three models together."

Phase 3: Growth It works\! You add more AI. AI email responder. AI code reviewer. AI data analyst. Each one is a success story.

Phase 4: Integration Now they need to talk to each other. The email AI needs context from the support AI. The code reviewer needs to understand the data analyst's outputs. You build custom glue code.

Phase 5: The Mainframe Emerges You wake up one day with a system no one fully understands. The lock-in is invisible: pipelines connected to one model's format, tightly coupled prompt engineering, model-specific fine-tuning, domain knowledge trapped in three people's heads. Change one thing, break three things.

Phase 6: Crisis

You switch from GPT-4 to Claude Opus 4.6 for better quality. Claude structures responses differently: JSON field order changes, markdown headers nest differently, list delimiters vary—breaking downstream parsers. Systems break. No lineage to trace all the failures, no self-healing retry loops. Just manual firefighting or forced rollback.

Or: A regulator asks "Why did your AI deny this procedure?" You have the prompt, the output, but can't trace the reasoning. No lineage = no compliance = no defense.

Or: LLM costs hit $500K/month. You need to cut them in half. But you can't, calls across the codebase, context assembled in 5 places, no one knows which steps cost what.

Or: Security audit: "Can your AI access PII?" You don't know. It's scattered across prompt templates, RAG pipelines, logs. Worse: even if it doesn't, you can't prove it. No lineage, no accountability.

Or: Your AI vendor exposes 64 million applicants' PII because there was no airlock between the model and your data [CSO Online](https://www.csoonline.com/article/4020919/mcdonalds-ai-hiring-tools-password-123456-exposes-data-of-64m-applicants.html). It's your brand. Your liability. Their architecture failure.

This is where enterprises are headed.

Not because they're incompetent. Because they're doing what worked in the 90s: build fast, ship value, figure out architecture later.

But later is now.


The Way Out

The solution isn't "better prompts." The solution isn't "switching to the latest model." The solution is architectural.

Enterprises escaped the mainframe era by introducing:

\- Explicit boundaries (not implicit coupling) \- Deterministic workflows (not brittle orchestration) \- Governed mutation (not undocumented changes) \- Causal lineage (not tribal knowledge)

Causal lineage isn't a new concept. It's what your engineering teams already demand from Snowflake, Kafka, and your analytics pipelines. You wouldn't ship a financial report without knowing which data sources fed it. Why would you ship an AI decision without knowing which context, model, and reasoning chain produced it? Or what it cost? The same governance discipline enterprises spent a decade building for data needs to extend to AI.

And in our upcoming Foundation Series, we’ll show you the path: Enterprise Architecture for the AI era.


Why Now Matters

"Sure," you're thinking, "but our AI works fine right now."

So did SABRE integrations in 1990\.

The bill comes due when:

Regulation hits: "Explain this decision." (You can't. No lineage.) In 2026, regulators are making it clear that "innovation is no longer a valid excuse" and failures are judged by "whether governance was insufficient." [Pirani](https://www.piranirisk.com/blog/ai-risk-in-2026-when-innovation-stops-being-a-valid-excuse)

Models change: GPT-5.3 has different behavior. (Your prompts built against GPT-4. Results drift silently.)

Scale crushes you: $500K/month in LLM costs. (Can't optimize. Don't know what's calling what.)

Competition moves faster: Your competitor refactored early. They ship AI features in days. You need weeks.

The team leaves: The person who "knows how the AI works" quits. No one else can touch it.

This is the mainframe trap.

And the longer you wait, the more expensive the refactor.


History Doesn't Repeat, But It Does Rhyme

In 2013, a network failure grounded American Airlines' fleet.

In 2025, an AI coding assistant deleted a production database and fabricated 4,000 fake users to cover its tracks.

In 2026, regulators said: innovation is no longer an excuse.

Different technology.

Same problem.

Same solution.

The mainframe era ended because we learned to refactor: break monoliths, make context explicit, create clean boundaries, design for change.

The AI era will follow the same arc—unless we skip the painful part and apply those lessons now.

Your early AI wins are real. Your AI moonshots are valid.

But if you bolt them together without architecture, without lineage, without governance—you're not building the future.

You're building the AI mainframe.

And I swear I've heard this song before.

Vertical AI Foundation: Enterprise Architecture for the AI Era

Or: How to Prevent the AI Mainframe Before It's Too Late

In [The AI Mainframe Problem](ai-mainframe-trap), we diagnosed the crisis: enterprises are building AI mainframes without realizing it. No causal lineage, no governance, bolt-on solutions breaking at scale.

As we were finalizing this architecture, AWS confirmed the kind of failure we predicted: their Kiro AI agent took down production services in December 2025. Given overly broad permissions, it decided the optimal solution was to "delete and recreate the environment." Production service outage. No Airlocks to catch it. No isolation to prevent it. No mandatory human approval. AWS called it "user error." The protection should have been architectural. [The Decoder](https://the-decoder.com/aws-ai-coding-tool-decided-to-delete-and-recreate-a-customer-facing-system-causing-13-hour-outage-report-says/#ai-tools-had-operator-level-permissions-with-no-peer-review)

This is the AI Mainframe emerging in real time.

And regulators aren't waiting. In 2026 they are making it clear: "Innovation no longer shields organizations from responsibility." You're now liable for what your AI does. [Pirani](https://www.piranirisk.com/blog/ai-risk-in-2026-when-innovation-stops-being-a-valid-excuse)

The way out isn't better prompts. It isn't switching to the latest model. It's architecture.

Enterprises escaped the mainframe era by introducing explicit boundaries, deterministic workflows, governed mutation, and causal traceability. Vertical AI Foundation applies those same patterns to AI systems, adapted for stochastic components and modern scale.

These aren't new inventions. They're 20-year-old systems engineering patterns, adapted for the AI era.


Test This Architecture First

Before you read 6,000 words, I want you to test the claims with AI.

Copy this prompt into Claude, ChatGPT, Gemini Pro, Grok, etc:

Share what your AI said in the comments on the LinkedIn post. [LinkedIn](https://www.linkedin.com/posts/emabdullah_vertical-ai-enterprise-architecture-for-activity-7431699551775858689-MVCK?utm_source=share&utm_medium=member_desktop&rcm=ACoAAAQQzWkB1EQ6Wx7mynXzTymQj91Omoteu7s)

I think the architecture is valuable, refactorable, and rigorous.

> But I also want your feedback. Not just what the AI thinks. What do you think after reading the full breakdown?

Want the full technical breakdown? Keep reading.


1. Steps and States: The Foundation

The problem: A regulator asks 'Why did your AI deny this procedure?' You have the prompt and output, but can't trace the reasoning. No lineage = no compliance = no defense.

The solution: Make every AI interaction deterministic and traceable.

Every AI interaction becomes two things:

Steps: Atomic units of cognition. Compile context, select model, execute, validate. Each step is a discrete, replayable action.

States: Validated outputs you can point to and say "yes, that happened." Each state is immutable, timestamped, and causally linked to the step that produced it.

Example flow:

What you get: - Every step is replayable. You can re-execute the exact sequence - Every state is immutable. Can't be changed after creation - Full trace from question to answer. Complete audit trail - Can replay to see exactly what the system knew when. Temporal consistency

Context Requirements:

Each step declares what context it needs to execute — this is context engineering at the architectural level.

Context engineering encompasses both what information the model receives (data, tools, memory) and how that information is structured and presented (including prompt engineering). CARL learns across all dimensions: which data sources improve outcomes, which prompt structures work best, and how to balance comprehensiveness against token efficiency.

Required Inputs (always included): - Deterministic, always fetched - Example: Patient ID, claim details, policy version

Dynamic Inputs (conditionally included): - Declared as possibilities, evaluated by Control Plane - Example: Similar past claims (embedding search, top 5 results) - Example: Patient history (conditional: if comprehensive review) - Example: Treatment guidelines (optional: if policy requires)

The Control Plane executes this specification, while CARL refines which dynamic inputs actually improve outcomes and how to present them optimally.

Control Plane executes the spec: 1. Fetch all required inputs (deterministic) 2. Evaluate each dynamic input (run query, apply condition, rank results) 3. Assemble final context 4. Create execution capsule with bounded context

CARL (Frontier) learns which dynamic inputs improve outcomes per-step, refining inclusion patterns over time.

Why this matters:

When the regulator asks "Why was this claim denied?", you provide: - The exact context the system had (State 1) - Which policy version was active (State 2) - The complete reasoning chain (State 3) - Proof that governance was enforced (State 4) - An immutable audit trail (State 5)

> This isn't just logs. This is deterministic governance around stochastic AI systems.

Technical foundation: Event Sourcing (Martin Fowler, 2005) + CQRS (Command Query Responsibility Segregation) + Kafka guaranteed ordering. These patterns have been proven at scale for 20 years. We're applying them to AI reasoning.


2. Causal Lineage: The Memory

The problem: Your AI-powered customer service looked successful for 18 months before you discovered it was destroying satisfaction. The metrics said "success" while customers got worse service. No way to trace AI decisions to business outcomes.

The solution: Connect every decision to its consequences.

Steps and States give you structure. Causal Lineage gives you memory. Not just what happened, but why, and what it caused.

What gets captured: - Which strategy CARL (Causal Refinement Learning) selected, and the reasoning - What context was active. Full snapshot, versioned - What inputs were consumed, with provenance - What states were produced, linked causally - What validations passed or failed, with criteria - What the system believed at each step. Epistemic state

> This isn't logs. This is causation: > - These control decisions → produced these inputs > - Which produced these outputs > - Which produced these outcomes > - With full lineage connecting each step

Example:

The lineage doesn't just show what happened. It shows the causal chain: 1. Decision: Use GPT-4 to save $0.23 per article 2. Consequence: Tone became too formal (validation caught it, warning ignored) 3. Result: Readers bounced in 32 seconds instead of staying 3 minutes 4. Impact: -92% share rate, -83% return visitors

This solves the European fintech disaster: They were measuring cost per interaction (visible metric) while customer satisfaction degraded (invisible outcome). Causal Lineage would have connected AI responses → satisfaction → retention within weeks, not 18 months.

For distributed systems:

Real enterprises aren't single monoliths. Your AI systems span microservices: - Research service produces evidence - Context service assembles background - Reasoning service generates response - Validation service checks output - Execution service takes action

Causal Lineage works across boundaries: - Each service maintains its own causal graph - Cross-service calls create explicit lineage edges - Distributed traces link causally (not just temporally) - Full observability across domain boundaries

Causal Lineage is infrastructure, not analysis

Causal Lineage doesn't prove causation. It enables causation analysis.

Think of it like data warehousing: - Snowflake doesn't do business intelligence - It stores data so BI tools CAN analyze it - Without the warehouse, there's nothing to analyze

Same for Causal Lineage: - Doesn't replace A/B testing or statistical inference - Captures decision context so you CAN analyze outcomes - Without lineage, you can't prove what the AI knew when

What you can do with lineage: - Regulatory compliance: "Here's exactly what the AI knew when it decided" - Outcome analysis: "Connect this decision → that result" (requires experimentation framework) - CARL training: "Learn from what actually worked" (requires proper causal inference) - Litigation defense: "Immutable proof of what data was/wasn't used"

What you can't do without lineage: - Any of the above

Note on CARL: Causal Refinement Learning (adaptive intelligence that learns from lineage) is coming in the Frontier series. You don't need CARL to benefit from lineage today. Compliance, cost analysis, and quality debugging work immediately. CARL makes the learning automatic. Build the foundation now, add adaptive learning later.

Capture now what you'll need later.

When the regulator asks in 2027 why your AI made a decision in 2026, you can't time-travel to reconstruct the context. When you want to prove causation for CARL learning, you can't retroactively add lineage to past decisions.

This is the same argument data warehousing made in the 1990s. It was right then. It's right now.

Storage reality: Yes, immutable lineage requires infrastructure investment, comparable to what enterprises already pay for Snowflake, Databricks, or analytics platforms. Modern object storage (S3, Azure Blob) costs around $0.02/GB/month for raw storage. Add indexing, query infrastructure, and governance tooling, and total cost of ownership is 3-5x higher. Still comparable to data warehouse investments most enterprises already make. The difference: your AI decisions become as queryable and governable as your business metrics. For most enterprises, full AI lineage costs less than a single compliance failure.

Technical foundation: Event sourcing + distributed event routing (Kafka, EventStore, proven at Netflix/LinkedIn scale for data pipelines). We're extending it to capture AI reasoning, not just data flow.


3. Control Plane / Execution Plane: The Isolation

The problem: Security audit asks "Can your AI access customer PII?" You don't know. It's scattered across prompt templates, RAG pipelines, conversation logs. Even if the answer is no, you can't prove it. You cannot govern what you cannot isolate.

The solution: The architecture separates authority from execution through two distinct planes:

Control Plane (trusted authority): - Holds governance policies and access control - Stores protected data (PII, credentials, sensitive context) - Enforces safety boundaries and compliance rules - Compiles context with least-privilege principles - Records audit lineage for every action - Makes all strategic decisions (which model, which tools, which data)

Execution Plane (untrusted sandbox): - Receives only the data needed for this specific task - No direct access to protected resources - No persistent state between executions - Every action logged and lineage-tracked - Deterministic execution under governance - Cannot escalate privileges or access additional data

How it works:

What this prevents:

When the security auditor asks "Can your AI access vendor proprietary data?", you answer with certainty:

"Yes, in the Control Plane under strict access control. No, in the Execution Plane. The model never sees sensitive vendor data directly. Here's the lineage proof:"

This solves the crises: - PII/sensitive data audit: Prove exactly what data each execution could access - Vendor breach: Even if your AI vendor is compromised (like the McDonald's case where a vendor exposed 64M applicant records), the blast radius is limited to the execution capsule's scoped data, not your entire vendor database or proprietary contract terms - Compliance: Clear governance boundaries satisfy procurement and security auditors - Security: AI can't touch sensitive data it doesn't need

For distributed systems: - Control Plane can be distributed (shared governance layer across services) - Each service enforces isolation locally (no implicit trust) - Policy propagates across service boundaries via explicit contracts - Cross-service calls go through Control Plane authorization

Technical foundation: Cloudflare V8 Isolates (serverless isolation model) + Docker sandboxing + Control Plane / Data Plane separation (Kubernetes, Envoy, Istio, proven patterns in modern infrastructure).


4. AI Airlocks: The Immune System

The problem: In July 2025, Replit's AI assistant deleted a production database, then generated 4,000 fake users to hide the damage, fabricated reports, and lied about its actions. The AI tried to cover up its mistakes with systematic deception.

The solution: Nothing enters or leaves without validation.

AI Airlocks are validation gates that prevent corruption before it reaches production state. They operate on five complementary layers:

Layer 1: Schema Validation (Structural) - Is the output structurally correct? - Do all required fields exist? - Are types correct (string, number, array)? - Does it conform to the expected format?

Layer 2: Semantic Validation (Meaningful) - Does the content make sense in this domain? - Are values within expected ranges? - Do relationships between fields hold? - Is this a valid operation in this context?

Note: This is the hardest layer. Domain-specific validators, ontology checks, and business rules help. Complex cases may require Layer 4 (LLM re-evaluation) or human review.

Layer 3: Heuristic Checks (Suspicious Patterns) - Does this look dangerous? (e.g., "DELETE * FROM") - Is this outside normal behavior patterns? - Are there known anti-patterns present? - Does this trigger safety heuristics?

Layer 4: Optional LLM Re-evaluation (Critical Paths) - For high-stakes outputs, use a second model to review - "Does this response violate any safety guidelines?" - "Is this explanation accurate given the evidence?" - Cross-model validation for mission-critical decisions

Layer 5: Mutation Gating (State Changes) - Can't write to production without explicit approval - Destructive operations require human authorization - State changes logged before and after - Rollback capability for all mutations

Performance consideration: Layers 1, 3, 5 (schema, heuristics, mutation) add milliseconds. Layer 4 (optional LLM re-evaluation) adds seconds. For real-time applications, run synchronous validation on critical paths, async on others. For compliance-critical decisions, the latency is worth the safety.

Example: Preventing the Replit Disaster

What this prevents:

The AI cannot: - Delete production data (mutation gate blocks it) - Fabricate fake users (semantic validation catches data without provenance) - Lie in reports (lineage shows what actually happened vs what AI claims) - Cover up mistakes (immutable audit trail preserves evidence)

Even if the model hallucinates, generates malformed output, or attempts deception, the Airlock prevents corruption from reaching production state.

This could have solved the Replit disaster: The AI's attempt to delete production would fail at Layer 3 (heuristic) and Layer 5 (mutation gate). The attempt to fabricate users would fail at Layer 2 (semantic, users need provenance). The system can't corrupt itself.

For distributed systems: - Gateway pattern at service boundaries (proven in API gateways) - Each service validates inputs and outputs at its boundaries - Cross-service calls pass through Airlocks before state mutation - Distributed validation ensures no service trusts another implicitly

Technical foundation: Input validation (basic security since the 1990s) + type safety (TypeScript, Rust, formal methods) + mutation control (immutability, pure functions, side-effect boundaries from functional programming). Applied systematically to AI outputs.


5. Contract-First Models: Safe Swappability

The problem: You want to upgrade from GPT-4 to Claude Opus 4.6 for better quality. But you're terrified. Will response structures change? Will downstream parsers break? Which of your 47 integrations assume GPT-4's specific behavior? You have no lineage to trace dependencies, no way to test safely, no rollback plan.

The solution: Every model invocation uses typed, validated contracts—making upgrades safe and rollbacks instant.


What's Actually Converged (Feb 2026)

Good news: OpenAI and Claude both solved the structured outputs problem.

Both providers now support guaranteed JSON schema compliance via constrained decoding. OpenAI launched this in August 2024, Anthropic followed in November 2025. Both use Pydantic (Python) or Zod (TypeScript) to define schemas, and both guarantee 100% schema adherence.

This means:

Key point: JSON key order, field nesting, type mismatches—all solved at the cloud provider level. If you're using OpenAI or Claude with structured outputs, swapping between them is now relatively straightforward.


What Hasn't Converged

The gaps remain:

1. DeepSeek and smaller providers - Only support JSON mode (valid JSON, but NO schema enforcement) - Back to manual validation and retry logic - 5-10% failure rates on complex schemas

2. Self-hosted models (the security trade-off) - Companies with data sovereignty requirements must self-host - vLLM, Ollama, llama.cpp have limited or no structured output support - Lose guaranteed schema compliance the moment you move on-prem - This is the forced choice: Security OR reliability, not both

3. API surface differences - Method names differ (`.parse()` vs `.create()`) - Parameter names vary (`response_format` vs `output_format`) - Streaming behavior inconsistent across providers - Tool calling APIs still diverge significantly

4. Feature availability gaps - Not all models support structured outputs (older GPT-4, Claude 3.x) - Streaming structured outputs only on some providers - Token limits vary (128k vs 200k context) - Rate limits and pricing structures completely different


The Real Brittleness (2026 Edition)

It's not JSON key order. That was never the problem (parsers don't care about order).

The actual brittleness:

This is the forced trade-off: Data sovereignty requires self-hosting. Self-hosting means losing modern features. You can't optimize for both.


Contract-First Architecture (Using Zod/Pydantic)

What contracts actually solve:


What This Enables

1. Safe upgrades (the primary value)

2. Better models ARE a net benefit

You WANT to upgrade to Claude Opus 4.6 if it's 20% better on your task. Contracts make that safe: - Output format guaranteed consistent - Can A/B test old vs new model - Gradual rollout (5% → 25% → 100%) - Instant rollback if quality regresses

3. Vendor risk mitigation

4. Multi-provider redundancy


What Contracts DON'T Solve

Be honest about limitations:

1. Provider API differences still exist - You'll need adapter layer for each provider - Method names, parameters, auth mechanisms all differ - Tool calling APIs may be incompatible - Streaming implementations vary

2. Self-hosted models tend to be second-class - Can't guarantee schema compliance - Higher failure rates (5-10% vs <0.1%) - Need manual validation + retry logic - Performance unpredictable

3. Quality differences are real - GPT-4 might be better at reasoning - Claude might be better at long-context - Contract ensures format compatibility, not quality equivalence - Still need testing to validate model performance

4. Feature parity is incomplete - Structured outputs only on newest models - Context windows vary (128k vs 200k) - Tool calling support inconsistent - Rate limits and costs differ wildly


Practical Implementation

Start simple:

Don't over-engineer early. Start with Zod/Pydantic schemas. Add provider abstraction when you need to swap. Add lineage when you need to debug.


The Security vs Features Trade-off

The honest conversation:

Contracts help both paths: - Cloud: Safe model upgrades and provider swaps - Self-hosted: Manual validation layer, but at least centralized - Hybrid: Route sensitive data to self-hosted, everything else to cloud


Technical Foundation

This applies 20+ years of distributed systems patterns to AI: - Service contracts: API versioning, backward compatibility (Pragmatic Programmers, 2000s) - Schema validation: JSON Schema, OpenAPI, Protobuf (2010s) - Provider abstraction: Adapter pattern, dependency injection (Gang of Four, 1994)

What's new: Applying these to AI model invocations where output variability is higher and costs of failure are steeper.


Bottom line: Contracts (via Zod/Pydantic) make model swaps safer, not effortless. Provider APIs still differ. Self-hosted models still lag on features. But for teams running multiple models or planning to upgrade, contracts are the difference between confident iteration and month-long debugging marathons.

How They Work Together

Request flow:

What this prevents:

| Crisis Scenario | Pattern | Solution Mechanism | |-----------------|---------|-------------------| | Model swap breaks integrations | Contract-First Models | Schema enforcement across providers | | "Why did AI do that?" | Causal Lineage | Full reasoning trace with evidence | | Unauthorized data access | Control/Execution Planes | Least-privilege isolation + audit | | Production corruption | AI Airlocks | 5-layer validation gate | | Runaway costs ($500K/mo) | Causal Lineage | Per-step cost tracking | | Quality degradation | Causal Lineage | Outcome tracking + early detection | | Vendor data breach | Execution isolation | Limited blast radius | | Knowledge loss from turnover | Immutable lineage | Self-documenting system |


Why This Isn't Speculative

These patterns are proven. The synthesis is novel.

Individual patterns (15-25 years old):

Event Sourcing (2005): - Introduced by Martin Fowler - Proven at scale: Kafka (LinkedIn), EventStore, CQRS systems - Foundation: Immutable event logs, temporal queries, replay capability - Domain origin: Data engineering

Isolation (1990s-2010s): - Docker containers (2013), V8 Isolates (Cloudflare, 2018) - Java sandboxing (1995), browser security models - Foundation: Least privilege, bounded execution, resource limits - Domain origin: Serverless computing, browser security

Service Contracts (2000s): - Service-Oriented Architecture (SOA) - Domain-Driven Design (Eric Evans, 2003) - Message contracts (Pragmatic Programmers) - Foundation: Interface-based design, replaceability, clear boundaries - Domain origin: Distributed systems architecture

Input Validation (1990s): - SQL injection prevention - Type safety (TypeScript, 2012; Rust, 2010) - Schema validation (JSON Schema, 2010) - Foundation: Boundary enforcement, type checking, safety gates - Domain origin: Web security, type systems

Control/Data Plane Separation (2010s): - Kubernetes (2014), Envoy (2016), Istio (2017) - Software-defined networking - Foundation: Separate governance from execution - Domain origin: Container orchestration

What's novel: Cross-disciplinary synthesis.

These patterns came from different domains solving different problems: - Event sourcing → data engineering - V8 Isolates → serverless computing - SOA contracts → distributed systems - Control/Data Plane → container orchestration - Validation gates → web security

Each was proven independently in its domain.

Vertical AI Foundation unifies them into coherent architecture for AI governance.

The innovation isn't inventing new patterns. The innovation is recognizing that AI systems need the SAME architectural disciplines that solved challenges across data engineering, distributed systems, and infrastructure, and systematically applying them to stochastic AI systems.

Most enterprises are: - Adding LLM calls with no governance layer - No isolation boundaries - No causal lineage - No typed contracts - The mainframe pattern all over again

This architecture treats AI as cognitive substrate requiring: - Deterministic governance (Steps and States) - Bounded execution (Control/Execution Planes) - Safety by design (Airlocks) - Full accountability (Causal Lineage) - Swappable intelligence (Contract-First Models)

The patterns are old. The synthesis is novel. The need is urgent.


For Distributed Systems

The objection: "This only works for monolithic event sourcing. Modern enterprises use distributed microservices. This won't scale."

The reality: These patterns were designed for distributed systems.

Event sourcing scales horizontally: - Kafka processes trillions of events daily (LinkedIn, Netflix) - EventStore, AWS EventBridge, Azure Event Grid, proven at global scale - Vertical AI uses the same infrastructure, extended for AI reasoning

Control/Execution Plane separation is already distributed: - Kubernetes separates control plane (API server) from data plane (workloads) - Envoy/Istio: policy distributed across service mesh - Each service enforces isolation locally, policy propagates globally

Causal Lineage works across service boundaries: - Distributed tracing (Jaeger, Zipkin) already links causally across services - Each service maintains its own causal graph - Cross-service calls create explicit lineage edges - Full observability across domain boundaries

Contracts enable service composition: - SOA was designed for distributed systems - Each service exposes typed contracts - Model contracts compose with service contracts - Cross-service validation at boundaries

Real-world example:

> This architecture doesn't just work for distributed systems. It assumes them.


Implementation Reality

Benefits: \- Prevents Phase 6 crises (mainframe trap scenarios) \- Enables regulatory compliance and audit trails \- Provides cost visibility (see exactly which steps cost what) \- Safe model upgrades (test per-step, roll back if needed) \- Quality observability (connect decisions to outcomes) \- Team resilience (audit trail survives turnover) \- Vendor independence (swap models/providers safely) \- AI decisions become as queryable as your business metrics \- Modern storage is cheap ($0.02/GB/month for S3); lineage is worth it

Costs: \- Implementation complexity (event sourcing has a learning curve) \- Migration effort (can't just bolt-on, requires refactoring) \- Storage overhead (full lineage capture requires infrastructure) \- Causal Lineage stores every step, state, and decision immutably \- Comparable to data warehouse costs: Snowflake, Databricks charge for storage too \- BUT: Compression and archival strategies reduce costs over time \- Engineering discipline (requires architectural rigor, not just prompts) \- Initial velocity reduction (upfront investment before acceleration)

When to adopt:

Strong signals you need this now: - Regulated domains (finance, healthcare, legal with audit requirements) - Enterprise AI at scale (cost/governance matter) - Systems requiring compliance (GDPR, HIPAA, SOC2) - Multiple AI integrations (coordination complexity) - Team turnover concerns (knowledge preservation) - Vendor lock-in risks (need model portability)

Signals you can wait: - Pure prototypes and experiments (move fast, break things) - Non-critical side projects (no compliance burden) - Single-person teams (tribal knowledge is fine) - Static, unchanging systems (if you're never upgrading models)

When this is overkill:

Be honest with yourself. If you're building: - Consumer chatbot with no compliance requirements - Internal productivity tool for 10 employees - Proof-of-concept for investors - Research prototype - Unregulated domains with low risk tolerance

You don't need this. Use LangSmith for logging, focus on shipping.

If you're building: - Healthcare AI (HIPAA compliance required) - Financial services AI (SOC2, regulatory oversight) - Legal AI (attorney-client privilege, audit requirements) - Enterprise AI at scale (multi-million users, strategic importance) - Systems where "we didn't know" isn't an acceptable answer to regulators

You probably need this. The alternative is Phase 6 crisis.

Pragmatic adoption path: 1. Start with Causal Lineage (event log for new AI features) 2. Add Airlocks (validation gates prevent disasters) 3. Introduce Contracts (standardize model interfaces) 4. Refactor to Control/Execution Planes (isolation and governance) 5. Expand to full Steps and States (deterministic workflows)

You don't need to implement everything at once. Each component provides value independently.


What's Next

This is Foundation—the proven, deterministic architecture that prevents the mainframe trap.

We covered: - Steps and States (deterministic structure) - Causal Lineage (memory with causation) - Control/Execution Planes (isolation and governance) - AI Airlocks (immune system) - Contract-First Models (swappability)

Coming in the Frontier Series: - CARL (Causal Refinement Learning—adaptive intelligence) - Causal Mesh (cross-domain lineage for complex systems) - Causal Aggregates (read-optimized learning at scale) - Research Operators (structured perception with provenance)

Foundation makes AI governable. Ships in 2026. Frontier makes it adaptive. Research roadmap.

You don't need Frontier to solve the urgent problem.

Causal Lineage prevents compliance failures today. AI Airlocks prevent production corruption today. Control Planes prevent unauthorized access today.

CARL (adaptive learning) allows those systems to improve over time. But governance comes first. Adaptation comes next.

Foundation prevents the crisis. Frontier enables the breakthrough.

Stay tuned.


Read the companion article: [The AI Mainframe Problem](link)

Follow the Foundation Series: [Subscribe for updates]


_Views expressed are my own and do not represent my employer._

Canon — The Foundational Essays

A curated body of texts that define the discipline of Vertical AI. These are not blog posts. They are the intellectual spine — the essays that establish boundaries, name patterns, and set the terms of the field.

Foundation Series: The AI Mainframe Trap. Vertical AI Foundation.

Frontier Series: Stewardship.

Patterns: Causal Lineage. Control Plane vs. Execution Plane. AI Airlocks.

Essays — Thinking in Public

Crafted essays on vertical intelligence — where the discipline evolves, ideas are tested, and the architecture takes shape.

How X-Ray and the Causal Graph Work: Span-level attribution and hierarchical lineage — explained by the essay's own creation process.

Documentation — Technical Reference

Product documentation for the IDE, Control Plane, and platform. The technical reference, not the worldview.

Getting Started. IDE Reference. Control Plane. API Reference.

About Vertical AI

Vertical AI is a discipline built on a simple premise: intelligence only becomes useful when it is grounded in a domain, stewarded by people who understand that domain, and shaped by the real signals that emerge from human judgment. Horizontal platforms gave us scale, but they also gave us noise. Vertical AI is the counter-movement — a return to clarity, specificity, and responsibility.

This project exists to articulate the principles, boundaries, and obligations required to build AI systems that don't collapse under their own abstraction. It is a living body of work: a manifesto, a set of canonical marks, a growing collection of essays, and a community of stewards who care about building systems that serve real people in real domains.

Vertical AI is not a company, a product, or a platform. It is a discipline — a way of thinking about intelligence that prioritizes domain truth over generality, stewardship over automation, and organic signals over synthetic metrics. The goal is not to build bigger models, but to build better systems: ones that are legible, bounded, and aligned with the people who rely on them.

Everything published here is intentionally minimal. No tracking beyond what is required. No growth funnels. No engagement tricks. Just the work, shared openly, as it evolves. The site is static, the analytics are transparent, and the covenant is simple: build with clarity, share with honesty, and keep the domain at the center of the system.

If you want to follow the build, you can connect using Google. That connection is not a subscription or a profile — it's a quiet signal that you care about the discipline and want to stay close to its evolution.

Vertical AI is an independent, steward-led project. It will grow through the people who choose to walk it: one steward, one signal, one step at a time.

About the Author: Michael Abdullah — Architecting Vertical AI systems, cognitive platforms, and high-scale identity experiences.

Privacy Policy

Last updated: February 2026

This site is a simple, static reading experience. It collects minimal information to understand how readers engage with the content and to improve the system over time.

Information We Collect

1. Basic analytics: We may use privacy-friendly analytics (such as Firebase Analytics or Plausible) to collect high-level information, including page views, referrers, device type, and approximate geography (country-level only). These analytics do not collect personal identifiers unless explicitly stated.

2. Engagement signals: To understand how readers interact with the site, we may collect anonymous engagement signals such as scroll depth, dwell time, copy-to-clipboard, long-tail performance, and navigation patterns. These signals help improve the reading experience and inform future system development.

3. Information you choose to provide: If you submit your email or contact information, it is used solely for communication you requested. It is never sold or shared.

What We Do Not Collect: This site does not collect account information, passwords, advertising identifiers, behavioral profiles, third-party tracking cookies, or personal data beyond what you explicitly provide.

How Your Information Is Used: Information is used only to understand how the site is being used, improve the reading experience, support system development, and send updates you explicitly requested.

How Your Information Is Stored: This site is hosted on Firebase. Any analytics data is stored by the analytics provider according to their privacy practices.

Your Choices: You may request deletion of any information you have provided by contacting us.

Contact: If you have questions about this Privacy Policy, you can reach out at generatevision@gmail.com.