The operators pulling millions in profit from AI tools aren't using a better model than you. Same Claude. Same GPT-4. They're getting 10-40x more value out of it.
The difference isn't intelligence. It's infrastructure.
I've spent 3+ years rebuilding workflows from the ground up with AI — not the "ask ChatGPT a question" kind, the "this replaced a 40-hour-a-week process" kind. And the pattern is always the same: the founders getting transformational results have built a system underneath the model that most people don't know exists.
There are four layers to this system, plus a set of specific tools and hacks that make each layer actually work. Most founders operate on layer one. Almost nobody has all four wired up — and they compound on each other in ways that turn linear productivity into exponential leverage.
Here's the full stack.
Layer 1: The Harness
A V8 engine sitting on a garage floor is 400 horsepower going nowhere. It needs a chassis, transmission, wheels, steering, brakes — everything that turns raw power into a vehicle you can drive.
That's the relationship between an AI model and its harness.
The model — Claude, GPT-4, Gemini — is the engine. The harness is everything else: the tools it can access, the system prompts shaping its behavior, the memory it retains, the permissions it has, and the parameters governing context windows and token limits.
Claude Code is a harness. Cursor is a harness. Windsurf is a harness. Same engines, different vehicles.
Why this changes everything
Most founders are stuck in the "which engine is best?" debate, comparing benchmarks endlessly. That's like arguing about Ford V8s vs Chevy V8s while ignoring that one's in a race car and the other's bolted to a lawn mower.
Anthropic's own research on effective context engineering makes this explicit: the surrounding infrastructure determines the vast majority of output quality. Not the model.
What's actually inside a harness
- System prompts that shape reasoning and communication style
- Hooks and tools — bash terminal, file editing, browser control, API connections
- Memory systems for retaining context across sessions
- Parameters controlling context compaction, message limits, and token budgets
Every tool you connect is a multiplier. Every permission you grant opens a new capability category.
The agency gap nobody talks about
AI models systematically underestimate their own capabilities. Without explicit declarations of what tools are available, Claude will tell you something takes three months to build — then do the exact same task in five seconds once you declare it has terminal access, browser automation, and file editing permissions.
The model doesn't inherently know what it can do. You have to tell it. That five-second vs three-month gap is the difference between a configured harness and a default one.
Tools that make Layer 1 work
Antigravity — Google's VS Code-based IDE that puts Claude Code directly in your editor. Install from antigravity.google. One click to spin up new agents, automatic diagram generation, native MCP server integration. Skills folder lives at anti-gravity/skills/.
Ghostty — The terminal of choice for managing agent teams. Better than the default terminal for spawning and monitoring multiple agents simultaneously.
Conductor — Platform for running parallel teams of coding agents in isolated workspaces on your local machine. If Claude goes down, Conductor automatically redirects workload to Codex. Think of it as an ETF for AI models — you're not all-in on one stock.
Codex MCP Server — Install it, and Claude can communicate directly with OpenAI's Codex model. Useful as a fallback when Claude degrades, and Codex is reportedly better for deep backend architecture work. Install via the MCP server download, then test with `ask codex how it's going`.
[Composio](https://composio.dev/) — Connects your AI agent to 10,000+ tool integrations out of the box. Instead of wiring up each API individually, Composio handles auth, rate limiting, and connection management. One of the fastest ways to supercharge your harness.
[OpenHands](https://openhands.dev/) — Open source AI software developer. If Claude Code is a rifle, OpenHands is a second rifle with a different scope. Worth having in your toolkit for redundancy and different strengths.
[claude-flow](https://github.com/ruvnet/claude-flow) — Multi-agent orchestration layer. When you need agents coordinating across tasks — not just parallelizing the same task — this is the framework.

Layer 2: The Knowledge File
Every serious AI workflow starts with a persistent knowledge document. In Claude Code, it's a CLAUDE.md file that loads automatically into every session. Not a chat prompt. Not a one-time message. A structured file that compounds in value over time.
There are four pillars. Skip any of them and you're leaving enormous value on the table.
Pillar 1: Knowledge Compression
You can compress 1,100 tokens of code (~827 words) into a 22-token summary. That's a 50x compression ratio.
Here's what most people get backwards: shorter context produces better output, not worse. Quality scales inversely with token count. More tokens = more noise = worse reasoning.
A well-compressed knowledge file saves roughly 6,000 tokens per query compared to having the model discover the same information by reading files. Run /context in Claude Code to see your actual token breakdown — system prompt percentage, free space, message costs. Most people are shocked at how much context they're wasting.
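To make the compression concrete, here is what a compressed knowledge file entry might look like. The module names and summaries below are invented for illustration:

```markdown
## Module summaries (compressed)
- `auth/session.py`: JWT issue/verify, 24h expiry, refresh via POST /api/refresh, Redis blacklist for revocation.
- `billing/webhooks.py`: Stripe webhook handler, idempotent by event ID, failures logged to billing_errors table.
```

Each line costs around 20-25 tokens and stands in for hundreds of tokens of source the model would otherwise have to read.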
Pillar 2: User Preferences
The non-native optimizations the base model doesn't know about you: how you want file paths formatted, your debugging approach, documentation style, naming conventions, reasoning strategies.
One specific hack: add "automatically open app in Chrome, don't just give me a link" to your preferences. Sounds small. Saves hundreds of back-and-forth interactions over a month.
Pillar 3: Capability Declarations
Explicitly itemize everything your agent can do. This eliminates the slow loop of: "Should I use the terminal?" "Do I have permission to edit files?" "Can I access this API?"
Without declarations, the model enters clarification loops that waste tokens, time, and money. With them, it just acts.
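A minimal capability declaration might look like this. The specific tools and paths are placeholders; list what your harness actually has:

```markdown
## Capabilities (declared: act, don't ask)
- Terminal: full bash access; installing packages is allowed
- Files: read/write everything under ./src and ./tests
- Browser: Chrome DevTools MCP is connected; open pages freely
- APIs: sandbox keys live in .env; calling test endpoints is pre-approved
```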
Pillar 4: Failure and Success Logs
Every failed approach gets logged. Over time, these logs eliminate roughly 80% of the solution space. The AI carves planet-sized chunks out of the theoretical universe of approaches and stamps "don't go here — already tried it" on them.
Each session gets more efficient because the model isn't re-exploring dead ends. Every piece of that log cost real tokens to discover. The log makes sure you never pay that cost twice.
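A sketch of what the log entries might look like, with dates and incidents invented for illustration:

```markdown
## Failure log (do not retry)
- 2025-01-12: localStorage render cache breaks in Safari private mode
- 2025-01-15: parallel DB migrations deadlock on the users table; run serially

## Success log (reuse these)
- Batch image resizing via a worker queue cut build time ~40%
```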
The compound loop
This is where it gets powerful.
Plan a feature → build it → compile what worked and failed → update the knowledge file → start the next session smarter.
- First iteration: X time
- Second iteration: 0.9X (10% faster)
- Third iteration: 0.8X (20% faster)
- Fourth iteration: 0.7X (30% faster)
The pattern compounds until you're operating at speeds impossible without that accumulated intelligence.
Global vs. Local
Two scopes to manage:
Global CLAUDE.md (in your home directory, capital C) — loads on every session across every project. High-level reasoning, communication preferences, principles. Takes up about 6% of your context window, but saves thousands of tokens downstream.
Local claude.md (in .claude/ folder, lowercase) — project-specific conventions, APIs, capabilities. Only loads for that project.
The global file is the highest-leverage place to invest human review time. Every improvement compounds across everything you do.
Tools that make Layer 2 work
[Fabric](https://github.com/danielmiessler/Fabric) — Daniel Miessler's pattern library that converts prompts into reusable skills. Instead of writing CLAUDE.md from scratch, use Fabric's patterns as a starting point and customize. Hundreds of battle-tested patterns for summarization, analysis, writing, and extraction.
[everything-claude-code](https://github.com/affaan-m/everything-claude-code) — Comprehensive reference repo for Claude Code configuration. Covers setup, prompt engineering, slash commands, skills, agents — useful as a checklist to make sure your knowledge file isn't missing anything.
[claude-mem](https://github.com/thedotmack/claude-mem) — Persistent memory layer for Claude. Extends the built-in memory system so your knowledge file loop retains more context across sessions.
Workspace structure that works
```
your-workspace/
├── .claude/
│   ├── CLAUDE.md    (global knowledge)
│   ├── claude.md    (local knowledge)
│   ├── skills/      (reusable skills)
│   └── agents/      (agent definitions)
├── active/          (current work in progress)
└── [project folders]
```
Layer 3: Auto Research
Most people use AI as a smarter search bar. This layer turns it into a research lab that runs while you sleep.
The three ingredients
The pattern comes from Andrej Karpathy — OpenAI founding member, former director of AI at Tesla. It requires exactly three things:
1. A very defined metric. Not "make the website faster." Specific, measurable, programmatically checkable: Largest Contentful Paint, Lighthouse scores, email open rate, conversion percentage. The metric has to be a number the AI can check without human interpretation.
2. A very defined change method. The AI needs direct access to modify the thing being measured. For website performance: the codebase. For cold emails: the templates and send config. The change method has to execute without human approval on each iteration.
3. A very standardized assessment. Same test, same conditions, same scoring method every time. If the assessment isn't standardized, you can't compare across iterations and the loop breaks.
How the loop runs
Hypothesize → execute change → assess result → keep or discard → log everything → repeat.
No human in the loop. Runs overnight, through weekends, while you're in meetings.
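The loop is simple enough to sketch in a few lines of Python. Everything here is a toy stand-in: the metric is an invented distance function, and in practice `assess` and `mutate` would call Lighthouse, your email platform, or whatever system owns the number you're moving.

```python
import random

def auto_research(assess, mutate, initial, iterations=50, seed=7):
    """Hypothesize -> execute -> assess -> keep or discard -> log -> repeat.

    `assess` returns the defined metric as a single number (lower is
    better here); `mutate` is the defined change method. Both are
    stand-ins for real integrations.
    """
    rng = random.Random(seed)          # seeded so runs are reproducible
    best, best_score = initial, assess(initial)
    log = []
    for i in range(iterations):
        candidate = mutate(best, rng)  # execute a hypothesized change
        score = assess(candidate)      # standardized assessment, same every time
        kept = score < best_score
        log.append((i, candidate, score, kept))
        if kept:                       # keep improvements, discard failures
            best, best_score = candidate, score
    return best, best_score, log

# Toy metric: distance from an unknown optimum, standing in for page-load time.
OPTIMUM = 3.7
best, score, log = auto_research(
    assess=lambda x: abs(x - OPTIMUM),
    mutate=lambda x, rng: x + rng.uniform(-0.5, 0.5),
    initial=0.0,
)
```

The shape is deliberately boring: a hill-climber with a log. The leverage comes from what you plug into `assess` and `mutate`, not from the loop itself.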
The Shopify result
Tobi Lütke — Shopify's CEO — ran this against their entire Liquid templating codebase. Kicked it off one evening.
Woke up to:
- 53% faster combined parse and render time
- 61% fewer object allocations
- ~2x faster overall
One evening. One automated loop. Production codebase serving millions of merchants.
How to actually set it up
In Claude Code, the command is straightforward:
> "Create dashboard for auto research. Set up auto research framework to optimize Google Lighthouse page score for index.html. Run this in local loop. Make index.html as fast as possible."
The AI creates the loop, runs Lighthouse assessments after each change (takes seconds), keeps improvements, discards failures, and logs everything.
Where else this works
Cold email campaigns: 6-10 test variations per day × 500-1,000 sends each = 3,000-10,000 emails per day in testing capacity. Subject lines, hooks, CTAs, send times — all optimized simultaneously.
Customer support prompts: 24 variations per day tested against satisfaction scores.
Landing pages: Headline, hero copy, CTA placement, social proof — dozens of experiments running daily.
Ad creative: Visual styles, copy angles, audience targeting — auto-tested against actual conversion data.
The rule: anything with a number you want to move, a variable you can change, and volume to measure against is a candidate.
Tools that supercharge auto research
[Nango](https://openalternative.co/nango) — Handles the API integration layer so your research loop can pull data from hundreds of services without hand-wiring each one. Auth, rate limiting, connection management — all handled.
[Kestra](https://openalternative.co/kestra) — Open source orchestration platform. When your auto research loop needs to coordinate across multiple services (pull analytics from one platform, modify code in another, assess results from a third), Kestra manages the pipeline.
[Windmill](https://openalternative.co/windmill) — "Zapier for developers." Build auto research loops with a visual editor, but with the full power of code underneath. Good for when you want non-technical team members to understand what the loop is doing.
[Waystation](https://waystation.ai/marketplace) — MCP router that connects your agent to multiple MCP servers through a single interface. Instead of configuring each MCP server individually, Waystation routes requests to the right one.

Layer 4: The Automation Spectrum
Your AI needs to interact with the outside world. There are three levels, and the cost differences are staggering.
Level 1: HTTP/API requests
Cost: ~$0.002 per action | Speed: 0.2 seconds
The cheapest and fastest option — if you know the exact API format. The catch: APIs change, documentation is incomplete, and getting the format right can take 10-15 minutes of trial-and-error per service.
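As a minimal sketch using only the standard library, here is what a Level 1 call looks like. The endpoint, payload schema, and key are invented placeholders; discovering the real ones is where the trial-and-error time goes:

```python
import json
import urllib.request

# Hypothetical endpoint and payload: pinning down the real schema is
# where the 10-15 minutes of trial-and-error goes.
req = urllib.request.Request(
    "https://api.example.com/v1/bookings",
    data=json.dumps({"start": "2025-03-30T15:30:00Z", "duration_min": 30}).encode(),
    headers={"Content-Type": "application/json", "Authorization": "Bearer YOUR_KEY"},
    method="POST",
)
# urllib.request.urlopen(req)  # once the format is right: ~0.2s per action
```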
Level 2: Browser automation
Cost: ~$5 per action | Speed: 5-7 minutes
The AI opens a real browser via Chrome DevTools Protocol and interacts with pages like a human — clicking buttons, filling forms, handling JavaScript. Works on almost any website regardless of API availability.
Level 3: Full computer use
Cost: Higher | Speed: Slower
The AI controls your entire desktop — mouse, keyboard, any application. Most general-purpose, but slowest and most fragile. Reserved for edge cases: native desktop apps, multi-application workflows, government forms with no API.
The 2,500x cost gap
At $5.00 vs $0.002 per action, that's a 2,500x cost difference between browser automation and HTTP for the same task.
The play: prototype with browser automation first (always works, minimal setup), then reverse-engineer HTTP calls for production (cheap, fast, reliable at scale).
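The per-action arithmetic, in a few lines:

```python
http_cost = 0.002    # dollars per action, HTTP/API tier
browser_cost = 5.00  # dollars per action, browser automation tier

ratio = round(browser_cost / http_cost)   # 2500: the real per-action gap

# At 10,000 actions a month, the tier choice is the whole budget:
monthly_browser = round(10_000 * browser_cost)  # $50,000 via browser automation
monthly_http = round(10_000 * http_cost)        # $20 via HTTP
```

At low volume the gap is a rounding error; at scale it is the entire cost structure, which is why the prototype-then-reverse-engineer play pays off.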
The tools for each level
Chrome DevTools MCP — Your first stop for browser automation. Launches a Chrome instance that Claude controls programmatically. Best for: loading dynamically-rendered JavaScript content, API docs that won't load through standard fetch, any web interaction you need to automate.
Pro tip: Always kill stale Chrome processes before starting new sessions. And if an MCP tool fails twice, stop and troubleshoot — don't burn tokens on repeated failing calls.
Browser Use — $100 one-time purchase + pay-as-you-go credits. Next level up from Chrome DevTools MCP. Give it plain language instructions ("fill out my loan application") and it handles multi-step form automation. Captures screenshots throughout the process so you can verify what happened.
What Chrome blocks (and how to get around it)
Here's the dirty secret of browser automation: Chrome is actively fighting you. Sites detect automated Chrome sessions through bot fingerprinting — they look for signs of Puppeteer, Playwright, or CDP control and block the connection. Social media platforms are the worst. Facebook is notoriously difficult. LinkedIn aggressively throttles automated actions.
The workaround stack:
[Lightpanda](https://github.com/lightpanda-io/browser) — Open source headless browser built specifically for AI and automation. Lighter than Chrome, designed to avoid the detection fingerprints that get standard Chrome automation blocked. This is the kind of tool that changes what's possible.
[Min Browser](https://minbrowser.org/) — Minimal, lightweight browser that pairs well with computer use automation. When Chrome DevTools MCP gets blocked, Min Browser + computer use is the fallback that actually works.
[Skyvern](https://openalternative.co/skyvern) — Open source browser automation that uses vision models to interact with pages. Instead of relying on DOM selectors (which break when sites change), Skyvern looks at the page and clicks where a human would. More resilient than traditional browser automation.
[Firecrawl MCP](https://www.firecrawl.dev/) — When you just need to scrape content, not interact with a page. Firecrawl handles JavaScript rendering, anti-bot detection, and returns clean data. Set up as an MCP server and your agent can scrape on demand.
The pattern: start with Chrome DevTools MCP. If the site blocks it, try Lightpanda or Min Browser. If interaction is needed, escalate to Skyvern or Browser Use. If you just need data, use Firecrawl. Always have the fallback ready.
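The escalation above can be sketched as a simple fallback chain. The strategy functions here are stubs; the real versions would call your Chrome DevTools MCP, Lightpanda, or Skyvern clients:

```python
def fetch_with_fallbacks(task, strategies):
    """Try each automation level in order; return the first that works.

    `strategies` is an ordered list of (name, callable) pairs, e.g.
    Chrome DevTools MCP first, then Lightpanda, then Skyvern.
    """
    errors = []
    for name, attempt in strategies:
        try:
            return name, attempt(task)
        except Exception as exc:      # a block, captcha, or fingerprint rejection
            errors.append((name, exc))
    raise RuntimeError(f"all strategies failed: {errors}")

# Stub strategies for illustration: the first is "blocked", the second works.
def chrome_devtools(task):
    raise ConnectionError("bot fingerprint detected")

def lightpanda(task):
    return f"scraped {task}"

name, result = fetch_with_fallbacks("pricing page", [
    ("chrome-devtools", chrome_devtools),
    ("lightpanda", lightpanda),
])
```

The point of encoding the chain is that the agent stops burning tokens on repeated failing calls: one blocked attempt, one escalation, done.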
Real example: booking a meeting
The transcript walks through booking a 30-minute meeting on cal.com three ways:
- HTTP approach: Extract the cal.com link, reverse-engineer the API schema, send requests. Took 10-15 minutes of debugging, fragile, broke easily.
- Browser Use approach: "Book a 30-min meeting on March 30th at 3:30pm." Took ~5 minutes, worked reliably, captured screenshots of each step.
- Human approach: ~1 second.
The 5-minute browser automation approach wins because it validates the workflow. Once validated, you reverse-engineer the HTTP calls for production — bringing the cost from $5 to $0.002 per booking.
The Math That Breaks Everything (And Fixes It)
Three numbers that should change how you architect AI workflows.
0.9³ = 0.73
Each AI step succeeds 90% of the time — sounds great. Chain three steps without human review and your end-to-end success rate drops to 73%. Five steps? 59%. Ten? 35%.
Compound probability is ruthless. This is why the knowledge file is the highest-ROI investment — it raises the success rate of every individual step, which compounds across the entire pipeline.
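The arithmetic is worth having at your fingertips. The 97% figure below is a hypothetical improvement, not a measured one:

```python
def pipeline_success(step_rate, steps):
    """End-to-end success rate when every step must succeed independently."""
    return step_rate ** steps

# The numbers above:
assert round(pipeline_success(0.90, 3), 2) == 0.73
assert round(pipeline_success(0.90, 5), 2) == 0.59
assert round(pipeline_success(0.90, 10), 2) == 0.35

# A hypothetical per-step boost from 90% to 97% rescues the ten-step pipeline:
assert round(pipeline_success(0.97, 10), 2) == 0.74
```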
5 runs = 2.5x unique solutions
AI models are stochastic. Run the same query five times and you get roughly 2.5x the unique solutions of a single run: the outputs overlap (ABC, BCD, ABE...), but each run surfaces pieces the others missed.
This means parallelization isn't just about speed (though it cuts a 20-minute serial task to ~12 minutes). It's about quality — five agents exploring different paths find solutions a single agent never would.
The best operators use this with two patterns:
Stochastic consensus — Run multiple agents on the same problem. A synthesizer identifies solutions appearing most frequently across runs. The mode bubbles up as the most robust answer.
Debate model — Feed all proposed solutions into a discussion where agents argue for and against each approach. Better for nuanced problems where simple voting misses important tradeoffs.
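A sketch of the consensus pattern: the run labels below are invented, and in practice each entry would be an agent's proposed fix, normalized so equivalent answers hash the same.

```python
from collections import Counter

def stochastic_consensus(agent_runs):
    """Pick the answer proposed most often across independent runs.

    `agent_runs` is a list of each run's proposed solution (any hashable
    label). The mode is the consensus; ties go to the first one seen.
    """
    counts = Counter(agent_runs)
    answer, votes = counts.most_common(1)[0]
    return answer, votes / len(agent_runs)

# Simulated output of six agents attacking the same bug:
runs = ["fix-cache", "fix-cache", "rewrite-query",
        "fix-cache", "add-index", "rewrite-query"]
answer, confidence = stochastic_consensus(runs)
```

Simple voting is the cheap version; the debate model replaces `most_common` with a synthesizer agent that weighs the arguments, not just the counts.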
Claude Code has a built-in agent teams feature for this. In Ghostty, you can spawn 6+ agents simultaneously, combine them with a stochastic-multi-agent-consensus skill, and let them converge on the best solution.
The monoculture trap
Agricultural analogy: farmers grew a single highest-yield crop for generations. Productivity went up every year. Then one blight devastated everything.
If you rely 100% on Claude and it goes down — which happens; one recent outage dropped developer productivity an estimated 95% — your entire operation stops.
Smart play: 70% primary model, 30% distributed across alternatives (Codex, open source via Ollama, agnostic harnesses). Same principle as an ETF vs. individual stocks.
Conductor makes this practical — run parallel Claude + Codex agents, and if one degrades, the other picks up the slack automatically.
Security: The Stuff Nobody Mentions
Quick hits that matter if you're running AI on real projects:
Prompt injection is real. Reading a Twitter thread could theoretically trigger rm -rf (delete everything) if your agent has unrestricted permissions. Some SDKs are less secure than others.
Shell policy workarounds. If shell policy blocks a command like rm -rf, use a Python cleanup script instead — same effect, less friction with the policy system.
Know your modes. Claude Code runs in Plan mode (approval-based), Default mode (standard), or Auto mode (autonomous, minimal permission prompts). Auto mode is newer and gives maximum leverage for managing many tasks — but understand what you're granting access to.
Permissions in CLAUDE.md. Use your .claude/claude.md file to handle permissions explicitly. Don't blindly execute code suggestions without understanding the security model of your harness.
[claude-code-safety-net](https://github.com/kenryu42/claude-code-safety-net) — Safety guardrails specifically built for Claude Code. Adds protection layers so your agent can't accidentally nuke your filesystem or leak API keys.
[Wayfound](https://www.wayfound.ai/pages/wayfound-mcp) — Agent monitoring dashboard. When you're running auto research loops overnight or multiple agent teams in parallel, Wayfound shows you what's happening in real time. You want this before anything goes wrong, not after.
The Edge
Stack all four layers — a well-engineered harness with declared capabilities, a compound-learning knowledge file, auto research loops running overnight, and the right automation level for each task — and you're not getting 4x the value of raw prompting.
You're approaching 40x. Each layer amplifies the others.
The founders pulling ahead right now aren't debating which model is best. They're building the system around the model — writing better knowledge files, wiring up auto research loops, setting up automation pipelines that run without them.
The model is a commodity. It gets better every quarter no matter what you do. The stack is the thing only you can build, and it's the only thing that compounds.
The model is the engine. The stack is the car. Build the car.

