Me, Myself and AI: 5 Days, 2 Models, 200M Tokens

Some thoughts after using Claude 4.5 and GPT-5 Codex across Cursor and Augment Code

By
Eric Unger
October 21, 2025
7 min read


The Experiment That Changed Everything

I just spent 5 days building with AI coding assistants—not just tinkering around the edges, but going from an empty repository to a fully working platform. This was my first time truly partnering with AI to ship something real, and it fundamentally changed how I think about software development.

The setup was deliberately designed to leverage the strengths of multiple models while keeping them accountable to each other. Two cutting-edge models, two powerful coding tools, and a workflow that forced them to challenge each other's assumptions. What emerged was faster than I expected, more robust than either model alone could achieve, and full of insights about how AI-assisted development actually works in practice.

This isn't a story about AI replacing developers. It's about amplifying what we can build when we architect the right collaboration between human judgment and machine capability.

  • 5 Days → concept to working system
  • 200M Tokens → iterative conversations
  • 138 Commits → small, focused changes

The Multi-Model Workflow

The key to making this work wasn't just using AI—it was using AI strategically. Instead of relying on a single model, I built a workflow that put different models in different roles, then forced them to review each other's work.

01. Write Specs First

Before writing a single line of code, I created 14 detailed markdown documents covering API contracts, database schemas, and complete workflows. These specs became the shared context that kept both models aligned and honest.
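
To make "specs as contracts" concrete, here's a hedged sketch of how one API contract from those markdown docs might translate into typed models that both tools can be held to. The endpoint (POST /tasks) and field names are hypothetical, invented for illustration rather than taken from the actual project:

```python
# Hypothetical illustration: pinning a spec'd API contract down as typed models.
# The endpoint and fields are made up for this example; the real specs lived in
# markdown and covered many contracts like this one.
from datetime import datetime
from pydantic import BaseModel, Field


class CreateTaskRequest(BaseModel):
    """Request body for POST /tasks, exactly as the spec defines it."""
    title: str = Field(..., max_length=200)
    description: str | None = None
    priority: int = Field(default=3, ge=1, le=5)


class TaskResponse(BaseModel):
    """Response body the spec promises back to the frontend."""
    id: int
    title: str
    status: str
    created_at: datetime
```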

02. Build with Multiple Models

In Cursor, I alternated between GPT-5 Codex for initial scaffolding and feature builds and Claude 4.5 for refactoring and rapid iteration. Augment Code with Claude 4.5 handled code review and deep debugging sessions.

03. Review Loop

One model builds a feature, then I switch tools and have the other model review it. When I hit a bug, I get a fix from Model A, then ask Model B to critique the approach. When models disagree, it forces me to actually understand the tradeoffs.

Tool Strengths: Finding the Right Model for Each Job

Augment Code

Best for: Deep debugging and spec adherence
When something wasn't working and I needed to trace through complex logic, Augment Code excelled. It followed my specifications with impressive precision, making it invaluable for implementation work.

Claude 4.5

Best for: Fast iteration and diagnosis
Noticeably faster than Claude 4. When I needed to quickly track down issues or refactor code, Claude 4.5's speed made a huge difference in maintaining momentum throughout the day.

GPT-5 Codex

Best for: Complex implementations
For intricate features requiring sophisticated logic, Codex consistently delivered. It kept up well with Claude's speed while handling architectural complexity.

Performance Note: Augment Code sometimes slowed down dramatically, especially running the full stack locally in Docker. Worth investigating the memory footprint of VSCode with Augment vs Cursor. Cursor seemed to maintain speed even as the repo and context grew larger.

Patterns That Actually Worked

Specs as Shared Context

Having detailed specifications that both models could reference prevented drift and hallucinations. The specs served triple duty: requirements for the AI, documentation for me, and a source of truth when models disagreed.

Every major feature started with a markdown doc. API contracts defined exact endpoints and payloads. Database schemas mapped relationships. Workflow documents described user journeys step by step.
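
As an example of what "database schemas mapped relationships" means in practice, here's a minimal sketch of how a spec'd one-to-many relationship might end up as PostgreSQL-backed models. The tables (projects, tasks) are hypothetical, chosen only to illustrate the pattern, not lifted from the real schema docs:

```python
# Hypothetical illustration: a one-to-many relationship from a schema spec,
# mirrored as SQLAlchemy models. Table and column names are invented.
from sqlalchemy import ForeignKey, String
from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column, relationship


class Base(DeclarativeBase):
    pass


class Project(Base):
    __tablename__ = "projects"

    id: Mapped[int] = mapped_column(primary_key=True)
    name: Mapped[str] = mapped_column(String(200))
    tasks: Mapped[list["Task"]] = relationship(back_populates="project")


class Task(Base):
    __tablename__ = "tasks"

    id: Mapped[int] = mapped_column(primary_key=True)
    title: Mapped[str] = mapped_column(String(200))
    project_id: Mapped[int] = mapped_column(ForeignKey("projects.id"))
    project: Mapped["Project"] = relationship(back_populates="tasks")
```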

Productive Disagreement

Having models disagree caught more bugs than either alone. When GPT-5 Codex suggested one approach and Claude 4.5 recommended another, I was forced to understand the actual tradeoffs rather than blindly accepting AI output.

These moments of disagreement became learning opportunities. I'd dig into why each model made its recommendation, often discovering edge cases or performance implications I hadn't considered.

Iterative Review Cycles

Small, focused commits meant each change could be reviewed and validated quickly. The 138 commits over 5 days averaged about 28 per day—each one a discrete, understandable unit of work.

This granular approach made it easy to roll back when something didn't work and made the AI-generated code much more maintainable.

By the Numbers: What 200M Tokens Actually Built

Two Coding Tools Working in Tandem

Cursor and Augment Code each brought different strengths to the table. Rather than picking one, I used both strategically—Cursor for its consistent performance and speed, Augment for its deep debugging capabilities and spec adherence.

The ability to switch between tools meant I could choose the right environment for each type of task, making the overall workflow more efficient.

The (Maybe Over-Engineered) Stack

I'll be honest—I might have gone overboard with the architecture. But the AI models handled it surprisingly well, and the comprehensive specs kept everything aligned with the original design.

Backend Services

  • FastAPI → REST API and async processing
  • CrewAI → Multi-agent orchestration (5 agents)
  • PostgreSQL → Structured data storage
  • Neo4j → Graph relationships
  • Redis → Caching and task queues

Frontend Stack

  • React + TypeScript → Component architecture
  • Tailwind CSS → Utility-first styling

Infrastructure

  • Docker + Docker Compose → Containerized deployment
  • Celery → Background task processing

Could I have simplified this? Probably. But the models handled the complexity fairly well, and the specs helped A LOT to keep everything aligned with what I designed.
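
For a feel of how these pieces typically hang together, here's a minimal sketch of the FastAPI + Celery + Redis pattern, not the actual project code. The endpoints and task are hypothetical stand-ins, and it assumes Redis doubles as both the cache and the Celery broker, which is how these pieces are commonly wired:

```python
# Minimal sketch: FastAPI takes the request, Redis backs the cache and the
# task queue, and Celery runs the heavy work off the request path.
import json

import redis
from celery import Celery
from fastapi import FastAPI

app = FastAPI()
cache = redis.Redis(host="redis", port=6379, decode_responses=True)
celery_app = Celery("worker", broker="redis://redis:6379/0")


@celery_app.task
def run_analysis(doc_id: int) -> None:
    """Long-running job handled in the background (hypothetical task)."""
    ...  # e.g. call into the agent pipeline, write results to Postgres/Neo4j


@app.post("/documents/{doc_id}/analyze")
def analyze(doc_id: int) -> dict:
    # Enqueue the job and return immediately; the frontend polls for status.
    run_analysis.delay(doc_id)
    return {"doc_id": doc_id, "status": "queued"}


@app.get("/documents/{doc_id}/summary")
def summary(doc_id: int) -> dict:
    # Serve cached results when available to keep the API responsive.
    cached = cache.get(f"summary:{doc_id}")
    return json.loads(cached) if cached else {"status": "pending"}
```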

What I Learned About AI-Assisted Development


Engineering Still Matters

The tools are incredibly powerful, but you absolutely need structure. AI doesn't replace architectural thinking—it amplifies it. You need to know when the AI is wrong, understand the tradeoffs between different approaches, and maintain a clear vision of what you're building.

The moments when models disagreed were often the most valuable, because they forced me to dig deeper and truly understand the implications of each choice.

Architecture First
AI can implement your vision, but you need to provide that vision. The 14 spec documents weren't busywork—they were the foundation that made everything else possible.

Review Everything
Never blindly accept AI-generated code. The review loop where one model critiques another's work caught subtle bugs and improved code quality significantly.

Speed Matters
Claude 4.5's speed improvement over Claude 4 wasn't just a nice-to-have—it fundamentally changed the workflow. Faster iteration meant more experiments, quicker fixes, and better end results.

The Honest Truth: Challenges and Trade-offs

Performance Mysteries

Running the full stack locally in Docker, Augment Code would sometimes slow down dramatically. Whether this was a memory footprint issue with VSCode or something else isn't clear, but it's worth investigating before scaling this workflow.

Cursor maintained consistent speed even as the repository and context grew, which made it more reliable for sustained development sessions.

Context Management

With 200M tokens of back-and-forth, managing context became crucial. The spec documents helped, but there were still moments where I had to explicitly remind the models of earlier decisions or architectural constraints.

Over-Engineering Risk

It's easy to get carried away when AI can implement complex systems quickly. I'm still iterating on whether all this complexity was necessary or whether simpler solutions would have worked just as well.

The multi-agent orchestration with CrewAI, for example, is powerful—but did I need 5 agents? I'm cleaning this up for production to find out.
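
For context on what that orchestration looks like, here's a hedged sketch assuming CrewAI's Agent/Task/Crew interface. The roles below are hypothetical, not the five agents from the project; the point is that a leaner crew might cover much of what a larger one was doing:

```python
# Hedged sketch of CrewAI-style orchestration with two hypothetical agents.
from crewai import Agent, Crew, Task

researcher = Agent(
    role="Researcher",
    goal="Pull the relevant facts out of the source documents",
    backstory="Thorough analyst who cites what it finds",
)
writer = Agent(
    role="Writer",
    goal="Turn the research notes into a concise summary",
    backstory="Editor who favors short, direct prose",
)

research = Task(
    description="Extract key entities and relationships from the input",
    expected_output="A bulleted list of entities with supporting quotes",
    agent=researcher,
)
summarize = Task(
    description="Write a one-paragraph summary from the research output",
    expected_output="A single paragraph under 120 words",
    agent=writer,
)

crew = Crew(agents=[researcher, writer], tasks=[research, summarize])
result = crew.kickoff()
```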

Bottom Line: Me, Myself, and AI Did a Great Job


With the right workflow, you can move incredibly fast. Five days from concept to working system isn't just impressive—it's a glimpse of what software development looks like when humans and AI collaborate effectively.

The key insights that made this work:

  • Specs keep models honest and aligned
  • Multiple models catch more issues than either alone
  • Claude 4.5's speed is a huge improvement
  • Engineering judgment is more important than ever
  • The right tools for the right tasks matter


What's Next?

I'm iterating on agent management patterns as I prepare this for production. The core system works, but there's refinement needed—simplifying where possible, optimizing for performance, and documenting the patterns that emerged from this experiment.

The future of development isn't AI replacing developers. It's developers who know how to orchestrate AI effectively replacing those who don't. This experiment proved that with structure, multiple models, and thoughtful workflow design, the productivity gains are real and substantial.

Overall... me, myself, and AI did a great job. And we're just getting started.