
AI is evolving fast. And when you’re building the infrastructure behind it, your engineering standards need to move just as quickly.
At Ada, we were facing a common challenge: thousands of lines of legacy code needed refactoring, with virtually no test coverage to make that refactoring safe.
This legacy code was holding back our development velocity. We knew that modernizing it would significantly improve our codebase, and AI tools could handle much of the conversion work effectively. But there was one major roadblock: how do you safely refactor code when there’s no way to verify that your changes won't introduce regressions?
The best practice is to write comprehensive tests first, but when you're staring down thousands of lines of code, that could mean weeks of work before any real progress begins. It's a frustrating cycle: the testing barrier makes refactoring feel too risky, so technical debt continues to build.
That's when I considered a different approach: what if AI could help us build the safety harness we needed?
I started by testing several tools: DevinAI, GitHub Copilot, and ChatGPT. But Claude Code, Anthropic’s command-line AI assistant for coding, stood out. Its control and iterative refinement features made it the best fit for this challenge.
My hypothesis was simple: if Claude Code could generate usable tests quickly, we could safely refactor legacy code at scale. In our case, that meant converting legacy Redux code from reducer/action patterns to modern Redux Toolkit slices with full test coverage.
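As a rough sketch of the shape of that conversion (the slice, state, and action names below are illustrative, not Ada's actual code):

```typescript
// Illustrative sketch only: hypothetical names, not Ada's real state shape.
import { createSlice, PayloadAction } from '@reduxjs/toolkit';

// Before: a string constant, an action creator, and a switch-based reducer,
// typically spread across several files.
//   const ADD_CONVERSATION = 'conversations/ADD';
//   const addConversation = (id: string) => ({ type: ADD_CONVERSATION, payload: id });
//   function conversationsReducer(state = { ids: [] }, action) { /* switch ... */ }

// After: one slice declares the state, reducers, and action creators together.
interface ConversationsState {
  ids: string[];
}

const initialState: ConversationsState = { ids: [] };

const conversationsSlice = createSlice({
  name: 'conversations',
  initialState,
  reducers: {
    conversationAdded(state, action: PayloadAction<string>) {
      // Immer turns this "mutation" into an immutable update under the hood.
      state.ids.push(action.payload);
    },
  },
});

export const { conversationAdded } = conversationsSlice.actions;
export default conversationsSlice.reducer;
```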
If successful, this approach could transform how we handle large refactoring projects, turning weeks of manual test writing into a much more manageable process.
To set a baseline, I gave Claude Code a small React component with minimal instruction—no context about our testing patterns or conventions. The results highlighted the challenge ahead.
While Claude Code demonstrated a solid understanding of what functionality needed testing and wrote reasonable test descriptions, the generated tests were completely unusable: they leaned heavily on mocks and ignored the conventions and helpers our codebase depends on.
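To give a flavor of that first pass, here is a hypothetical sketch of the failure mode (not the actual output): every dependency gets mocked away, and the assertion checks a dispatched action instead of anything a user would see.

```typescript
// Hypothetical sketch of the over-mocked style we saw at first
// (component and module names are illustrative, not Ada's real code).
import { render } from '@testing-library/react';
import { useDispatch } from 'react-redux';
import { ConversationList } from './ConversationList';

// The real store, the API client, and even child components are mocked away,
// so the test exercises almost none of the real behavior.
jest.mock('react-redux');
jest.mock('../api/conversations');
jest.mock('./ConversationRow', () => ({ ConversationRow: () => null }));

test('dispatches the fetch action on mount', () => {
  const dispatch = jest.fn();
  (useDispatch as jest.Mock).mockReturnValue(dispatch);

  render(<ConversationList />);

  // Asserts on an implementation detail (the action shape), not on anything a user sees.
  expect(dispatch).toHaveBeenCalledWith({ type: 'conversations/fetch' });
});
```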
Fixing the tests would’ve taken longer than writing them from scratch. But instead of quitting, I treated the output as valuable data on what to do next.
I approached the problem like any other software issue: break it down, learn from failure, and iterate.
I created a CLAUDE.md file with specific guidance about our testing patterns and philosophy. After each failed attempt, I would review what went wrong, update the instructions, and run Claude Code again.
This process became quite educational, not just for Claude Code but for me and the team, too. I had to articulate testing practices that I'd internalized over years of experience, making our team's implicit knowledge explicit.
One of the most persistent issues was Claude Code's tendency to mock extensively. An early instruction I added simply told it not to mock so heavily.
But the mocking continued. I learned that AI tools like Claude Code work much better when you give them positive instructions about what they should do in specific scenarios, rather than just telling them what they shouldn't do.
But positive instructions alone still weren't enough. While I could tell Claude to "use MSW handlers and real store configurations," it still struggled without concrete examples of which specific helpers to use or how to structure the test setup.
The turning point came when I realized I needed to be extremely specific about our project's tooling and conventions, down to exact file paths and code snippets. Here's how the mocking instructions evolved:
The evolved instructions told Claude to organize handlers in a central mocks directory and gave it a recommended MSW setup snippet to follow.
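As a rough illustration of that level of detail, here is a minimal sketch of that kind of MSW setup, assuming Jest and MSW 2.x; the paths, endpoint, and handler are assumptions, not the actual snippet from our instruction file.

```typescript
// Illustrative MSW layout (hypothetical paths and endpoints, not Ada's actual setup).

// src/mocks/handlers.ts — shared request handlers live in one central directory.
import { http, HttpResponse } from 'msw';

export const handlers = [
  http.get('/api/conversations', () =>
    HttpResponse.json([{ id: '1', subject: 'Refund request' }])
  ),
];

// src/mocks/server.ts — a single Node server instance shared by all tests.
import { setupServer } from 'msw/node';
import { handlers } from './handlers';

export const server = setupServer(...handlers);

// jest.setup.ts — start, reset, and stop the server around every test file.
import { server } from './src/mocks/server';

beforeAll(() => server.listen());
afterEach(() => server.resetHandlers());
afterAll(() => server.close());
```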
It wasn’t enough to say “use MSW.” Claude needed the file structure, boilerplate, and conventions. This taught me that successful AI collaboration depends on making implicit team knowledge explicit.
The instruction file grew significantly over time, but each addition addressed a failure and made the tool stronger.
After several iterations, I tested Claude Code on a similar component.
The transformation was remarkable. What used to take 2-3 hours now took 15-20 minutes.
The generated tests were functional, achieved over 80% code coverage, and followed all of our conventions.
The tests used our render helpers, handled async operations correctly, focused on user behavior—not implementation details—and were robust against UI changes.
Claude Code even resolved ESLint and TypeScript issues and verified that tests passed before moving on.
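For contrast with the earlier over-mocked sketch, here is a hypothetical example of the style the tests settled into: a shared render helper, a real store, MSW for the network layer, and assertions on what the user sees. The component, helper, and endpoint names are assumptions, not our actual code.

```typescript
// Illustrative sketch of the target style (hypothetical names throughout).
import { screen } from '@testing-library/react';
import { http, HttpResponse } from 'msw';
import { server } from 'src/mocks/server';        // hypothetical shared MSW server
import { renderWithStore } from 'src/test-utils'; // hypothetical helper that wires up a real store
import { ConversationList } from './ConversationList';

test('shows conversations returned by the API', async () => {
  // Override the network at the MSW layer instead of mocking modules.
  server.use(
    http.get('/api/conversations', () =>
      HttpResponse.json([{ id: '1', subject: 'Refund request' }])
    )
  );

  renderWithStore(<ConversationList />);

  // Assert on user-visible behavior; findBy* waits for the async fetch to resolve.
  expect(await screen.findByText('Refund request')).toBeInTheDocument();
});
```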
This wasn’t just a better output—it was a reusable system. And I realized the method could extend far beyond our Redux migration.
The biggest benefit wasn’t speed; it was scale.
Claude Code became a teaching tool. Engineers with less testing experience could now write high-quality tests and learn our best practices in the process.
Instead of asking senior developers to explain MSW setups or querying strategies one-on-one, we encoded that knowledge in a way Claude—and our team—could apply repeatedly.
With Claude, we’re not just writing tests faster; we’re raising the floor on test quality across the team.
It didn’t stop there. Since implementing this approach, I've found Claude Code invaluable for much more than just writing new tests; it's become a go-to tool across our day-to-day engineering work.
It’s not just a code generator—it’s an extension of our engineering brain. The investment we made in teaching it our way of working continues to pay off in time saved, quality gained, and knowledge shared.
If you want to try this approach with your own team, here are the key steps to get started:
1. Baseline first: give the tool a small, representative component with no extra context, and note exactly where the output falls short.
2. Create a CLAUDE.md file that captures your testing patterns and philosophy.
3. Treat every failure as data: after each bad attempt, update the instructions and run it again.
4. Be concrete: include exact file paths, helper names, and code snippets, not just general principles.
5. Prefer positive instructions about what to do in specific scenarios over lists of don'ts.
The most powerful AI applications aren't about replacing human developers; they're about amplifying our capabilities and spreading expertise across the team.
By investing time upfront in teaching Claude Code our testing standards, we found a tool that helps experienced developers move faster and helps less experienced developers learn better practices.
When the cost of building a safety net drops from weeks to days, entire categories of tech debt become tractable. Risky refactors become routine. And your AI tools stop being novelties and start becoming teammates.
Claude helped us break through our testing barrier. Not because it was brilliant out of the box, but because we taught it how to be useful.
That’s the real promise of AI in software engineering—not to replace humans, but to multiply our impact.