AI Experiment #2: Test Case to Automated Execution with Chrome DevTools MCP

October 15, 2025

Can you actually go straight from test case doc to automated execution?

I thought I’d give this a go and see what kind of results I got.


The Question

“Can I just give Claude Code my test case document, connect the Chrome DevTools MCP, and have it run the tests? Like, actually run them in a real browser, without writing any test code or setting up a framework?”

Potential scenarios for this include:

  1. You have documented test cases but no automation yet. Could this give you instant automated execution?

  2. You need to figure out how to interact with a difficult application before building a proper test framework.

Anyway, no point speculating about how to use this if it doesn’t work. We just need to know if it does work!


What I’m Using

  • Chrome DevTools MCP – Model Context Protocol for Chrome browser automation
  • Claude Code – AI coding assistant with MCP support
  • Demo financial dashboard – Test application
  • Test case from Experiment #1 – Markdown-formatted test case I’d previously created

I was curious whether the Chrome DevTools MCP could bridge the gap between human-readable test documentation and actual automated execution.


The Setup

Here’s how I got this working:

Installation:
First you need Node.js and npm installed. Then install the Chrome DevTools MCP:

npm install -g chrome-devtools-mcp@latest

I had ffmpeg already installed (it comes with a Playwright install), but you’ll need it for screenshots.
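
If you want to confirm the prerequisites are on your PATH, a small preflight check along these lines should do it. This is only a sketch: check_tool is a helper name made up for illustration, and the list of tools is simply the ones this walkthrough relies on.

```shell
# Report whether a command is available on PATH.
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "ok: $1"
  else
    echo "missing: $1"
    return 1
  fi
}

# Tools this walkthrough relies on; keep going even if one is missing.
for tool in node npm npx claude; do
  check_tool "$tool" || true
done
```

Anything reported as missing needs installing before the MCP will start.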

Configuration:
Add the Chrome DevTools MCP to Claude Code’s configuration. You can do this via command line:

claude mcp add chrome-devtools npx chrome-devtools-mcp@latest

This updates your Claude Code configuration file to include the MCP integration.

Verification:
Start Claude Code and check that Claude can see the MCP using:

/mcp

You should see chrome-devtools listed and connected. When I ran this, Claude actually fired up an initial Chrome instance just to verify it had access.

Test Case:
I used the test case created in Experiment #1 – a markdown file with structured test steps for adding an investment account. I added the application URL to the preconditions so Claude would know where to navigate.
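
For readers without the Experiment #1 file to hand, a test case in this style might look roughly like the sketch below. Everything in it is a placeholder made up for illustration (the file name, the URL, the step wording, and the expected results); it is not the actual test case from Experiment #1.

```shell
# Write an illustrative markdown test case; every value is a placeholder.
cat > add-investment-account.md <<'EOF'
# Test Case: Add an investment account

## Preconditions
- Application URL: http://localhost:3000 (placeholder)
- The dashboard page is reachable

## Steps
1. Open the dashboard.
2. Click the "Add Account" button.
3. Fill in the account name and description in the modal form.
4. Submit the form.

## Expected Results
- The modal closes without validation errors.
- The new account appears on the dashboard.
EOF
```

The part that mattered for this experiment is the URL in the preconditions, since that is what tells Claude where to navigate.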

Starting Point:
So my starting point was a fresh Claude Code session with the test case file and Chrome DevTools MCP configured.


The Experiment

Try #1: Initial Prompt

I primed Claude Code:

“You are an expert test automation engineer. Our goal is to take a test case documented in markdown and run it directly as an automated test using the Chrome DevTools MCP.”

Then I gave it the path to my test case file.

What happened:
Claude Code reviewed the test case, created a todo list to track execution, and opened Chrome automatically. It navigated straight to the dashboard page.

Then it clicked the “Add Account” button.

That’s pretty amazing. We’ve gone from a standard human-readable test case directly to automating it in a browser with no framework, no code – just directly via the Chrome DevTools MCP.

I’m genuinely surprised. It followed every step in my test documentation – clicked buttons, filled forms, validated results. All from my markdown test case.

It actually caught specific details – the account name I specified, the exact description text, even the modal form behavior and validation messages.

That’s pretty impressive!

Try #2: Verify Consistency

I ran the test again to see if it would work consistently.

What happened:
The second run was actually quicker. Claude had context from the first run and executed more efficiently.

What I learned:
The approach is repeatable, at least within a session. With the first run still in context, the second run was faster and smoother.

Try #3: Token Usage Check

I wanted to understand the cost implications, so I used a tool called CC Usage to monitor token consumption.

Result:
Each test run consumed about 1-2% of my session token allocation. At that rate, a session tops out at roughly 50-100 test runs.
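
The run estimate is just the session allocation divided by the share each run uses. As a rough sketch (runs_per_session is a made-up helper name, and it assumes nothing else in the session is consuming tokens):

```shell
# Estimate how many runs fit in one session, given the share of the
# session allocation each run consumes (as a whole-number percent).
runs_per_session() {
  pct_per_run=$1
  echo $(( 100 / pct_per_run ))
}

echo "At 1% per run: $(runs_per_session 1) runs"
echo "At 2% per run: $(runs_per_session 2) runs"
```

In practice the real ceiling will be lower, because priming prompts and any follow-up conversation eat into the same allocation.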

Observation:
Not free, but not prohibitive either. For exploratory testing or figuring out automation approaches, this token cost is worth it. For CI/CD pipeline runs with hundreds of tests daily, you’d want a more deterministic coded approach.


Patterns I Noticed

After testing this approach, some patterns emerged:

Works well for:

  • Initial automation of documented test cases
  • Exploratory test automation – figuring out how to interact with an application
  • Applications where you’re unsure of the automation approach
  • Generating automation insights before building test frameworks
  • Quick verification of test cases
  • Learning how to interact with complex UI components

Gets messy with:

  • Long exploratory sessions (with no clear objective)
  • Tests requiring exact timing control
  • High-volume test execution (token consumption)
  • Production CI/CD pipelines (need deterministic frameworks)
  • Tests with two-factor authentication or complex login flows

Surprises:

  • It actually followed test steps reliably
  • Chrome DevTools MCP has comprehensive browser control (list console messages, network requests, screenshots, drag, hover, fill forms, click, navigate)
  • The test execution logs were detailed and useful
  • You could potentially use all the telemetry from Chrome DevTools to generate Playwright scripts for more deterministic automation
  • Claude suggested creating a slash command to make this repeatable
  • It even started suggesting ways to make the approach more deterministic (YAML test case format)

The Honest Take

⚡ Quick Verdict:
🟢 I’d use this tomorrow

Would I use this?
Absolutely. Not for everything, but definitely for specific scenarios.

For what?

  • Working out how to interact with difficult applications before building test automation
  • Initial automation of existing test cases to understand feasibility
  • Exploratory automation where you need quick feedback
  • Generating insights for building proper test frameworks
  • Applications with complex UI where you’re unsure of the automation approach

When would I NOT?

  • Production CI/CD pipelines (needs deterministic frameworks)
  • High-volume test execution (token consumption)

Token usage will add up with frequent runs, but you could test with lower-spec models to reduce costs. The real value is using this approach to figure out automation strategies, then converting to deterministic Playwright or similar frameworks for production use.


Still Curious About

Things I want to test further:

  • How consistent will this be over repeated runs? (It’s totally agentic – different models might give different results)
  • Will lower-spec models work as reliably with lower token costs?
  • Can I use all the Chrome DevTools telemetry to automatically generate Playwright scripts for deterministic automation?
  • Will this work with something like AG Grid or other complex component libraries? (That’ll be interesting!)
  • How would it handle applications with two-factor authentication or complex security?
  • And then I’m thinking... might this just work in a CI/CD pipeline?

The Prompts I Actually Used

If you’re interested in trying this, these are the exact prompts I used:

Please review the files in this project.

Please check that you have access to the chrome devTools mcp

Please tell me which commands you can use with the chrome devTools mcp

You are an expert test automation engineer

Our goal is to take a test case documented in markdown and run it directly as an automated test using the chrome DevTools mcp

[path to test case markdown file]

please guide me on how to take the lessons learnt from this session and create a claude code slash command
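
For anyone who wants to skip that last prompt, here is a rough sketch of what a project-level slash command could look like. It assumes Claude Code's convention of markdown files under .claude/commands/ with $ARGUMENTS as the placeholder for whatever follows the command. The file name run-test-case.md and the final reporting instruction are made up for illustration; this is not the command Claude produced in the session.

```shell
# Project-level custom commands live under .claude/commands/.
mkdir -p .claude/commands

# The quoted 'EOF' keeps $ARGUMENTS literal so that Claude Code,
# not the shell, substitutes it when the command is invoked.
cat > .claude/commands/run-test-case.md <<'EOF'
You are an expert test automation engineer.

Our goal is to take a test case documented in markdown and run it directly
as an automated test using the Chrome DevTools MCP.

Test case file: $ARGUMENTS

Run each step in order and report a pass or fail result for every step.
EOF
```

After restarting Claude Code in that project, the command should be available as /run-test-case followed by the path to a test case file.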

Resources

Want to try this yourself?

It really was simple to get set up once you have the Chrome DevTools MCP installed and configured. Let me know what happens when you try it – I’m especially curious about how it handles complex component libraries like AG Grid or applications with difficult authentication flows.