Can you treat test cases like source code that compiles into automation?
I wondered if this might work… so I tried it.
How far can I take building a fully “prompt driven” test automation system?
In previous experiments, I’ve been exploring different aspects of AI-driven test automation.
This experiment brings it all together into a systematic approach I’m calling The Test Automation Compiler.
The core idea: Treat markdown test cases as source code that gets compiled into executable automation through a defined process – just like a programming language compiler turns source code into machine code.
I don’t honestly know if this is a good approach yet. Intuition just tells me it’s worth trying!
That last one is crucial – I took everything learned from Experiments #2 and #3, and a document I developed on deterministic AI test automation. I then refined this with Claude Code’s help and built a comprehensive strategy document that defines the entire compilation philosophy and process.
Here’s how the Test Automation Compiler works:
The Core Philosophy:
Traditional test automation requires translating human test cases into code – a manual, error-prone process that creates two artifacts. Two artifacts that need maintaining in parallel. The Test Automation Compiler approach treats this as a compilation problem instead. Your markdown test case is the source code. An AI discovery run acts as the compiler, learning implementation details through intelligent exploration. The generated YAML is the compiled output – optimized, deterministic, and ready to execute. Not ready to execute in the traditional sense: ready to execute with an AI coding engine and an MCP connection to a browser.
This shift in approach could have profound implications. Just as a programming language compiler converts high-level code into machine instructions, this system converts human-readable test cases into a script that can be run. Much like the optimization passes in a modern compiler, the learning phase extracts the most efficient patterns from discovery. The feedback loop acts like a profiling tool, identifying where the compiled code needs refinement. The result: a single source of truth (your markdown test case) that stays in sync with the automation through systematic recompilation.
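To make the analogy concrete, here is a rough sketch of the pipeline as I currently picture it, using the artifact names that appear later in this post. The stage and field names (and the `test-case.md` filename) are just my own labelling for illustration, not a real config file the system reads:

```yaml
# Rough sketch of the compilation pipeline (labels are illustrative only)
pipeline:
  - stage: discover            # Run 1 – exploratory execution via Chrome DevTools MCP
    input: test-case.md        # the markdown test case (the "source code")
    output: discovery-log.json
  - stage: learn               # extract reliable selectors, timings, and patterns
    input: discovery-log.json
    output: learnings.json
  - stage: generate            # "compile" markdown + learnings into a deterministic spec
    input: [test-case.md, learnings.json]
    output: test-spec.yaml
  - stage: validate            # Run 2 – prove the compiled spec executes deterministically
    input: test-spec.yaml
    output: validation-report
```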
The Three Pillars:
Automation-Aware Test Case Creation
Not all test cases are created equal. A test case written as “Click the blue button” is human-readable but nightmarish to automate. A test case written as “CLICK button labeled ‘Submit’” is both human-readable AND automatable. This approach ensures test cases are written in a structured format that humans can read naturally while AI can parse reliably. It’s similar in spirit to BDD, but not as prescriptive – an automation-aware format that produces “compiler-friendly” test documentation.
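To give a feel for it, here is a minimal, made-up example of what an automation-aware test case might look like. The keyword set (NAVIGATE, TYPE, VERIFY alongside CLICK) and the layout are my own sketch, not a fixed standard:

```markdown
## Test Case: Submit a contact form (hypothetical example)

1. NAVIGATE to "https://example.com/contact"
2. TYPE "Jane Tester" into field labeled "Name"
3. CLICK button labeled "Submit"
4. VERIFY message containing "Thank you" is displayed
```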
Run Twice Pattern (from Experiment #3)
The first run is exploratory – the AI coder with MCP DevTools connection tries multiple approaches, measures actual timing, discovers the most reliable selectors, and logs everything. Maybe a bit like a compiler’s analysis phase, understanding the structure before generating code. The second run validates that what was learned actually works deterministically. This two-phase approach separates the intelligent discovery (which can be non-deterministic) from the production execution (which must be deterministic). You get the benefits of AI exploration without the unreliability of agentic execution in production.
Intelligent Feedback Loop
Over time, the system learns from failures and successes. When a selector breaks, the feedback loop updates the YAML without touching the markdown test case. When business logic changes, the markdown is updated and the YAML recompiled. This feedback loop identifies and carries out the maintenance on your automated tests. The system gets smarter with every execution, learning which patterns work and which need adjustment.
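I haven’t built this layer yet, so purely as a thought experiment, a feedback entry for a broken selector might record something like the following. Every field name here is invented:

```yaml
# Speculative sketch of a feedback-loop maintenance record (this layer isn't implemented yet)
maintenance_event:
  test_case: contact-form.md           # markdown source stays untouched
  step: 3
  issue: selector_not_found
  old_selector: "button#submit-btn"
  replacement_selector: "button[type='submit']"
  action: recompile_yaml_step          # the YAML is patched; the markdown is not
  verified_by: validation_rerun
```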
That’s the idea … I still have a bit to work out and finish off here. Watch out for Experiment #5.
The Four Commands:
I built four Claude Code slash commands that implement the compilation pipeline:
- /discover – Discovery execution (Run 1)
- /learn – Extract automation patterns
- /generate – Create YAML from learnings
- /validate – Validation execution (Run 2)

I took a markdown test case through this complete 4-step compilation process.
What I asked:
What happened:
Claude Code took the markdown test case and executed it using Chrome DevTools MCP. But this wasn’t just a simple run – it was a discovery session designed to learn everything about automating this test:
Discovery captured:
The output: discovery-log.json – a comprehensive record of everything learned during the first run.
This discovery run analyzes the application’s structure, parses the UI patterns, and measures the real-world behavior. It’s not just blindly executing steps; it’s building an internal model of how this application works so it can generate optimal automation instructions. The discovery log is a structured representation of everything needed for the YAML script generation.
The key insight here is that this first run is intelligent exploration, not just execution.
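I won’t reproduce the real log here, but the shape of a single step’s entry might look roughly like this. It’s sketched in YAML for readability (the actual artifact is JSON), the field names are my guesses at a schema, and the timings are made-up illustrative numbers:

```yaml
# Illustrative sketch of one step in a discovery log (not the real schema)
step: 3
description: 'CLICK button labeled "Submit"'
selectors_tried:
  - selector: "text=Submit"
    worked: true
  - selector: "button#submit-btn"
    worked: true
chosen_selector: "button[type='submit']"   # judged most reliable during discovery
observed_timing_ms:
  page_settle: 450
  action_to_response: 320
notes: "Button is disabled until all required fields are filled"
```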
What I asked:
What happened:
Claude Code analyzed the discovery log and extracted automation patterns specific to this application:
Learnings extracted:
The output: learnings.json – distilled intelligence ready for YAML generation.
This learning extraction step looks to optimise our test automation specifically for our test case and our application. It looks at all the attempted selectors and picks the most reliable ones. It analyzes timing patterns and calculates optimal wait strategies. It identifies application-specific behaviors (like a 600ms modal animation) that need special handling. This is more than just data aggregation – it’s intelligent pattern recognition that extracts reusable automation knowledge from raw execution data. The learnings become the “optimisation rules” that guide YAML generation (the next step).
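Again only as a sketch (the real learnings.json will have its own structure), the distilled output might resemble something like this. The 600ms modal figure comes from the observation above; everything else, including the preference ordering, is illustrative:

```yaml
# Illustrative sketch of extracted learnings (real schema unknown)
selector_strategy:
  prefer: [data-test-id, aria-label, visible-text]   # hypothetical preference order
  avoid: [auto-generated-class-names]
wait_strategies:
  modal_open: 600            # ms – measured modal animation mentioned above
  default_after_click: 350   # ms – illustrative value
application_behaviors:
  - "Submit button stays disabled until required fields are valid"
```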
What I asked:
What happened:
Claude Code combined:
To generate a deterministic YAML specification.
The YAML included:
The output: test-spec.yaml – the “compiled” version of the markdown test case.
This is the YAML script generation phase – where high-level requirements (markdown) and implementation intelligence (learnings) combine to produce a script that can be followed by AI coding tools. The YAML specification includes everything needed for deterministic execution: precise selectors with confidence scores, optimized wait times based on measured behavior, proper sequencing learned from discovery, and even fallback strategies for unreliable elements. It’s structured, readable, and maintainable – just like well-written code. But unlike hand-written automation, this is a “compiled” output based on actual observed behavior.
The YAML reads like hand-crafted automation code, but it was generated entirely from the discovery process. It’s deterministic, optimized, and includes all the learned implementation details.
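To show what “compiled output” means in practice, here is a small hand-sketched fragment in the spirit of the generated spec. The real test-spec.yaml has its own schema; this is just my illustration of the ingredients described above (selectors with confidence scores, measured waits, fallbacks), with invented values:

```yaml
# Illustrative fragment in the spirit of test-spec.yaml (not the actual generated schema)
test: submit-contact-form
steps:
  - id: 3
    action: click
    target:
      selector: "button[type='submit']"
      confidence: 0.95            # confidence score from discovery (illustrative)
      fallbacks:
        - "text=Submit"
    wait_before_ms: 350           # optimized wait based on measured behavior
    expect:
      visible_text: "Thank you"
```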
What I asked:
What happened:
Claude Code executed the generated YAML specification to validate that the compilation was successful.
Validation performed:
Validation Results:
| Dimension | Score | Notes |
|---|---|---|
| Reliability | 10/10 | All steps execute consistently |
| Timing | 10/10 | Optimal waits discovered |
| Data Handling | 10/10 | Correct transformations |
| Determinism | 10/10 | No randomness in execution |
| Maintainability | 10/10 | Well-structured, documented |
Overall Assessment: Production-ready, fully deployable to CI/CD
Maintenance Notes: Selectors need monitoring for UI changes (which is true for any automation).
The validation step completes the compilation cycle by proving the generated YAML actually works. It’s comparing results against the discovery run, checking for deterministic behavior, confirming there’s no randomness in execution. The scoring system provides objective metrics across multiple dimensions, giving you confidence the automation is production-ready. This isn’t just a binary pass/fail – it’s a comprehensive quality assessment that identifies potential maintenance points before they become problems. The 10/10 scores across all dimensions mean the compilation was successful: the YAML faithfully represents the markdown test case with optimal implementation.
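If it helps to picture the output, a validation summary along these lines would capture the table above. The field names are mine, not the actual report format the command produces:

```yaml
# Illustrative validation summary mirroring the scores above (not the tool's actual report format)
validation:
  reliability: 10
  timing: 10
  data_handling: 10
  determinism: 10
  maintainability: 10
  overall: production-ready
  maintenance_notes:
    - "Monitor selectors for UI changes"
```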
After completing the 4-step compilation workflow:
Works well for:
Might not work quite so well for:
Surprises:
⚡ Quick Verdict:
🟡 AMBER: “Interesting… an approach that’s taking shape! Need to work in the feedback and learning layer!”
The Good:
The Concerns:
Would I use this?
Maybe – the approach is solid and the 4-step workflow makes intuitive sense. I need to:
I think the foundation we have here is strong enough to convince me to invest more time in this approach.
The “Test Automation Compiler” concept isn’t just a metaphor – it’s a working proof of concept.
Treating test cases as source code that compiles into automation provides several benefits:
The three pillars work together:
I think the concept and foundation are solid. From markdown to production-ready automation, with a systematic compilation process and no code written.
This experiment demonstrates that the Test Automation Compiler approach isn’t just an interesting idea – it’s a practical, working proof of concept. The compiler metaphor proved to be more than just a convenient analogy; it’s a useful description of what’s happening. We’re taking human-readable test documentation (source code) and systematically transforming it through analysis, optimization, and code generation phases into deterministic, production-ready automation (executable machine code). When I say “executable machine code” I really mean a script that’s reliably executable by an AI coding engine with an MCP connection to a browser.
The four-command workflow provides a clear, repeatable process. A process that separates concerns: discovery for learning, extraction for optimization, generation for compilation, and validation for quality assurance.
What makes this approach fundamentally different from traditional test automation is the elimination of parallel artifacts. There’s no separate test documentation that falls out of sync with test code. There’s no manual translation step where implementation details get lost or misinterpreted. Although it could be argued that losing this translation step means you’re missing a human review and analysis step that traditionally would find issues – both in the test case and the application under test.
The markdown test case becomes THE documentation, and the YAML specification is automatically compiled from observed behavior rather than assumed implementation. When the application changes, you update the markdown and recompile – just like updating source code and rebuilding. When implementation details change (selectors, timing), the feedback loop updates the YAML without touching your documentation (at least that’s what I’m hoping once this stage is implemented).
This single source of truth approach, combined with intelligent compilation and continuous learning, represents a genuinely interesting way to think about test automation maintenance and sustainability.