Lab note
LLM-assisted QA for Alteryx: from walkthrough to auditable mitigation
A narrative summary of the QA experiment—tightened for scan-ability, with the same controls-first mindset: assumptions, failure modes, and risk.
I want to showcase a task anyone in process work will recognize.
Disclosure: everything shown here uses synthetic data and pseudocode. Nothing proprietary.
There is also a fair amount of process discussion. That’s intentional. Complexity matters if we want to understand what modern LLMs actually enable. There is no magic wand here—this is a look behind the curtain, and a push to solve hard problems rather than settle for novelty.
I was asked to perform QA on a coworker’s Alteryx solution. It was a big one: a three-workflow process designed intentionally to require manual intervention at key stages.
The designer walked me through it in about an hour. Solid walkthrough—but it was a heavy volume of context. No one realistically retains that much detail in one sitting.
I understand the mechanics of these processes deeply. I’ve reviewed hundreds. But I hadn’t executed this process, and some accounting nuances live with the business owners.
My role in QA isn’t to second-guess business rules.
My role is to evaluate structure, assumptions, failure modes, and risk.
So the real question became:
Can AI help perform high-quality QA when the process context exceeds what a human can comfortably hold in memory?
This is where experimentation began.
I used our firm-approved LLM to consume the packaged workflow—including inputs and dependencies. Instead of forcing myself to memorize everything, I let the model do what it does well: hold context.
I first prompted it to understand the workflow. Then I asked two questions derived from our formal QA documentation. It returned eight potential issues.
Six were edge cases with minimal business risk. Useful, but not design-changing. Two mattered. That’s where this approach started paying off.
The workflow used a Python tool to ingest an OpenXML Excel workbook—the correct design choice given native limitations. The LLM noticed something subtle: the script blindly consumed the first worksheet in the workbook.
In theory, the file arrives untouched. In reality, a human downloads and saves it. Worksheets get reordered, renamed, or accidentally modified.
Because the workflow and script were already in context, I collaborated with the LLM to redesign the code.
Yes—the code was tested.
Discovery → remediation → implementation: under one hour.
This closed a blind spot where incorrect data could be ingested silently.
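The kind of guard described above can be sketched in a few lines. This is a minimal illustration, not the production script: the function name, the error class, and the sheet name "Property_Export" are all hypothetical. The idea is to select the worksheet by its expected name and fail loudly when it is missing or duplicated, rather than defaulting to whatever sheet happens to be first.

```python
from typing import Iterable


class WorksheetSelectionError(ValueError):
    """Raised when the expected worksheet cannot be identified unambiguously."""


def select_worksheet(sheet_names: Iterable[str], expected: str) -> str:
    """Return the actual sheet name matching `expected`.

    Matching ignores case and surrounding whitespace, so a lightly renamed
    tab still resolves. Zero or multiple matches raise instead of silently
    falling back to the first sheet.
    """
    names = list(sheet_names)
    target = expected.strip().casefold()
    matches = [name for name in names if name.strip().casefold() == target]
    if len(matches) != 1:
        raise WorksheetSelectionError(
            f"Expected exactly one sheet named {expected!r}; "
            f"found {len(matches)} among {names!r}"
        )
    return matches[0]


# Assumption: the list of names would come from the workbook reader in use,
# e.g. openpyxl's `load_workbook(path).sheetnames`. The sheet name below is
# a placeholder, not the real one.
if __name__ == "__main__":
    reordered = ["Notes", "Property_Export", "Pivot"]
    print(select_worksheet(reordered, "Property_Export"))
```

Selecting by name means a reordered workbook still resolves to the right sheet, and a renamed or deleted sheet stops the run with a readable error instead of feeding the wrong data downstream.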
The second issue wasn’t technical. It was the business rule itself.
The process compares property locations between systems. If geocoded addresses fall within 1.5 miles, they are treated as the same site. Anything beyond that goes to research.
Reasonable rule—on its face. But the LLM surfaced the risk: dense urban areas can contain multiple distinct properties within 1.5 miles. That can drive incorrect overrides and downstream regional reporting errors.
Examples surfaced quickly (Kansas City, Portland, New York, Philadelphia, Arlington). In these areas, 7,920 feet (1.5 miles) can mean a materially different region.
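For readers who want to see the rule concretely, here is a minimal sketch of a 1.5-mile check between two geocoded points. It assumes the comparison is a great-circle (haversine) distance on latitude/longitude pairs. The actual workflow may measure distance differently, and the function names are illustrative.

```python
import math

EARTH_RADIUS_MILES = 3958.8       # mean Earth radius, an approximation
SAME_SITE_THRESHOLD_MILES = 1.5   # 1.5 mi * 5,280 ft/mi = 7,920 ft


def haversine_miles(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in miles between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    )
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def is_same_site(
    lat1: float,
    lon1: float,
    lat2: float,
    lon2: float,
    threshold_miles: float = SAME_SITE_THRESHOLD_MILES,
) -> bool:
    """Apply the rule as described: within the threshold counts as one site."""
    return haversine_miles(lat1, lon1, lat2, lon2) <= threshold_miles
```

The check knows nothing about what sits between the two points, which is exactly why a dense downtown can put two unrelated properties inside the same radius.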
Still, scrapping a valuable process over a small subset of edge cases didn’t feel responsible. So I reframed the problem:
Where does this rule meaningfully increase risk?
I opened a fresh context window and focused only on the 1.5-mile population. After removing PII, I worked with the LLM to design an address-normalization and classification approach:
Records were bucketed into false-positive risk, false-negative risk, ambiguous, and likely safe.
Out of several dozen properties, roughly 80% were likely safe and roughly 20% required attention.
That transforms a noisy list into a focused review an accountant can complete quickly and confidently.
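To make the bucketing idea tangible, here is a rough sketch under stated assumptions. The real classification was done through a prompt, not this code. The normalization rules, the abbreviation table, the 0.25-mile boundary margin, and the decision logic below are all hypothetical stand-ins chosen only to show how the four buckets could be separated.

```python
import re

# Hypothetical abbreviation table; a real one would be much longer.
ABBREVIATIONS = {
    "street": "st",
    "avenue": "ave",
    "road": "rd",
    "boulevard": "blvd",
    "drive": "dr",
    "suite": "ste",
}


def normalize_address(raw: str) -> str:
    """Lowercase, strip punctuation, and abbreviate common street words."""
    cleaned = re.sub(r"[^\w\s]", " ", raw.casefold())
    return " ".join(ABBREVIATIONS.get(tok, tok) for tok in cleaned.split())


def classify(
    addr_a: str,
    addr_b: str,
    miles_apart: float,
    threshold: float = 1.5,
    margin: float = 0.25,  # assumed width of the "too close to call" band
) -> str:
    """Assign one of four review buckets to a pair of records."""
    same_text = normalize_address(addr_a) == normalize_address(addr_b)
    if abs(miles_apart - threshold) <= margin:
        return "ambiguous"            # distance sits near the rule boundary
    if miles_apart <= threshold:
        # Rule says "same site"; differing addresses suggest it may be wrong.
        return "likely safe" if same_text else "false-positive risk"
    # Rule says "research"; matching addresses suggest one site was split.
    return "false-negative risk" if same_text else "likely safe"
```

Under these assumed rules, a pair 0.2 miles apart with different street addresses lands in the false-positive bucket, which is the dense-urban case the rule is most likely to get wrong.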
Two realities shaped the final solution:
So we chose a durable approach: markdown.
A version-controlled markdown file contains a surgical, model-agnostic prompt. It’s auditable, updateable, and plays nicely with change management. Process owners attach the Excel output and the markdown file to an approved LLM interface, paste a short SOP instruction, and receive curated results in about 90 seconds.
Because the process lived in the context window, we also documented it—in the business team’s language, not a technical manual.
This was my first time using this approach. It raises a bigger question:
Why wait until QA? Why not surface these risks during development?
Keep playing at the frontier.
There is no textbook at the frontier’s edge. You experiment, you make mistakes, and you see what holds up. This feels like something real—a shift in how we evaluate and design processes.
“Your future is whatever you make it—so make it a good one.”
Happy to share how I structure prompts, validation checkpoints, and handoff artifacts so the result holds up under real operational pressure.