LLM-assisted QA for Alteryx

A normal task — with modern tools

I want to showcase a task anyone in process work will recognize.

This piece also carries a fair amount of process discussion. That's intentional. Complexity matters if we want to understand what modern LLMs actually enable. There is no magic wand here. This is a look behind the curtain, and a push to solve hard problems rather than settle for novelty.

The assignment

I was asked to perform QA on a coworker's Alteryx solution. It was a big one: a three-workflow process designed intentionally to require manual intervention at key stages.

The designer walked me through it in about an hour. It was a solid walkthrough, but it carried a heavy volume of context. No one realistically retains that much detail in one sitting.

I understand the mechanics of these processes deeply. I've reviewed hundreds. But I hadn't executed this process, and some accounting nuances live with the business owners.

My role in QA isn't to second-guess business rules. My role is to evaluate structure, assumptions, failure modes, and risk.

So the real question became:

Can AI help perform high-quality QA when the process context exceeds what a human can comfortably hold in memory?

LLM-assisted QA: narrowing the search space

This is where experimentation began.

I used our firm-approved LLM to consume the packaged workflow—including inputs and dependencies. Instead of forcing myself to memorize everything, I let the model do what it does well: hold context.

I first prompted it to understand the workflow. Then I asked two questions derived from our formal QA documentation. It returned eight potential issues.

Six were edge cases with minimal business risk. Useful, but not design-changing. Two mattered. That's where this approach started paying off.

Technical risk uncovered: the Python/OpenXML issue

The workflow used a Python tool to ingest an OpenXML Excel workbook—the correct design choice given native limitations. The LLM noticed something subtle: the script blindly consumed the first worksheet in the workbook.

In theory, the file arrives untouched. In reality, a human downloads and saves it. Worksheets get reordered, renamed, or accidentally modified.

Because the workflow and script were already in context, I collaborated with the LLM to redesign the code:

validate worksheet structure
confirm expected layout
throw warnings and errors before ingestion

Yes—the code was tested.

Discovery → remediation → implementation: under one hour. This closed a blind spot where incorrect data could be ingested silently.
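The checks listed above can be sketched roughly as follows. This is not the script from the workflow, which is not reproduced here. It is a minimal, standard-library illustration of the idea: read the worksheet names from the workbook, then compare them and a header row against expectations before any data is ingested. The sheet name `Properties`, the column names, and the function names are all hypothetical.

```python
import zipfile
import xml.etree.ElementTree as ET

# SpreadsheetML namespace used by xl/workbook.xml inside an .xlsx file.
NS = {"m": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}

# Hypothetical expectations; the real sheet and column names are assumptions.
EXPECTED_SHEET = "Properties"
EXPECTED_HEADERS = ["property_id", "address", "latitude", "longitude"]


def list_sheet_names(xlsx_file):
    """Return worksheet names in workbook order, read from xl/workbook.xml."""
    with zipfile.ZipFile(xlsx_file) as zf:
        root = ET.fromstring(zf.read("xl/workbook.xml"))
    return [s.attrib["name"] for s in root.findall("m:sheets/m:sheet", NS)]


def check_workbook(sheet_names, headers,
                   expected_sheet=EXPECTED_SHEET,
                   expected_headers=EXPECTED_HEADERS):
    """Return (errors, warnings) instead of silently trusting the first sheet."""
    errors, warnings = [], []

    # Worksheet structure: the expected sheet must exist by name.
    if expected_sheet not in sheet_names:
        errors.append(f"missing worksheet {expected_sheet!r}")
    elif sheet_names[0] != expected_sheet:
        warnings.append(f"{expected_sheet!r} is no longer the first worksheet")

    # Layout: every expected column must be present, ideally in order.
    missing = [h for h in expected_headers if h not in headers]
    if missing:
        errors.append(f"missing columns: {missing}")
    elif list(headers[:len(expected_headers)]) != expected_headers:
        warnings.append("expected columns are present but not in the expected order")

    return errors, warnings
```

In a sketch like this, the ingestion step would stop when `errors` is non-empty and log `warnings` for the process owner. Selecting the sheet by name rather than by position is what removes the dependence on worksheet order.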

Business-logic risk: the 1.5-mile geocode rule

The second issue wasn't technical. It was the business rule itself.

The process compares property locations between systems. If geocoded addresses fall within 1.5 miles, they are treated as the same site. Anything beyond that goes to research.
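To make the rule concrete, here is a minimal sketch of how such a distance check might look. The distance method is not stated here, so the haversine (great-circle) formula is an assumption, as are the function names and the choice to treat exactly 1.5 miles as "within".

```python
import math

EARTH_RADIUS_MILES = 3958.8            # mean Earth radius
SAME_SITE_MILES = 1.5                  # threshold from the business rule
SAME_SITE_FEET = SAME_SITE_MILES * 5280  # 7,920 feet


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def classify_pair(lat1, lon1, lat2, lon2, threshold_miles=SAME_SITE_MILES):
    """Apply the rule: within the threshold is the same site, otherwise research."""
    distance = haversine_miles(lat1, lon1, lat2, lon2)
    # Assumption: the boundary value itself counts as "within".
    return "same_site" if distance <= threshold_miles else "research"
```

Note that the rule looks only at distance. Nothing in it considers how many distinct properties could sit inside that radius.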

Reasonable rule—on its face. But the LLM surfaced the risk: dense urban areas can contain multiple distinct properties within 1.5 miles. That can drive incorrect overrides and downstream regional reporting errors.

Examples surfaced quickly (Kansas City, Portland, New York, Philadelphia, Arlington). In these areas, 1.5 miles (7,920 feet) can mean a materially different region.

Still, scrapping a valuable process over a small subset of edge cases didn't feel responsible. So I reframed the problem:

Where does this rule meaningfully increase risk?

From insight to an auditable process

I opened a fresh context window and focused only on the 1.5-mile population. After removing PII, I worked with the LLM to design an address-normalization and classification approach:

similarity scoring
urban vs. rural differentiation
risk categorization

Records were bucketed into false-positive risk, false-negative risk, ambiguous, and likely safe.
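The shape of that classification can be sketched as follows. The actual logic lives in the prompt and is not reproduced here, so everything in this block is a hypothetical stand-in: the abbreviation table, the use of `difflib` for similarity, the cutoffs, and the way the urban flag raises the bar for "likely safe".

```python
import re
from difflib import SequenceMatcher

# Hypothetical abbreviation table; a real one would be far longer.
ABBREVIATIONS = {"street": "st", "avenue": "ave", "road": "rd",
                 "boulevard": "blvd", "suite": "ste"}


def normalize_address(raw):
    """Lowercase, strip punctuation, and collapse common street-type words."""
    tokens = re.sub(r"[^a-z0-9\s]", " ", raw.lower()).split()
    return " ".join(ABBREVIATIONS.get(t, t) for t in tokens)


def similarity(a, b):
    """Similarity score in [0, 1] between two normalized addresses."""
    return SequenceMatcher(None, normalize_address(a), normalize_address(b)).ratio()


def risk_bucket(sim, distance_miles, is_urban, threshold_miles=1.5):
    """Bucket a record by comparing the distance rule with address similarity."""
    # Assumed cutoffs: dense areas need stronger agreement to count as safe.
    safe_cutoff = 0.9 if is_urban else 0.8
    rule_says_same = distance_miles <= threshold_miles
    agree = sim >= safe_cutoff
    disagree = sim < 0.5

    if (rule_says_same and agree) or (not rule_says_same and disagree):
        return "likely safe"            # rule and addresses tell the same story
    if rule_says_same and disagree:
        return "false-positive risk"    # rule merges sites the addresses separate
    if not rule_says_same and agree:
        return "false-negative risk"    # rule splits sites the addresses merge
    return "ambiguous"
```

The point of a structure like this is that each bucket states why a record landed there, which is what lets a reviewer audit the result.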

Out of several dozen properties, roughly 80% were likely safe and roughly 20% required attention.

That transforms a noisy list into a focused review an accountant can complete quickly and confidently.

Constraints drive better design

Two realities shaped the final solution:

we cannot embed LLMs directly in Alteryx
models change rapidly

So we chose a durable approach: markdown.

A version-controlled markdown file contains a surgical, model-agnostic prompt. It's auditable, updateable, and plays nicely with change management. Process owners attach the Excel output and the markdown file to an approved LLM interface, paste a short SOP instruction, and receive curated results in about 90 seconds.

Because the process lived in the context window, we also documented it—in the business team's language, not a technical manual.

In one business day we:

evaluated a complex multi-workflow process
performed targeted QA
remediated technical risk
identified hidden business-logic risk
implemented repeatable mitigation
documented the process

This was my first time using this approach. It raises a bigger question:

Why wait until QA? Why not surface these risks during development?

Keep playing at the frontier. There is no textbook at the frontier's edge. You experiment, you make mistakes, and you see what holds up. This feels like something real—a shift in how we evaluate and design processes.

"Your future is whatever you make it—so make it a good one."
