
AI has radically changed Quality Assurance, breaking old inefficient ways of test automation, promising huge leaps in speed and the ability to test things we otherwise couldn’t easily test before.
But getting AI to be trusted by QA teams is a major challenge across the industry. A big part of the difficulty stems from the “black box” nature of AI tools where the internal processing is abstracted away. That makes people worry about the accuracy and reliability of its outputs for ensuring critical software quality.
There’s also genuine and understandable concern among testers about how their roles will change with these tools. And so, building real trust with AI in QA isn’t something that happens automatically. Instead, it requires deliberate strategy and action. Mayank Bhola, Co-founder and Head of Product at LambdaTest, an AI-Native software testing platform, emphasizes that achieving team trust and measurable ROI with AI hinges on making smart, upfront choices about tools and diligently tackling integration challenges.
Focusing on comprehensive evaluation criteria and proactively navigating the implementation roadblocks are essential foundational steps for any organization serious about leveraging AI effectively. Notable engineering leaders share their insights on what really matters when evaluating AI tools for your QA technical stack and provide insights on significant hurdles organizations must clear to build lasting trust and achieve tangible ROI.
What really matters (and choosing the right AI tools)
Picking the right AI tools for your QA team is critical for building trust and delivering measurable value. This requires examining criteria beyond the basic feature list vendors show you and focusing on technical criteria that prove the tool is trustworthy and capable in real workflows.
Trust fundamentally links back to understanding how the AI is doing what it does. Bhola calls for transparency and explainability in “simple languages” with “traceable outputs… and auditability.”
This means tools need robust logging, tracing features, and clear audit trails built in so testers can technically debug and verify the AI’s steps and data flow, just like they would with any other complex system component. Testers must also understand the why behind an AI decision, not just the outcome, enabling them to provide corrective feedback and bridge the gap caused by the AI’s inherent ‘black box’ nature.
Beyond explainability, technical transparency also involves consistency. Andy Piper, Vice President of Engineering at Diffblue, points out that LLMs are weak on transparency and consistency because they’re probabilistic models. His company opts for deterministic AI models that “further reinforce trust among developers” by reliably giving the same technical output for the same technical input. Evaluating how consistent and predictable an AI tool’s technical behavior should be a key part of the assessment, particularly when comparing different AI approaches.
Security remains non-negotiable. According to Sunil Senan, SVP and Global Head – Data, Analytics & AI at Infosys, any AI tool must follow technical industry best practices and standards, including addressing critical vulnerabilities such as those outlined in OWASP Top 10 and properly securing underlying infrastructure like Kubernetes environments. He emphasizes that meeting enterprise-grade standards (SOC2, ISO, GDPR) is a baseline, especially for regulated sectors.
Tools also need technical configurations to prevent the AI from accidentally accessing sensitive data you have on your network or premises. Furthermore, organizations need to retain ownership and confidentiality of the input data sent to the tool and the output received back, which requires clear technical data handling policies from the vendor.
Beyond trust considerations, organizations need to check whether the tool delivers genuine technical value. Bhola advises prioritizing AI Agents that act as a “strategic assistant”—analyzing past test results to build future strategies for holistic quality assurance, rather than just generating test scripts.
These AI Agents must address “real problems in the business”, not merely function well in controlled conditions, recommending testing tools using actual production issues or complex real-world scenarios. Karan Ratra, senior engineering leader at Walmart, suggests being as critical in evaluating AI as a human tester, recommending “side-by-side experiments across different LLMs” to see how they handle diverse or malformed technical inputs.
This technical scrutiny helps uncover a tool’s true capabilities outside of ideal conditions. Integration flexibility is also important; the tool needs to fit smoothly into CI/CD pipelines and other tools using standard connectors.
Thinking about ROI is crucial, and it covers numerous areas of improvement, not just time savings.
Senan of Infosys highlights evaluating impact on test coverage, risk prioritization, and defect detection as “Value Beyond Speed.” Ultimately, there has to be a clear ROI and operational fit. Companies shouldn’t get AI “for the sake of implementing AI” but because it’s “really helping to improve the operational excellence or improve the cost metrics.
Checking if the vendor is trustworthy and offers good support is very important. You need confidence in their security practices and ability to help you integrate and troubleshoot.
So, thoroughly evaluating tools based on technical transparency, security standards, verifiable capabilities, integration flexibility, and potential ROI measured broadly is key. Making smart choices builds trust in the technical solutions you deploy.
Navigating AI implementation hurdles
Even when teams understand the ethics and see the potential, putting AI into practice in QA presents some challenges. These roadblocks make building trust harder and can significantly undermine expected return on investment.
One major issue is simply unrealistic expectations. Hugo Farinha, Co-founder at Virtuoso QA, identifies “unrealistic expectations” as a primary barrier. It’s difficult to separate fact from fiction because many AI tools over-promise technical capabilities given their immaturity.
Proving AI’s financial value in technical terms is another significant challenge. Many teams lack hard metrics on creation time, maintenance effort, and defect leakage, which makes showing ROI attribution difficult. Merrell reasons that “nobody knows what the real ROI… will be at this point” because the market is young. And it’s important at some point also to factor in the tester’s paradox— verifying AI output makes the tool look less productive, pressuring teams into structural irresponsibility.
Dealing with technical complexities and data governance issues is another substantial obstacle. Data governance—Ensuring test data stays private, anonymized, and encrypted while feeding AI models—is a key technical challenge.
Senan notes that “Governance Gaps” can derail AI initiatives if not intentional from the start.
Another factor: AI abstraction can be tough to interpret, hindering trust in decisions related to model explainability. Guardrail calibration—balancing AI autonomy versus manual control—is also a technical configuration challenge.
Integrating AI into existing pipelines also presents difficulties.
Pipeline integration—embedding Gen AI into existing CI/CD without disrupting release cadence—is one challenge. Siloed AI integration across teams also hinders technical adoption into the broader landscape, notes Farinha.
There is also the technical difficulty and added cost when AI is used to test something probabilistic, meaning results aren’t always the same and require redundant checks. Also, AI brings a large attack surface that cannot be defended, with DevSecOps implications.
This attack surface is due to AI features being vulnerable to issues like prompt injection, supply chain vulnerabilities, and insecure plugins, meaning advanced AI is needed to test these new attack vectors.
How companies manage themselves and their people contributes significantly. Change management—shifting QA roles from repetitive scripting to AI‑assisted oversight—requires clear upskilling plans. Senan notes that “Fear of Displacement” is a challenge organizations counter by positioning Gen AI as a force multiplier, and “Change Management” is therefore critical for effective adoption.
Navigating the vendor landscape populated with immature tools also presents ongoing challenges. Bhola recommends, “Selecting vendors that offer true transparency into their AI practices and a clear roadmap for addressing technical debt in their models is a significant challenge for many organizations.” It requires careful due diligence beyond marketing.
It’s also tough balancing automation vs. accountability when automated checks miss something critical. Senan adds that embedding Responsible AI and Agentic AI principles helps organizations navigate these challenges by integrating autonomous agents within governed environments.
While these roadblocks shouldn’t deter AI adoption, they represent crucial technical and organizational hurdles requiring deliberate strategy, careful planning, and organizational willingness to adapt QA operations and success metrics to AI realities.
The path to successful AI testing
So, building trust in AI-driven QA definitely isn’t a simple thing.
Getting AI to work successfully and deliver real value requires more than technology adoption alone. Industry leaders emphasize that trust must be deliberately cultivated while addressing both technical and human considerations throughout the process.
It begins with strategic tool selection. “You have to focus on technical criteria like transparency features, adherence to security standards, real-world capability, and integration flexibility,” says Ratra of Walmart. Evaluating AI based on what really matters builds the foundation for reliability and confidence.
And it means being ready to face the real problems that come up during implementation. This includes managing expectations, getting the right technical metrics for ROI, navigating data governance and integration hurdles, and handling the organizational changes that shift roles and responsibilities.
“These roadblocks need strategic navigation, not avoidance,” explains Bhola, drawing from his engineering experience launching Kane AI, a native GenAI test agent developed by his team at LambdaTest.
Ultimately, AI in QA represents a collaborative partnership between technology and human expertise. AI handles repetitive tasks, finds patterns. But humans bring critical thinking, judgment, and strategic context that remain indispensable.
When organizations commit to careful evaluation and strategic challenge navigation, they unlock AI’s potential.
Bhola, exposits “When done right, focusing on building trust and addressing challenges results in AI solutions that not only improve technical efficiency but also elevate the overall quality and speed of software delivery.”
The goal of this exercise is to empower testers, streamline processes, and deliver reliable software with confidence.