Introducing AI Powered Penetration Testing

31 March – 32 minutes  

In this episode, we are joined by Chris Oakley (SVP Assurance Services, Americas, LRQA), Dave Parsons (Managing Principal Security Consultant, LRQA) and David Greene (Strategic Partnerships, Simbian) to introduce AI Powered Penetration Testing, a new capability from LRQA developed with Simbian. Together, they explore what AIPT is, how it works in practice, and where it fits alongside consultant-led penetration testing. 

The conversation covers why security testing programmes are under pressure to move faster, how AI can support more frequent and repeatable testing, and why credibility, governance and expert oversight still matter. It also addresses practical concerns around trust, data handling, sovereignty, retention and safe operation. 

LRQA: The Future in Focus

Josh Flanagan: 
Hello everyone, and thanks to all of our listeners worldwide. Welcome to LRQA’s Future in Focus podcast. My name is Josh Flanagan, and today we’re launching a new solution from LRQA, AI Powered Penetration Testing. 

I’m joined by Chris Oakley, SVP of Assurance Services at LRQA, Dave Parsons, Managing Principal Security Consultant at LRQA, and David Greene, Strategic Partnerships at Simbian, who has worked with us on building this capability. Chris, Dave, David, thank you very much for joining us. 

Today, we’re unpacking what AI Powered Penetration Testing actually means in practice, how it works alongside consultant-led testing, and why continuous testing is becoming essential. We’ll also cover the governance and data controls that sit behind it. 

So, diving straight in, I’m going to come to you, Chris. To set the scene, what is AI Powered Penetration Testing, and why is LRQA launching it now? 

Chris Oakley: 
Sure thing, Josh. The core problem we kept hearing from clients was that their environments are changing faster than their testing programmes could keep up. They might have a pen test once a year, maybe twice a year, but their attack surface is constantly evolving, with things like new cloud infrastructure, new integrations and new code deployments. 

Our AI Powered Penetration Testing is our answer to that gap. It combines Simbian’s AI-driven testing capabilities with our own LRQA expert oversight, and what that lets us do is deliver consistent and repeatable security testing at a pace and frequency that traditional consulting alone just cannot really match. 

We’re not replacing consultants. We’re extending what they can do and what is possible. The timing made sense right now because the technology is genuinely mature enough to produce something credible. The last year or two has seen immense growth in this space, so we’re now able to provide actionable outputs. At the same time, client demand for more frequent assurance has reached the point where we needed a real solution, not just a concept. 

Josh Flanagan: 
Perfect, thanks Chris. You mentioned Simbian there. Could you give us a short breakdown of the partnership and why we started to work with Simbian on this solution? 

Chris Oakley: 
Absolutely. We had a few options, transparently. We could build something ourselves, we could buy something off the shelf, or we could partner. When we assessed the market, there just was not really anything at the time that we felt comfortable using. 

While we certainly could have built something ourselves, and we have a team of expert hackers who can build all sorts of interesting proofs of concept, there is a leap from that to providing enterprise-grade software. Along the journey, we found Simbian. 

Simbian already had a product in the detection and response world, and they were really keen to build something with us in the offensive security world. Through a series of conversations and technology demonstrations, we were really blown away by what we saw, and we knew almost immediately that they were the right partner to go to market with for this kind of thing. 

Josh Flanagan: 
So over to you now, David. For anyone who does not know Simbian, can you give us a quick introduction to the partnership from your perspective? 

David Greene: 
Absolutely, Josh. Simbian is a company focused on using AI to deliver automation for enterprise security. Our belief is that in a security landscape increasingly dominated by AI-oriented threats and AI-orchestrated attacks, the only way companies are going to keep themselves secure is by using AI tools in response, tools that operate at speed, deliver a coordinated response, and build on the deep information and context that exists within a company. 

As Chris mentioned, we already had a heritage in building tools like this for security practices. Our initial focus has been on things like AI SOC and alert processing, and now, through the partnership with LRQA, we’ve had the opportunity to bring that expertise over to penetration testing. 

The value of the partnership is that Simbian can offer expertise in applying AI to security automation and security operations, while LRQA has incredible expertise in how to properly run a pen test. Bringing those two things together gives us one product offering and one solution, which is what we’re talking about here today. 

Josh Flanagan: 
I’m going to come back to you now, Chris. What’s changing in the threat and technology landscape that’s really putting pressure on security testing programmes at the moment? 

Chris Oakley: 
Good question. Testing programmes do struggle to keep up, and the honest answer is cadence and coverage. A lot of programmes are built around point-in-time assessments. You go through the cycle of testing, you remediate, and you test again in 12 months’ time. But the environment you tested in January looks a little different, or maybe completely different, by March or April. 

What we consistently see is that organisations are confident in what was tested, but they have limited visibility into what has changed since then. Retesting is another pain point. It is time-consuming and often deprioritised when consultants are in demand. Coverage can be the silent gap too. There is always more scope than budget, so choices get made about what is tested and what is not, and that is where risk starts to accumulate quite quickly. 

AI Powered Penetration Testing directly addresses those gaps. It gives you more frequent testing cycles, faster retesting after remediation, and broader coverage without proportionally increasing the cost. 

Josh Flanagan: 
Thanks, Chris. Those are some really good points. If we dive a little deeper into that, where do testing programmes typically struggle to keep up, even when teams are doing the right things? 

Chris Oakley: 
We’ve got three things converging right now. The attack surface is expanding at quite a rate. You’ve got cloud APIs, third-party integrations, and even shadow IT. The traditional perimeter does not exist in the way it used to, not in a meaningful way. 

Alongside that, we’ve got threat actors themselves adopting AI to move much more quickly. They move faster, iterate more quickly, and often find weaknesses before vendors do. On top of that, the regulatory environment is tightening quickly as well. Whether that is disclosure requirements, testing requirements or contractual obligations, a lot of organisations are being asked to demonstrate assurance more frequently and with more rigour. 

So you’ve got more to protect, faster-moving threats, and higher accountability. That combination is putting quite a lot of strain on security testing programmes that were designed for a different era. 

Josh Flanagan: 
Thanks, Chris. I’m going to come to you next, Dave. It’s pretty clear from what Chris was saying there that things have changed a lot lately. From your perspective, when people hear AI Powered Penetration Testing, what do they assume it is, and what do we need to correct early on? 

Dave Parsons: 
Thanks, Josh. When people hear AI Powered Penetration Testing, they typically assume one of two things. It is either an automated scanner with a new label, or it is a system claiming to do everything and replace the human consultant entirely. Neither of those assumptions is quite right. 

What we’re actually trying to build, and what it represents, is effective AI capability shaped by LRQA’s operational experience. We’ve worked very closely with Simbian on the design of the tool, bringing decades of consultant penetration testing into how it thinks and behaves. 

That is not a small thing. The fundamental difference from a traditional scanning tool is that this tool verifies through exploitation rather than signature-based detection. It is not just flagging potential issues, it is confirming that those issues are real, and that decision came directly from how our consultants work. 

It is the difference between a tool built by people who understand what good penetration testing looks like and one that does not. We’re not trying to replicate a scanner with this tool. We’re trying to replicate what a human might do. 

Where our consultants add value is in contextualisation, business logic vulnerabilities and remediation. Vulnerabilities do not exist in isolation. Their severity depends on the environment they are found in, the data they touch and the security controls around them. 

Clients are still going to need that layer of expert judgement in terms of risk and context, and we’re very happy to provide that. Our consultants can review findings, prioritise them by risk, and translate technical outputs into remediation guidance that organisations can actually act on. 

Effectively, it is a tool that enhances what our consultants can deliver and extends the reach of our testing programmes, but it is never designed to replace consultant-led penetration testing. 

Josh Flanagan: 
With that context in mind, Dave, it would be useful to explain what the service looks like in practice. How does AI Powered Penetration Testing work, and what does a client get back if they engage with us on this? 

Dave Parsons: 
From a client experience perspective, we’re trying to make it as simple as possible. Access is delivered through our customer portal, where clients can already manage their broader assurance activities, and the AI penetration testing capability provided through the partnership with Simbian is integrated straight into that environment. 

There is no separate platform to manage or onboard to. Once in that platform, clients can run tests autonomously. They are not waiting for a consultant to be scheduled. They can initiate testing when it is relevant to them, whether that is post-release, post-remediation, or as part of a regular cadence of testing. 

The output is where this tool really differentiates itself from traditional scanning tools. Clients are going to receive penetration testing reports that are as close to human consultant-produced reports as we can make them, with structured findings, risk context and screenshots documenting exactly what the tool observed. 

It is not like a traditional scanning tool where you get a raw vulnerability dump. It is something that actually supports decision-making. 

Two of the features we really like and would highlight specifically are, first, the retest button. Once you have found vulnerabilities and remediated them, a client can instantly validate that remediation instead of waiting for a consultant to be engaged. That closes the loop much faster than traditional retest cycles. 

Second, the platform includes AI thought traces. It shows the thought process behind each finding, how the AI reasoned its way to a conclusion, what it tested and what path it took. That transparency is really important because it gives our consultants context to validate findings where needed, and it also gives clients confidence because they can see what the tool did and how it reached its conclusions. 

Josh Flanagan: 
Thanks for that description, Dave. It’s a really useful insight into how it works. I’m going to come to you next, David, because it would be interesting to hear from the Simbian side what is happening behind the scenes when a test runs. 

David Greene: 
Great question, Josh. What we really tried to do, building on Dave’s comments, was make the AI agent behave the way a human pen tester would behave, following a very similar workflow and methodology. 

For anyone thinking about the solution, you should think of the AI agent as another employee on the team. There is an upfront step where a human specifies the target, the scope and the level of testing that needs to be done, and then the agent begins its work. 

The first step is autonomous mapping, where the agent scans the application using the URLs and credentials that are provided, identifies application components, starts to log into the application, and explores what is happening. 

It then moves into an adaptive discovery process. That process checks for flaws in application logic, workflow between screens, and how processes move from one part of the application to another. That adaptive discovery is informed by context, by the information the agent has about the environment and the applications. As customers do more testing, that context becomes more useful in guiding the pen test. 

From there, step three is exploitation and validation. The agent actively tries to exploit the vulnerabilities it has identified. That is fully documented, and transparency is critical. We capture things like screenshots as evidence of what was actually found. 

All of that is then provided in a report. In step four, we present detailed remediation guidance. The goal is to give developers enough information to reproduce the issue themselves. The agent documents its exact steps, so a developer can follow those same steps and see the vulnerability for themselves. 

The agent also assesses potential risk based on the specific vulnerability and broader data sources in the market, so it can determine whether something is high, medium or low priority. 

Once remediation steps have been defined, there is also the option of integrating this into a case management system, so the agent can work with whatever tool you use to track follow-through. All of this is then captured in a set of metrics that show how the overall programme is doing. 

So overall, we are trying to parallel the process a human would follow when conducting a pen test, but make those steps happen autonomously and automatically. Typically, all of this can be done in a matter of hours rather than the days or weeks you would expect for a human-led pen test. 

Josh Flanagan: 
Building on that as well, David, what are the key controls you would want a security leader to understand so they are confident the tool is being run responsibly? 

David Greene: 
Another really critical topic, Josh. This is an area where we’ve had some great guidance from the LRQA team, who are used to working in secure environments. 

When I tell people I’m going to have my agent try to hack their application, they understandably get a little nervous. So there are some important guardrails we’ve put around what the agent does. 

The first is how we use large language models. All LLM work done by the agent uses private instances of the models. At no time is any customer data used or shared broadly in public models, and at no time is it used to train public LLMs. This is a constrained, narrow environment used only for this specific purpose. 

The other thing we have tried to do is build trust by keeping the agent’s work transparent. You can see what the agent is actually doing. You can decide for yourself whether it took the right course of action, and you can provide feedback to the agent. 

If at any point the agent is disrupting the operation of an application, you have the ability to stop it immediately. 

We have also built in a full set of enterprise security capabilities. That includes integration with SSO systems, enforcement of access controls that may exist within the environment, and full data segregation where multiple organisations or business units are running tests. All data is encrypted in the system, both at rest and in transit. 

We also have the ability to deploy the solution in different geographies and regions if there are specific in-region compliance requirements, for example if all data needs to remain in the EU or the Middle East. 

Those are some of the things we’ve done to make sure you can trust the tool, have confidence in its output, and not worry about where your data is going. We’re also happy to back all of this up through contractual arrangements, whether that is data processing agreements, security agreements or confidentiality terms. 

Josh Flanagan: 
I’m going to come back to you now, Dave. Once people understand the mechanics of what AI Powered Penetration Testing can do and how it works, the next question we tend to hear from clients is how they can trust and act on the outputs. How do you keep outputs credible and useful, and what role do LRQA experts play in reviewing and validating the findings? 

Dave Parsons: 
From the start, when we began the conversation with Simbian, we were really clear that trust in the output was the most important thing. It was almost non-negotiable, and that shaped the way we built the tool. 

That meant moving away from the traditional scanning tools we often see. Most scanners work on signature-based detection. They look for patterns associated with known vulnerabilities. Simbian’s platform is built to verify through exploitation. It does not just flag a potential issue, it attempts to confirm that it is actually real. That alone significantly reduces the false positive problem that makes traditional scan outputs painful to work through. 

On top of that, we have done a lot of work in building validation mechanisms and scoring systems to filter out noise before findings ever reach a client. The output clients receive has already been through a quality assurance process. It is not just a raw list that needs extensive triage. 

This is where our LRQA consultants come in. That support can be provided on request. Clients can escalate findings for review, for additional context, for risk prioritisation, for remediation support, or simply for help understanding the output from Simbian’s pen testing tool. 

That means we do not have to review every single finding as a mandatory step, which keeps the service agile and cost-effective, while still ensuring expert oversight where it is genuinely needed. 

The AI trace also supports that process. When a consultant does a review, they can see the proof of concept from Simbian’s tool and the logic behind how it got there. So the credibility of the output is not solely dependent on human review. It is built into the tool from the ground up, and we have worked extensively on that. We have a lot of confidence in the output and in clients being able to trust it. 

Josh Flanagan: 
Great. I’m going to bring Chris back into this now as well. We’ve touched on trust and governance quite a bit in the previous questions, and I think we’ve done that because, for most buyers, that is where the decision gets made. Why should a buyer trust LRQA to deliver AI Powered Penetration Testing as part of their programme? 

Chris Oakley: 
You’re quite right, Josh. Trust is really important. In this context, trust comes from track record and governance as much as it comes from technology. 

At LRQA, we’ve been delivering assurance services, including offensive security services such as pen testing and red teaming, for decades. We understand what responsible security testing looks like. We understand accountability structures and what clients require when something is running against their environments. 

These are really important factors. What we did not do was bolt AI onto a service, call it done and ship it. What we’ve done is design this new capability with the same disciplines we apply in traditional consultant-led work. 

We still have defined scope, controlled execution, clear evidence trails and expert oversight throughout every stage. 

Buyers are not just purchasing access to a tool. They are engaging with our assurance framework as a whole. That means there is a named accountable partner. We have been doing this for a long time. We are accountable. We are not just a faceless platform. 

From a trust and safety perspective, that distinction matters a lot when testing touches production systems. 

Josh Flanagan: 
Thanks, Chris. I’m going to stay with you on this one as well. Are there any non-negotiables that spring to mind, things we considered when building this with Simbian that we simply would not compromise on? 

Chris Oakley: 
Yes, definitely. There are some things we simply could not go to market without. 

Top of the list is safety. We have a do no harm mantra. That means you should not run our tool and find impacts to the availability or integrity of the system under test. That is really important to us. 

As part of that, scope control comes under the same safety umbrella. The test is going to run exactly what we have agreed it should run. It runs against what you request and nothing beyond that boundary. It stays within scope at all times. 

There are a couple of other non-negotiables as well, namely confidentiality. Client data stays within agreed retention periods and is handled with exactly the same rigour we apply to any sensitive engagement. 

Evidence and accountability were also really important. Every finding needs a clear audit trail. Where did it come from? What were the steps to reach the conclusion? Why does it exist? Clients want to know what ran, when it ran and why. 

Then, slightly less about governance and more about product efficacy, consistency was also really important. Once you have nailed the safety elements, the question becomes whether the consistency is there. If you run the same test multiple times, do you get wildly different results, or do you get a comparable result set? 

That can be difficult with AI technologies because they are non-deterministic by nature, but we have put a lot of work into ensuring consistency, and in fact into making it exceed the consistency you might get between two different human pen testers. 

Josh Flanagan: 
So just coming to you on that as well, David, what happens to client data during a test? Where is it held, how long do you keep it, and is any of it used to improve the models? 

David Greene: 
The starting point is what Chris was just saying. We honour the retention times and data handling guidelines that LRQA already has in place. That is the baseline for the agent. 

We then add additional layers on top of that because AI is involved. All data processing takes place within the geographical region specified by the customer so we can meet specific compliance requirements. That includes the use of the LLMs themselves. 

The data passed to the LLMs is processed only within private instances of those models. It is never used as training data. The last thing you want is an LLM being trained on your own vulnerabilities, so that never happens. 

The results presented back within the product interface are governed by the same access controls that exist for other LRQA products. As the customer, you decide who can see that information, when they can see it, and how long it remains available. 

All of this is built on what Simbian calls our trusted LLM architecture, where we maintain an insulating layer between customer data and the LLMs themselves. Simbian software sits between the two, evaluates what needs to be sent, breaks it up to keep it secure, and minimises the amount of data that needs to be passed over. 

That all works together to make sure we get the value of AI-driven insight without exposing customer data. Again, we are happy to support this with contractual assurances if customers need that for their own compliance requirements. 

Josh Flanagan: 
So coming to you now, Dave, building on what Chris and David have shared, what does safe operation look like in practice, especially when testing against production environments? Are there any situations where you would advise against running it in production? 

Dave Parsons: 
Yes, definitely. This is another area where we’ve applied our penetration testing experience directly. 

Running automated tooling against production environments without proper controls is where incidents come from. We’ve seen the consequences of that across the industry for years, and that informed how we approached the safety architecture with Simbian. 

Effectively, the platform operates in two modes. It has a safe mode, designed specifically for production environments, which constrains what the tool will do. It limits the aggressiveness of the testing and the actions it can perform. It also gives you a printout of the operations it did not perform because of safe mode, so you can clearly see the trade-offs. 

For most clients, particularly those in regulated sectors or with lower risk appetite, that is the right choice. We also have a standard mode for isolated or non-production environments. In that mode, those constraints can be removed and the tool can behave in a fully exploitative way. That means things like executing injection attacks, extracting database contents, cracking hashes, and demonstrating a complete attack chain end to end, much like a human pen test would. 

That deeper level of validation gives a much clearer picture of what a real attacker could actually do. We are here to support decisions around which mode is appropriate for which environment, taking into account the sensitivity of the data in scope, the potential blast radius if anything goes wrong, and the overall objective of the test. 

In the vast majority of cases, safe mode is exactly that: a safe reduction. But like any testing activity, there are always environments where it is worth having a conversation first. If it is critical, high-availability infrastructure, or it handles particularly sensitive personally identifiable information, then it is simply good practice to understand exactly what is in scope, what the guardrails are, and what expectations need to be set. 

That is no different from how we would approach any human-led engagement. Common sense still applies, and that is how we have approached the whole implementation. 

Josh Flanagan: 
Perfect, thanks Dave. So over to you now, Chris, to wrap things up for us. From the early client conversations and feedback we’ve had so far on the tool, what is resonating most strongly, and what does that tell us about where security testing is heading in the future? 

Chris Oakley: 
Good question. It is clear there is a lot of interest across industries in this technology. The reality is that testing is changing, and it is going to continue to change. 

I do not think human pen testers are going anywhere any time soon. What is likely to change rapidly is how they deliver the work they do. 

We are seeing a lot of clients respond strongly to the idea of continuous assurance. That resonates with them. They understand that security testing is more than just an annual event with a big gap in between. Early conversations are consistently coming back to a similar theme. Clients know their environments are changing. They know point-in-time testing has a limited shelf life, and they have limited budgets. 

What they are looking for is something that closes that window more effectively within a realistic budget. There is also a real appetite for faster validation cycles after remediation. Clients want to know, if they think they have fixed something, whether they can confirm that quickly. 

It is a simple ask, but traditional programmes often struggle to satisfy it. We still come across situations where there is a specified retesting day in traditional programmes, and that just does not align with how modern software development and systems change actually work. 

To me, that tells us that buyers are increasingly thinking about assurance as an ongoing capability rather than a single or periodic project. AI Powered Penetration Testing fits exactly into that model. 

I am pretty confident that this shift is only going to accelerate over the next one to two years. A couple of years down the line, traditional penetration testing as we know it is likely to look very different, and the way organisations approach penetration testing will be significantly different from what we see today. 

Josh Flanagan: 
Thanks, Chris. That is a great way to wrap things up for us, and that brings us to the end of today’s episode. 

Thank you to all of our guests for the discussion, and thank you to everybody for listening. Today we introduced AI Powered Penetration Testing, a new solution from LRQA designed to help organisations keep their security testing aligned to the pace of change while maintaining the governance and credibility you would expect from LRQA. 

If you would like to learn more, do visit LRQA.com and please do get in touch as well. 

Thank you for listening, and we’ll see you on the next episode.