Will AI Eliminate 90% of QA Jobs? The Future of Testing Automation with Kevin Surace of Appvance.ai

Will AI eliminate 90% of QA jobs within the next two years? 

In this episode of Your AI Injection, host Deep Dhillon sits down with Kevin Surace, CEO of Appvance.ai and 30-year AI veteran. Kevin doesn't sugarcoat it. His platform replaces manual testers and scripters as enterprise CFOs demand measurable headcount reductions as ROI. Kevin explains how his platform autonomously generates millions of regression tests, and the two dive deep into the technical mechanics of digital twins, AI script generation, and the brutal economics driving enterprise adoption. 

What happens to society when knowledge work gets automated at scale? Kevin and Deep explore whether we're heading toward mass unemployment or a new economic paradigm, touching on everything from insurance claims adjusters to the future of human purpose in an AI-dominated world.

Learn more about Kevin here: https://www.linkedin.com/in/ksurace/

and Appvance.ai here: https://appvance.ai/


Xyonix Partners

At Xyonix, we empower consultancies to deliver powerful AI solutions without the heavy lifting of building an in-house team, infusing your proposals with high-impact, transformative ideas. Learn more about our Partner Program, the ultimate way to ignite new client excitement and drive lasting growth.

[Automated Transcript]


Kevin:
AI eliminates those jobs, period. Full stop. And we already can do that today. So this isn't the future. Now, should we, was your question. I will tell you, there's not a CIO/CFO, and sometimes they're locked at the hip a little bit,

that is going to pay for AI going forward unless they find the savings somewhere else. Money doesn't grow on trees in an enterprise. It could be that it's customer facing and it improved our revenue, right? Improved our advertising. But for backend systems, nobody's gonna pay for Devin if you cannot measure the improvement in coding in some way, or a reduction in new hires that you don't need anymore.




Deep: Thanks everybody for joining. Today we have Kevin. Kevin's a 30-year AI veteran. He's the inventor of the virtual assistant Portico back in the nineties and now leads Appvance.ai. Appvance's Genie digital twin platform autonomously designs and runs millions of regression tests with the goal of eliminating QA costs.

He believes foundational models are quickly becoming commodities and the real moat is specialized stacks of tuned transformers aimed at concrete pain points like software QA. Kevin, thanks so much for coming on the show.

Kevin: Yeah, happy to be here. Thanks for having me.

Deep: Awesome. Why don't we get started. Tell us a little bit about what people did before your solution and what's different with your solution.




Kevin:

Specifically at Appvance, we have focused on software QA, and everyone knows what software bugs are, right? Your typical enterprise has thousands of applications that they manage, update, upgrade, fix stacks on, et cetera. And all of those have to be tested, over and over and over again, with regression testing, right?

What people have done since the sixties is mostly manual testing. It's literally teams of people around the world that do that work. And of course it's manual, so it's inconsistent and it has all the problems that come with manual work. It's costly, it takes time, et cetera.

But it's how we've done it. And then we've had test automation basically since the nineties for about 30% of those regression tests. And that is a very painful process in and of itself, because test automation means either writing code (you've written code for the application, and now you have to write code to test the application) or recording code with a recorder. Recorders have been out since the mid-nineties and they've gotten substantially better.

But these are expensive people, more expensive than the manual testers. So whether you're automating or doing manual regression testing, you quickly end up with hundreds of thousands, if not millions, of regression tests that you have to maintain for the rest of your corporate life, right? People will come into that corporation and die, and that number only gets bigger.

So we looked at it and said, we think there's a better way.

Deep: Yeah. And are you focused more on application-level testing, coming in through the front end and full end to end, or are you also doing white box testing, where you're consuming the code and helping generate unit tests and regressions?

Kevin: No, not unit testing and not source code testing, which is a whole different field actually. It's full end to end. And the reason is, people who are not in our field think of an application as, I don't know, something that runs on the phone, this one thing.

But a typical corporate application is talking to more than 20 backend systems. So it actually can't practically be tested as an end-to-end test until it's fully connected to those systems, because APIs change and backend systems change and algorithms change in the backend. And so there is this sort of end-to-end regression testing that must happen.

You really don't know if a change in a backend system, where you updated the API and changed the stack, is gonna affect the 50 other systems that use that system, right? Not until you go back and test them and say, well, we have a set of criteria and a set of business rules and a set of business goals in this application.

Do those all still work? Do they come out with the right outcomes? Do they put up the right pop-up box? Do they stop you if you did the wrong thing? Do they come out with the right numbers? Whatever it is, right? And so that's end-to-end regression testing. It probably occupies 80 or 90% of all testing in the industry.

It's a very big number. It is a given that if you're developing a new application, you are certainly doing unit testing, you are certainly doing source code testing, you're doing a lot of things. But again, in a regular enterprise, the number of new applications being developed is less than 1% of the total application inventory.

So it's small in comparison.

Deep: So you mentioned enterprise a few times. Tell me a little bit about that focus. Does that mean that your target customers are building internal tools for use within their enterprise? Or do you also serve anyone building a consumer app?

Kevin: It can be consumer. So for example, we have customers who are in the e-commerce space, and they will test their e-commerce application and they will also test 2,000 other applications that run their corporation, right?

So if you look at, for instance, a bank, let's take JP Morgan 'cause they're the largest: they have about 15,000 applications that they maintain and manage today. There's about 10 that face the outside world that consumers or businesses might see. All the rest run the company: backend, front end, middleware, everything that they own and maintain.

Some of those, obviously, like an ERP system such as SAP, were built on the outside, but of course they've customized them on the inside. So now end-to-end testing is their responsibility. But it's 15,000 applications under management and under regression testing on a regular basis.

Deep: Got it. So maybe let's dig into the JP Morgan scenario a little bit.

So one of the challenges with manual or automated testing is that it's easy to get light, superficial testing. It's harder to build up a realistic user profile that's gone in and done a bunch of stuff, so that now you're using the application with various expectations based on what state you're in.

Yeah. Walk us through how that works in your application. So for example, with the JP Morgan case, you have to have a real bank account. They have to have provisions for a fake bot sitting in there with an account that can actually do stuff, and a lot of their stuff really does things like move money around.

Kevin: That's right.

Deep: Right. So how do you deal with all of that?

Kevin: Yeah. So all major applications have test users. And those test users, we didn't invent that, right? They've had to have test users, 'cause remember, these are people who have been testing these applications, some for 10 or 20 years. So they've got test user accounts and they've got test data, and they have a way to either undo that or make sure that it doesn't actually commit a major transaction, right?

Or the transaction's a penny or something like that, right, if they want to test it all the way through. So there are already provisions for that. We don't have to fix that; they already have that. Remember, if I look at any one of those 15,000 JP Morgan applications, there are already regression test suites for them.

There are already people doing those tests, and they are already spending more than a billion a year doing those tests, across wherever their testers and test automation are, et cetera. So we look at that and say, how can we apply AI to take those people out of those jobs? I'm sorry, but that's what we do with AI: we improve productivity, improve visibility, right?

So if you want to be funded by a CFO in any AI circumstance, by the way, I don't care if it's marketing tools, dev tools, whatever, ultimately the CFO is gonna say, I'll give you a budget for a year, but if I don't see the savings, that budget doesn't grow on trees. I've gotta see that savings in headcount reduction, or increased customer satisfaction, or some other method of measuring. That's what we see large corporations, most enterprises, look for today in AI. I'll give you the AI budget this year, but in a year I've gotta see more savings than that. What's the ROI? Is it two x, three x, four x, five x? Say I got AI in customer support.

Okay, who's gonna pay for that million dollars a year? Did we take some customer support heads out? No? Well then forget it, budget canceled at the end of the year. So we make no bones about the fact that ultimately this drives headcount reduction that pays two x, three x, four x, five x, as much as 10 x more than the software in terms of savings.

Deep: We'll come back to the potential ethical ramifications of that, 'cause I do want to explore what actually happens when AI starts taking all these jobs. But for now, let's put a pin in that, 'cause I wanna understand your approach a little bit. So let's say you start with the JP Morgan scenario and you fire up your app for the first time.

What does the configuration process look like? Walk us through it: you've got a new testing group, and what are the psychosocial ramifications of using your tool? Are the testers hesitant? Are they on board? What do they get, when right out of the gate you're saying you're gonna lose your job?

Kevin: That is right. Good question. So, AI in QA today, there are three levels of it. One is AI washing, where it's just some tool that says they've got AI and they have none. And I must hear this from eight out of 10 prospects: we tried this and we tried that.

We couldn't find any AI, you know, to save our life. That's the most common: AI washing. Then there's another set of tools that have patched on or thrown in some kind of copilot. Easy to do, right? As you know. And what that does is it engages the testers, because they go, wow, I can use this copilot thing. But it actually, in most cases, reduces productivity.

And I could get into the exact reasons why, if you'd like. There's a lot of this, by the way; it's happening in marketing, it's happening in dev, we've seen this. We've seen actual measurements where the people love it and they say, look, I'm using AI. Okay, but over the last four weeks you slowed down.

Yeah, but it's really cool 'cause I can investigate all these things. Okay, but I didn't need you to do that. So let me give you one example. When we're testing web applications, the way the web works is you can identify an actionable element many, many ways: visually, but also with a set of code that sits underneath that element, called locators or accessors.

And there are parent-child relationships and accessor IDs and XPath and all kinds of ways to find those elements, right? That's just part of the web. So a number of these automation vendors that added AI figured out how to at least bolt on ChatGPT or something like it next to that, and it says, hey, let me give you a recommendation on a better locator than the one you chose.

Except the problem is, back in 2016 we invented machine learning that already not only chooses the best accessor without your input, it builds an entire accessor library and adds the visual. I don't need the human input there. AI can do that work, right? Actually, basic machine learning is doing that work.
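For listeners who haven't worked with web locators: a single button can usually be addressed several different ways, which is what an accessor library captures. A minimal illustrative sketch (the selectors and structure here are hypothetical, not Appvance's actual format):

```javascript
// Hypothetical accessor library for one "Submit" button.
// Each entry is an independent way to locate the same element;
// an engine can score them and fall back from one to the next.
const submitButtonAccessors = [
  { kind: "id",     value: "checkout-submit" },                                  // accessor/element ID
  { kind: "css",    value: "form#checkout > button.primary" },                   // parent-child relationship
  { kind: "xpath",  value: "//form[@id='checkout']//button[text()='Submit']" },  // XPath
  { kind: "visual", value: "blue button labeled Submit, lower right of form" }   // visual description
];
```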

I don't need the human's input at all. Now, the human can override it, but I don't need a copilot. And another copilot we've seen relates to the recorder, which works this way: you use the application and we write the code, the test code, right?

You just use it. We're writing the test code that's mimicking your actions. You fill out a form, we mimic that form. You click on something, we mimic it.

Deep: Yeah, and it can be overly literal too, right?

Kevin: That's right. So then what some people did is they said, hey, instead of just recording you, we could put a copilot right next to it, and you can

type into this box. Instead of you clicking the blue box, you type, "click the blue box." Now the problem is it takes longer to type "click the blue box" than it does to move your mouse and click the freaking blue box. So again, this is an example of a use of a GPT kind of thing, like ChatGPT or Gemini or whatever,

that actually slows the user down. But if you're a tester, you go, look at the AI I'm using, isn't this great? I can now type what I want instead of just moving the mouse. But nothing's faster than moving the mouse.

Deep: So those are the first two. Let's get into number three, what you're doing. Is it that your model is going in there directly, ignoring the UX folks, figuring out all the potential permutations, and then trying to assess which ones are important? Or is your point that you do leverage the humans actually interacting and showing you what the important use cases are and navigating them, but then you kind of interpolate outside of that a little bit?

Kevin: Right? So there's two things that we do. There's two AI areas of the product.

One is called AI script generation. We launched it back in 2017. What AI script generation does is it says, train me not on any of your specific use cases yet, but just walk through the application. Literally all of the application, all the different states, et cetera. And so there's a training session that can be an hour, it can be a day.

It depends on the complexity of the application, right? And the idea there...

Deep: Not too different from how you would train an employee or something.

Kevin: Not any different from how I'd train a manual tester sitting next to me before I send them to do their job. Except I only have to train it once, 'cause once it's gotten to every page state,

anything that comes along new, it will add to its knowledge base by itself, right? So I only have to do this one time. After you do that, we build essentially a digital twin of the application, and we generate hundreds or thousands of new test cases based on your business validations. So the other thing you put in, in what's called a validation workbench, are the business outcomes that you expect.

They could be data-driven, they can be BDD level, right? They can be...

Deep: So maybe a different example, with the JP Morgan case.

Kevin: Okay, a simple example: if the customer puts in the wrong password, pop up a box that says wrong password. I'm giving you the most simple. The most complex is: if the client happens to be in New York State and they happen to be in this city, then make sure they don't have access to Y, right?

Deep: And that's a rule that the human is giving you?

Kevin: The human puts those into a workbench, and they come from the business rules that were laid out when the application was designed, right? So there's a set of business rules that are there. And you take those business rules and you say, I'm gonna give the business rules to the system.

And they're basically in English, but sometimes there's some coding there if it's complex logic, right? As you can imagine, because English sometimes doesn't describe complex logic very well. So you do put in sets of business rules that say, test these, but don't test them one time. Test them from every possible permutation of a user flow

you can.

Deep: And then, is there back and forth? Like, is the bot working with the human to get the rules defined crisply?

Kevin: Not really. You literally place the rules into this validation workbench. Some of 'em are in plain English, 'cause there are these dropdowns that form the English, and some of 'em are more complicated.

Those you write in JavaScript, or get some help writing in JavaScript. So now you've got what we call our validations. Once those are there, the system by itself will begin to create brand new test cases using those validations, and it will create thousands or even tens of thousands of every practical user permutation, every user flow, that can leverage those validations and test them, including data-driven ones.

Right.
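As a rough illustration of what a coded validation could look like, here is a hedged JavaScript sketch; the function shape, field names, and the city value are hypothetical, meant only to show how a business rule like the New York example might be expressed so an engine can check it on every generated user flow:

```javascript
// Hypothetical validation: a client in a particular New York city must never
// see feature Y, no matter which user flow reached the current page.
function validateRestrictedAccess(page, session) {
  const inNewYork = session.profile.state === "NY";
  const inRestrictedCity = session.profile.city === "Albany";   // placeholder city
  const featureYVisible = page.hasElement("#feature-y-link");

  if (inNewYork && inRestrictedCity && featureYVisible) {
    return { pass: false, severity: "P1", message: "Feature Y exposed to restricted NY client" };
  }
  return { pass: true };
}
```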

Deep: Let's dig in on the word practical, 'cause I think this has been one of the Achilles' heels of automated testing for 20 years or so. Yeah, you can get automated testing, you can get automated code review stuff, and it generally generates a gigantic list of mostly inane garbage or utterly obvious things.

So practical can mean utterly obvious things that even whoever's building the thing finds right away. But there's a big difference between that and a CTO jumping on your app and finding things that really matter, that are nuanced and quirky.

So what is the system doing to stack rank what it finds and present it in a meaningful, impactful manner?

Kevin: Great question. There are three levels of issues that the system will find across all of the practical ways a user would use the application. So we don't include going backwards and breaking things, right?

It's like a user is using the application in the way it was designed, but there may be many, many pathways that lead to the same validation. So we may touch a logical validation a hundred times, 300 times. Why? 'Cause users will come at it from different angles, if you can imagine it that way. So there are three tiers of what we do.

One, if you fail a validation, it's your business logic. If you fail the business logic, that is probably a P0 or P1; that is a serious issue, 'cause it says a user can hit that failure. I'll give you an example. We have a client with an application where, depending on your role, it has fields that are not supposed to be editable under any circumstances.

Uneditable, they're locked, right? So it's literally a locked record; you don't get to touch that. They test for this every time with one script. So literally that script runs and it always passes, it's fine. The first time they used AI script generation, it generated more than 200 tests of that same logic.

You know what I'm gonna say.

Deep: Presumably because there was some other state that led up to these things being editable.

Kevin: That nobody had ever looked at before. In dozens of cases, the AI was able to edit records that the user did not have the authority to edit, and they had never found this.

And now they go back and they go, we don't know how many of those were edited over the last X years, or when that bug even came in, because I'm telling my CEO, based on my one script, that that can't happen. And yet AI comes along and says it actually happens 12% of the time. You just never saw it.

Deep: Yeah. And so in that case, I think, you know, you're presenting a scenario where the business validation rules are maybe so important that no matter how the bot figures out how to get in there and exercise it, it's worth immediately addressing.

But I think there are a lot of other scenarios where, let's say, the ramifications aren't so severe, and the path the bot comes up with is utterly impractical or a non-important path, 'cause it just doesn't happen. One of the things I've always seen with this automated testing... actually, I've never seen automated testing done in a way that...

Kevin: We're the only people whose AI generates test cases like this, and scripts, period, full stop.

Automatically, no human involvement.

Deep: The reason I'm asking is 'cause I think for most people who've used the automated, non-AI, historical recorders, sure, they find a ton of bugs, but they're usually not important.

So I'm really trying to understand: how do you assess the importance of what the machine finds?

Kevin: So, Deep, I haven't seen that, because the test cases that humans write, which then get recorded in end-to-end testing, are all important. You're not finding something that you don't wanna find, or that you would never have taken the time to write the test script for.

It's just not gonna happen. There's no automation at the end-to-end level, except for us, that writes its own test cases, period. So it can't find things, except for us. Your point is clear and I will get to it. There's no other tool that does that. There's no other platform that does that. Now, if you're talking about source code, there's lots of that:

we're gonna scan your entire source code and find all kinds of things. That is an unrelated field, right? We're at end-to-end testing, where you've got a system that's connected to 22 backend systems.

Deep: Yeah, I mean, I guess what I'm getting at is, you have a limited resource of humans and machines that can address the actual fixes.

I'm a CTO, I've been a CTO for 30 years. I've managed tons of teams, I've had incredibly large teams. A very common problem is that teams quickly wind up with a test suite that over-identifies problems.

Kevin: Sure, I get that.

Deep: And the team's forward momentum virtually stops, 'cause they run around trying to fix stuff that ultimately doesn't really move the needle.

Kevin: There are three tiers of problems we find. One is logical validations that fail. You put in the validations, you said they were important, and they failed some percentage of the time.

You can either allow that ticket to get filed in Jira, or you can say, I don't care about this anymore, don't do that anymore. The second thing it finds is 4xx and 5xx errors. To some people those are the root cause of horrific problems later, and other people don't care.

They, again, get to ignore those if they want; they just check a box and say, don't find those anymore. And then the last ones are slow responses, meaning that even though we've just got a small number of users on this server, I'm getting responses that are taking over X seconds.

And you get to set that. Is it two seconds? Is it 20 seconds? And you can decide, I don't care if it's slow. You get to decide what's important to you, practically. Here's what happens. The first time AI script generation runs, everyone on the team says, this can't be, you created thousands of tests and these can't be real errors. But we give you the script to repeat the error.

So then they run them and they go, oh, they are real errors. And then they have to decide which of these are serious. The example I gave you is a serious error: you do not want people changing records they are not allowed to change, right? So that's serious under any circumstance.

There are others where you go, yeah, the popup box doesn't come out, but I don't care about that. So then you eliminate that, or you stick it in Jira and it's P5 or something like that. So the first time through, there's no question you're gonna organize these results in some way in Jira, or get rid of them, or whatever.

And then after that, they become your regression suite and they're fully automated. You have, on average, about 10 times the visibility or application coverage that you had before.
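A hedged sketch of how those three tiers and their thresholds might be expressed as configuration; the field names are hypothetical and the real product's settings may differ:

```javascript
// Hypothetical triage configuration mirroring the three tiers described above.
const findingsConfig = {
  failedValidations: { report: true, defaultPriority: "P1" },      // business-logic failures
  httpErrors:        { report: true, families: ["4xx", "5xx"] },   // client/server errors, can be switched off
  slowResponses:     { report: true, thresholdSeconds: 2 }         // tunable: 2 seconds, 20 seconds, or ignored
};
```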

Deep: What do you think are the soft spots, the areas that you're still trying to perfect?

Kevin: Well, AI script generation, which I just described, creates brand new test cases.

Ignoring whatever your test cases are, by the way; they might overlap. But remember, when we opened I said someone already has, who knows, a thousand test cases written for that application, probably manually run today, right? So the second part of the product, the other part of the product really, is called Genie.

And what Genie does is leverage the knowledge from AI script generation, basically the digital twin of the application. It leverages that, and we use, right now, five tuned transformers. I can take you through exactly how that works, 'cause I think you'll find it interesting. What we attempt to do, and we do it pretty well actually, is match what someone said in English in the test case to the actionable items on the page, or on the page state.

So I know which state I'm in, I know what's actionable there, and I know what they're requesting. And I can visually see the actual elements, I can visually see the entire representation of the page, and I can see the code. I have to visually see elements 'cause people describe things visually in these test cases.

You know, click the tree. Well, I'd better see that it's a tree, right? And then what we do is rewrite that line of the test script automatically. So what Genie does is it automatically converts your existing test cases, no expansion,

Deep: Into scripts?

Kevin: Into code. That's it. Yeah. All by itself.

No recorder, no human involvement. Go home for the night, come back in the morning and they're all done.

Deep: So on Your AI Injection, we like to cover three things. One is what you do, another is how you do it, and the last one is should you do it. So we've talked about the what quite a bit.

So let's change direction a little bit and focus a little more on the how. Let's take the second application. It sounds like you go back and forth with some LLM, maybe you're using some of the more code-centric models, and you're creating some kind of mapping from whatever's already been written by the humans into your sort of action scripts.

Are you also coming out with business rules based on those scripts? 'Cause that's probably in there too.

Kevin: Are we talking about AI script generation, which we talked about first, or Genie, which we just talked about?

Deep: Well, why don't we start with the latter and then go to the former.

Kevin: Okay. So Genie is doing one thing. You have a thousand manual test cases and I'm going to now automate those and get rid of the manual testers. They will run as your regression suite every time you ask it to.

The first thing we have to do is interpret the test cases, which come in a variety of industry formats. So we use an LLM with RAG to do that. We have set up the format that we want, which is a very specific JSON input format that has a step number.

It has only one action per step. It has a set of data that may be tied to that action; there could be multiple data inputs before the action is taken, so the action might be a submit button on a form, for example. And then it has any expected outputs, right? The equivalent of an assert.

And those could be visual asserts or textual asserts or data asserts or whatever; they could be data-driven. So we take whatever format someone's written these in and convert it to our format. They never see that, but that happens.
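To picture the normalized format being described, here's a hedged sketch of one converted test case. The exact field names are hypothetical, but the shape follows what Kevin outlines: one action per step, optional data inputs, and an expected output acting as the assert:

```javascript
// Hypothetical test case after the first LLM + RAG pass has normalized it.
const loginTestCase = [
  { step: 1, action: "navigateTo", target: "https://example-bank.test/login" },
  { step: 2, action: "click", target: "Sign in button",
    data: { username: "test_user_01", password: "********" },   // data inputs applied before the action
    expect: { textVisible: "Welcome back" } },                   // the assert / expected output
  { step: 3, action: "click", target: "Accounts menu",
    expect: { textVisible: "Checking" } }
];
```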

Deep: Help me understand that format a little bit, because I wanna understand what you're actually giving your model. So presumably there's an entry point; if this is a web app, there's an entry URL to a page that you're looking at.

Kevin: Yeah, that'd be step one. Yep. It's literally: go to that URL. Yes.

Deep: Okay. So then the action in that case is...

Kevin: It's called navigate-to, actually. That is step number one.

Deep: So you'd have something like: step one, navigate-to with a URL. Step two might be an action, like fill out the form and log in, right? Or attempt to log in, with some data.

Kevin: Right, yeah.

Deep: Uh-huh, right. And then step three would be...

Kevin: Go to this place on the menu. Step four, fill out this other form. Step five, select the state that you're in, and so on. Step six, you know, whatever. Right.

Deep: The point is that it's textual. It gives you instructions that would make sense to a human and to an LLM, and the LLM in your world has all of the mechanical infrastructure to execute these things.

Meaning you've got the ability to go to a URL, the ability to populate a form, the ability to assess...

Kevin: Let me answer that, 'cause it's not that simple. So the LLMs I'm talking about right now are all privatized models, right? These are all open source models that we privatized and either trained in a specific way, or set up with RAG to work in a certain way for us, and/or gave a whole set of guideposts, if you will.

So let me say the first LLM is doing nothing but leveraging RAG to take the test case that came in, maybe in Excel, maybe in a CSV, maybe in a Word document, whatever, and transfer it to our JSON. They won't see that, but we need it transferred to something that we understand, in a very specific format, right?

That's all that happens there. We didn't change their language, we didn't do anything; we just moved it from this to that. Because people sometimes will put, like, five steps in one step: step two, do these five things. No, we can't do that, because we need a step to go to a line of code ultimately. So all that breaks down into five steps instead of one, for example.

Yeah. So all of that is done in that first LLM. Now I have a JSON format. That is my format: it's a step number, it is an action, it is maybe some data input, and finally, what do you want to validate happened at the end of this step? 'Cause most of these say, what's the expected output? The expected output is: I see the following thing on the page.

Right? Whatever it is, we have to look for that expected output. The other thing I've input here is that digital twin. We probably don't have enough time to talk a lot about the digital twin, but it is an offline, if you will, static image of every state of the application and how those states connect, based on actions in each state.

I know that's a lot of words. And they are big; they can be 50 gigabytes. It has a visual of every actionable element, it has all the code, you know, the locators that go underneath that, and it has a page visual for every page state. Okay. So all of that's stored.
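For readers trying to picture the digital twin, here's a hedged sketch of the kind of state graph being described; every name is an illustrative stand-in for "each page state, its actionable elements with their locators and visuals, and the transitions between states":

```javascript
// Hypothetical fragment of a digital twin: page states as nodes, actions as edges,
// each element carrying its locator library and a visual snapshot reference.
const digitalTwin = {
  states: {
    login: {
      screenshot: "login.png",
      elements: [{ id: "e1", caption: "blue Sign in button",
                   locators: ["#signin", "//button[text()='Sign in']"] }]
    },
    dashboard: {
      screenshot: "dashboard.png",
      elements: [{ id: "e7", caption: "Accounts menu item", locators: ["#nav-accounts"] }]
    }
  },
  transitions: [
    { from: "login", element: "e1", to: "dashboard" }   // clicking e1 on the login state leads to dashboard
  ]
};
```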

Deep: I understand that. But what's the point of this? Why can't the system just interact directly with the live application?

Kevin: Ah, that's a good question. Because we need to go substantially faster than humans, and to go substantially faster than humans, I'd have to go faster than the application. And an application can only respond at, call it, three seconds a state,

two seconds a state, five seconds a state, whatever it is, right? But what happens if I need to execute in parallel without actually changing the application state, where I know where I would've gone, and I wanna understand how I go from step one all the way to step 46, understand that whole string, and I need to do it in a hundred milliseconds?

I can't do it with a live application. So we patented this concept some years ago of a digital twin of an application, which I can now navigate instantaneously. I can go backwards and look at things, I can do these in parallel.

Deep: that doesn't make sense to me.

I mean, if you want it to be faster, that seems like an opportunity for performance tuning the application. If the app builder has a bottleneck in a particular part of their app, they should address it. If there's a problem running your users in parallel at a throughput level that they need to support anyway, then they should address it.

Kevin: No, because their application is what it is, right? Someone operating a recorder can create a script no faster than one x application speed, and that's if they could click infinitely fast, which they can't, so practically it's two x application speed, right?

If I wanna replace the recorder, ultimately I have to go faster than one x application speed, and I can't start over all the time. I may need to evaluate a set of steps back and forth, right, for a whole set of reasons, and I may need to interrogate something I did before to understand what I'm doing next.

This is in writing the scripts, not in testing anything here, just writing scripts. So if I have an offline version that responds instantaneously and doesn't affect the states if I go backwards, well then I have the perfect model to generate scripts from. It's no different than doing a digital twin of something else.

There's value in having, say, a digital twin of the Earth, because I can simulate things that I couldn't do on the actual Earth. That's an obvious one. But this is the reason we do that.

Deep: I mean, I think what you're saying, if I understand you right, is that this type of testing can be a burden on the live system, and so you're trying to offload it.

Kevin: No, it has nothing to do with the burden on the live system. It has to do with the fact that I am trying to do things that would be very hard and long and arduous on the live system, and I only have to do it once. I've gotta generate these regression scripts once and once they're done, they're gonna use 'em on that application for the next, who knows, year, two years, five years.

Deep: It's like you're building an offline caching system, basically.

Kevin: There you go. That allows me to interact with it without breaking it.

Deep: If you're gonna keep doing the same thing over and over again, you don't necessarily need to use the live application. Okay, that makes sense.

Kevin: They will eventually use the scripts that are generated against their live application.

But I'm generating them using a digital twin, which is the only practical way to do it when you get into the idiosyncrasies of what has to happen inside that.

Deep: Presumably some of those path flows expire, just like a cache, because some team inside deployed some new feature or something.

Kevin: And then you can retrain it, learn the new behavior, and generate new scripts. But otherwise you've got your regression suite that yesterday was manual and today is automated, and it's an exact replication of the manual tests. Right. And the reason that was able to happen at a speed of a hundred scripts an hour or faster, depending on how many GPUs you deploy, is because I'm not doing it with the actual application.

I can do it with a digital twin and interrogate the heck outta that digital twin and never impact its fixed states, right?

Deep: Yeah, I mean, I guess it makes sense, because you're doing a ton of repeat operations in order to figure out how to construct these scripts.

Kevin: That's exactly right. So now I've got my JSON and I've got my digital twin, and that digital twin has all this information. So now what I need to do is caption my actionable elements. I need to convert the actionable elements to some kind of English that someone might have said, right?

So I need to understand how that might map. So we build lots of captions: a blue button, it says submit. I also wanna know where it is on the page, so now I have to look at the entire page, build a page model, and say that blue button happens to be in the lower right corner, right? I want to get all of that information back to text. And now, finally, I have some interesting information: I've got captioned elements, I've got essentially a captioned page, I've got everything in JSON. And now I can present this all to an LLM in the following way. I can say: I have a line of text that says click the blue button. What should I do on this page, out of all of the actions?

Now, the truth is we don't give it all the actions, 'cause we wouldn't get the best results and it'd be slow for the LLM even with a wide aperture. So of course we've already picked the top three or four.

Deep: I kind of understand, right? Because a bunch of folks have this concept of steering the browser and, you know, doing it via text.

Kevin: Yeah. So far it's very hard to do that and get better than 70% consistency. But in QA I have to be at a hundred percent. So follow my logic: we've done all of this, and we are now scoring all of the actions on the page and saying, what is the highest scoring one? How well does that map to what they said, versus everything else I could do on the page?

And then what do I do? I feed that on, and it's a state and an element ID, an internal element ID. So I have, you know, state four, element ID three, and I have my digital twin. What happens is I feed that to a code generator that we built.

And the code generator says, I go to the digital twin, and what do I see? I know exactly that state, that actionable item, that element. And under that element I see all the code, and I build out the code based on all of the locators. And it's the code to...

Deep: Navigate that page? The code to do that?

Kevin: The code to execute that element on that page, through a code generator, which looks a little bit like a compiler in a way, 'cause I've given it very little information. But between that and the digital twin, I have all I need. I reconstruct perfect code, not a guess at code, and that is ready to run.

And we do that step by step by step, but we do it fast. It's, you know, 10,000 steps in an hour or so on one A100.
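Putting those pieces together, here's a hedged, toy sketch of the flow just described: score the candidate elements against the English step, pick the best match, and rebuild executable code from the locators stored in the digital twin. The word-overlap scoring here is only a stand-in for the tuned transformers, and all names are hypothetical:

```javascript
// Toy illustration only: match an English test step to an element in the digital
// twin, then emit an executable line from the stored locator.
function scoreAgainstText(stepText, element) {
  // Count caption words that appear in the step text (the real system uses tuned models).
  const words = element.caption.toLowerCase().split(/\s+/);
  return words.filter(w => stepText.toLowerCase().includes(w)).length;
}

function convertStep(stepText, stateName, twin) {
  const elements = twin.states[stateName].elements;
  // Keep the highest-scoring candidate (the LLM selection stage is elided here).
  const best = [...elements].sort(
    (a, b) => scoreAgainstText(stepText, b) - scoreAgainstText(stepText, a)
  )[0];
  // "Code generation": rebuild a runnable line from the element's stored locator.
  return `await page.click("${best.locators[0]}"); // ${best.caption}`;
}

// Example usage against a twin like the one sketched earlier:
const twin = {
  states: { login: { elements: [
    { caption: "blue Sign in button", locators: ["#signin"] },
    { caption: "Forgot password link", locators: ["#forgot"] }
  ] } }
};
console.log(convertStep("click the blue sign in button", "login", twin));
// -> await page.click("#signin"); // blue Sign in button
```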

Deep: That executable code to execute your actions on a given page, is that exposed to the users if they need it?

Kevin: Yep. That drops out as scripts, and you can see them in what we call designer script English, or you can see them in JavaScript, which is what's underneath the English, right?

And it builds on an accessor library. Why did I build an accessor library? Let's say there are 20 ways I can find that element. On your next build, some of those will go away and some of them will change. We will use the entire library to say, I believe this is the element that you still meant, and keep going.

Right? So it self-heals the runs as they run on top of it.
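As a hedged illustration of that self-healing idea: when a new build changes the page and some locators stop matching, a run can fall back through the rest of the stored library instead of failing outright. This sketch assumes a Playwright-style page object; the function name and locators are hypothetical:

```javascript
// Hypothetical self-healing lookup: try every stored locator for an element
// until one still resolves on the new build.
async function findWithHealing(page, accessorLibrary) {
  for (const locator of accessorLibrary) {
    const element = await page.$(locator);   // null if this locator no longer matches
    if (element) {
      return { element, usedLocator: locator };   // "I believe this is the element you still meant"
    }
  }
  throw new Error("No stored locator matched; the twin may need retraining on this build");
}

// Example: the ID changed in the new build, but the XPath variant still matches.
// const { element } = await findWithHealing(page, [
//   "#checkout-submit",
//   "//form[@id='checkout']//button[text()='Submit']"
// ]);
```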

Deep: Okay. I think we've covered the what, and I think we've covered the how. I feel like I have a decent understanding. Let's go to the should, which is always a thornier topic. And I don't mean that necessarily like we have to pick on your application, but I like to bring guests up a level.

But in your case, I think you've made some bold claims, so let's dig into them. One of the claims is that if you're successful, there will be no QA team, or something like that. Walk me through that a little bit. Is that kind of a hyperbolic statement, or is there a trajectory there?

Kevin: Yeah, here's what I'd say. When we look at QA teams around the world, call it 3 million people, maybe 10% of those are senior strategic architects, evaluators, et cetera. Those are not going anywhere. In fact, we may need more, 'cause as you pointed out...

Deep: Because, as we've pointed out, there's quite a bit of configuration and quite a bit of thought that you have to get from the people who actually are looking at these applications.

Kevin: And those are gonna be the robot overlords. Like any other AI, the new skill set is: how do I train this AI and how do I get the most out of it?

Not how do I sit there and write scripts or manually test. So the other 80 to 90% are scripters and manual testers. AI eliminates those jobs, period, full stop. And we already can do that today. So this isn't the future. Now, should we, was your question. I will tell you, there's not a CIO/CFO, and sometimes they're locked at the hip a little bit,

that is going to pay for AI going forward unless they find the savings somewhere else. Money doesn't grow on trees in an enterprise. It could be that it's customer facing and it improved our revenue, right? Improved our advertising. But for backend systems, nobody's gonna pay for Devin if you cannot measure the improvement in coding in some way, or a reduction in new hires that you don't need anymore.

And the same is true.

Deep: But let's jump out and look at this from a societal standpoint a little bit. In your argument, you just eliminated 90% of jobs in a particular scenario, and the 10% that remain are the senior strategic architects, who somehow had to get there.

They happen to be the ones who came up through this process of learning things, figuring it out, et cetera. So what happens in a new world where some kid just graduates with a computer science degree or whatever and needs to enter the workforce? They're never gonna go through all these steps.

You know, granted, they could be a dev or something else, but a similar kind of path is happening on the dev front.

Kevin: It is.

Deep: There are people making really significant claims, saying things like every knowledge worker is gonna be gone. It feels a bit extreme to me. It feels to me like with every new tech, we're gonna invent another five new problems.

And I think that's right. What are those other five new problems that come out?

Kevin: So look, everyone is already seeing the collapse of the market for new grad coders. The market to hire QA aside, the market to hire new coders went from hot two years ago to almost dead now.

I mean, we've never seen this kind of shedding of "I don't have a need for an entry-level person." This is also happening in customer support and call centers, which are being wholesale eliminated across the world. And I think tier one customer support will be completely eliminated, in terms of humans, over the next two or three years because of AI virtual assistants. Well, you mentioned it, right?

I was the inventor of the darn thing. You know, arguably I kicked that off in the late nineties. And even then we said, if this moves forward in the way we think, it will eliminate customer support jobs, 'cause you won't need to do that. They're easy: I can easily train it on these FAQs, and it'll do a better job than the humans.

It just is. However, we are creating new jobs to train AI and coordinate AI and create guardrails for AI and manage the AI. And mostly, the jobs that are going to be around in those fields are gonna be much more around being the robot overlord, because the robot, so to speak, can do a lot of the basic tasks we've been throwing at humans.

Deep: I mean, the challenge I have with this new world we're building is that it feels like we're underestimating the role or the need for humans to actually understand what they're overlording. And it feels like if we follow through with this sort of light approach, this generation might be okay: we have a whole generation of people that actually know what they're doing, and we can just mooch off the senior staff. But...

It seems like at some point we wind up with a bunch of Homer Simpsons regulating the nuclear power industry, and they're clueless. And it seems to me like you have to come up with a way for humans to actually learn and know the things that they're supposedly overlording.

Kevin: Maybe that changes our university education, because we don't need kids coming outta university that can code in Python. What we need is kids that know how to leverage the AI tools to generate really thoughtful architectures, right? Now, we have seen this before, Deep, and I use this example a lot.

If you were in finance in the late eighties when Excel showed up: before Excel, you were doing a lot of math with pencils and ledger books and all of that. Literally all those jobs went away. Yet we employ more people in finance today than we did then, and they're all strategic. Nobody does math.

You don't need to do math; math was solved. So we've seen this kind of cycle with math. We just haven't seen it broadly across the knowledge worker space, but we're seeing it now.

Deep: The problem with that argument is that even that is ephemeral. Like, if I look at the reasoning level of o3, let alone the o1 models, they're already smarter than the vast, vast majority of us.

They already reason better than the vast majority of us. So it just seems like, sure, that's the current popular argument that everyone in AI is using: oh, we're gonna have the humans overlording, and we probably always will for stuff like medicine. There'll probably be a radiologist training some models and inspecting stuff where it's obvious. But at some point, when the humans are wrong, when the models are outperforming the humans consistently, we're gonna start undoing those things as a society.

It seems to me like we gotta come up with something for humans to do. At least in the physical world, I think it's gonna be a while before bots are refinishing my basement, but it's not gonna be that long.

You know?

Kevin: It's not gonna be that long. We may be five years away from humanoid robots being in our homes, doing some cooking and cleaning. It's reasonable.

Deep: Yeah. And there's a cost reality there. You know, it's 45 grand for that little dog that runs around construction sites. I mean, at that kind of cost point, it starts shifting things. You know, Moore's law, those costs will drop. But it seems to me we're seeing a radical shift... well, first of all, I did wanna ask you: what's the state of your product today?

Kevin: It's in production, doing millions of tests a day.

Deep: And how many customers, and what does your growth trajectory and all that look like?

Kevin: Growth trajectory is good. We don't release the number of customers, but a lot of our customers are very large. They have hundreds or thousands of applications.

We tend not to work with clients who have two applications, right? We tend to work with companies that are large and can expand. And so that's been much more of a focus, because once they have trained some robot overlords, they go, well, I can do 10 applications, then I can do 20, I can do 50, I can do a hundred, I can do a thousand, right?

Deep: Are we seeing that yet, though, or are you still in the earlier stages?

Kevin: No, we're seeing that. You know, the company's been around more than a decade, and seven years of that was just R&D. This was a really, really heavy lift. It's a couple million lines of code, Deep.

Although four or five companies have tried to do this, this isn't something that two kids outta YC do in a weekend. In fact, they're quite taken aback when they find out how hard it is to get AI to write test cases by itself, write scripts by itself that all run. It's an applied AI that takes more knowledge about how applications are built and run than it does about the AI, if that makes sense.

Deep: It does make sense. And we see that in all kinds of places. I mean, that's the story across the board with the LLMs: you flirt with the web interface and you're wowed, and you cut and paste, but getting it all to work, end to end, automated, is very hard, takes a lot.

It's like a different order of magnitude of effort.

Kevin: Yep.

Deep: So let's say you guys are succeeding, and there are other companies like you succeeding. That's not a 10-year story. If what you're saying is true and companies internally are going from one or two products to 10 or 20, or a hundred or a thousand, then that'll probably just sweep across the entire industry within the next two years.

Kevin: I think it will. I think that probably 90% of the test automation tool vendors in my industry are truly AI washing. The rest have a little bolt-on chat kind of thing. Now, we have seven patents, but no one else is generating tests automatically for people yet, other than us. They will.

Deep: What do you think is the core ethical quandary there? Or have you accepted that this is a massive societal shift, I'm playing a smaller role, and I'm not gonna worry about it?

Kevin: I think the train left the station when we figured out deep learning in 2012, right?

And transformers came out of that, and LLMs came out of that. And once the train left the station, we were all on it, whether we wanted to be or not. There's no choice, right? And so if I don't do this, it's okay, another company will. But we can't sit here, Deep, and say, oh, there'll still be manual testers five years from now.

No. I mean, whether I solve it or someone else solves it, it's solvable with a lot of code and a lot of effort, but it's solvable with the technologies we have today.

Deep: Software's a little different, right? All of us, since the first day we worked in this industry, have been automating ourselves out of jobs since the beginning.

Kevin: We have.

Deep: It's pretty much all we do. And so it makes sense to you and me that this industry adopts things immediately, grabs it, runs with it. But it still leaves the question: the suburbs and cities of the Western world are populated by people who pay their mortgages, people who have benefited from the last 30-year tech boom.

Kevin: Right?

Deep: What is gonna happen to that engine? And is it really replaced by something else?

Kevin: Let's take a look at a different job, not in tech, right? Insurance claims adjuster, of which there are, I dunno, a quarter million working in this country or something.

Huge numbers, right? Huge. The insurance companies have thousands and thousands of people who do nothing but evaluate what happened to your car, figure out where to take it, figure out if they're gonna pay for it, figure out blah, blah, blah, right? Already we have the most advanced insurance companies saying, you know, take a picture from five angles with your phone and send it in.

And at first, of course, there are human evaluators, but you know what I'm saying here. We don't actually need humans doing that. You and I know all the technology that we need to do that automatically, and literally ACH the money to you in three minutes, already exists, all of it.

Deep: Correct. Correct. And there is no question that it does it more accurately and with less bias, I would argue.

Kevin: More accurate, less biased. And some people say, well, people will cheat. First of all, people already cheat, and that's a legal problem, right? And second, it's more likely the AI will catch that than a human.

And third, you can go to jail for it. So the overall insurance costs come way down when you take those quarter million people out of that job. Those are regular jobs in regular little towns all over America that will go away. That's not arguable, 'cause if you know AI and you know deep learning, right?

And if someone comes to you, Deep, to you and your team, and says, this is what I wanna do, build the system, you already know how to build this.

Deep: Yes, I do. But it still leaves the question. I mean, I get that we're always gonna have a need for the top half a percent of PhD students.

But what happens to the majority of people who are maybe sort of bright, but not super bright? On the Gaussian curve, they're in the center. We don't need their level of reasoning anymore. What is humanity doing?

Bartenders, waitresses, we're gonna need them because nobody wants to get a drink from a robot.

Kevin: That's right.

We need more people helping old people in nursing homes. And what I think is that every technology change that's come around has displaced lots of people, but given them new opportunity.

Obviously the farming example is one of the best ones where most of the country worked in farming and now 1% works in farming and produces more food because of automation, right? This is not new. But what's new is every time the automation comes around, people go, what are people gonna do? And somehow we've created more jobs.

But I think the reason we create more jobs is that by leveraging this automation, we are driving down the cost of producing products, goods, or services. And when we drive down the cost, we drive up the demand, which makes the companies larger, which means they need more people for other things that machines really shouldn't be doing, can't be doing, or humans don't want them to do, et cetera.

Deep: So what do you think those other things are?

Kevin: Well, there's obviously the robot overlording, right? I think there are going to be people in sales. A lot of marketing is gonna be automated; marketing is a great one to talk about, right? We still have people deciding what to do and how to do it.

But very soon, you'll be able to go to Facebook and say, here are my USPs. Create my ads, change my ads every hour if you need to, do whatever you need to do within this set of rules, and generate them automatically with AI. And that cuts out the advertising agency completely.

So you have a WPP, a multi-billion dollar advertising agency. Why do I need them? AI can not only create better ads, it'll change them every hour. You couldn't possibly keep up. So there are some of these things that go away. At the same time, by driving down the cost of marketing, I'm able to drive down the cost of goods and services, which gets them to more people, which means there's higher demand.

I've got a bigger company, so I need more people in some of the other roles. Sales, marketing, some of the HR roles are certainly not gonna go away. There are executive roles: if the company's twice the size, you might end up with twice as many executives. But the doers, that's gonna be less.

We've been automating car factories since 1961; the first robot at GM was in 1961. Here we are, six decades later, and GM employs more people than they've ever employed. They also employ more robots than they've ever employed, and the robots have driven the cost of getting a vehicle down to something affordable for middle America.

And without that, a car would be 10 times the cost. Right? So this automation was required to drive up the demand, which increased jobs.

Deep: I think my optimistic side agrees with you. My optimistic side also believes that humans have an insatiable ability to reinvent what they care about.

You know, let's take soap. 40 years ago, if you wanted soap, you went and bought a Dove bar of soap. In my neighborhood in Seattle, this sounds bizarre, but soap is probably the ultimate status indicator. You walk into somebody's bathroom, and if they have Dove, you're like, oh, they must be poor.

Or a college student. But otherwise, you know, it'll be some fancy artisanal soap, maybe in a bottle it comes out of. Or, like my wife, she wants everything in this fancy organic wrap packaging.

Right? So humans have this ability to redefine what they care about, like the whole artisanal movement.

Kevin: Which employs millions of people that had no employment a few years ago. Yeah.

Deep: Yeah. So I think all of that's right. Also, there are certain things that are boundless, right?

Like the quest to get humans to live forever is a step-by-step, incremental process, but people will put whatever they need to into healthcare to improve health. So if you save them 200 bucks a month on their car insurance, they can throw it into some alternative wellness therapy. They'll probably do it. They'll probably take the yoga class.

Kevin: That's right. And doctors' offices are certainly gonna be augmented by AI, and they should be. They are, 'cause the AI is far better than a doctor at being up to date. These are obvious things.

I think we'll see how it pans out, but I'll say this about my field: if you're watching or listening and you are in the software QA field, you wanna learn the technologies that Appvance has, or find them elsewhere, because you wanna be the robot overlord. There will be people that run that operation, and they will be experts on the AI.

And those that aren't are certainly eliminated, right? That's the bifurcation. And I see both, and I see people today, across not just our industry in QA but others, sabotaging. You brought this up right up front: how do the workers feel?

Deep: Oh, I see this actually, 'cause, you know, we help folks with this.

And a lot of times projects are positioned in a way that the people who are gonna be affected by the AI, you know, they're not stoked about it. And I get it. That's exactly why you brought up the copiloting approach. That's why the copiloting approach is taking off: specifically because you're getting people on board. And it's a Faustian bargain on some level, where we all accept it. I mean, we're all in this; it's not like I don't use LLMs.

I use 'em probably, I don't know, 400 times a day or something.

Kevin: Me too.

Deep: Me too, of course. So, I do wanna push a little bit on the potential negative future, because I think there are a lot of things that policymakers and others must address. And I think largely with the internet, they kind of

failed to think things ahead 30 years ago when this stuff started evolving. The legal world didn't catch up; even now, they're still catching up.

Kevin: That's right.

Deep: I mean, I think there's gonna be a lot of pain in the very near future. We really run the risk of a deep, deep recession, where people that have maybe been out of work for at most six or nine months in the past are looking at two, four, five, ten years, and career changes and shifts.

That seems inevitable to me. I don't see how a mid-tier person, not so great at their job but not horrible at it, stays in that career path in this world.

Kevin: Well, Deep, if you and your company have a low-end or mid-tier person that isn't really great at what they do, are you really keeping them anyway?

Deep: We're too small to get them in the first place. We only have great people, because I don't have that option.

Kevin: Yeah, but you see my point: these are people who have always been on the edge, and they've probably gone job to job.

Deep: My point's not that we haven't always had them; it's that we've raised that bar a lot.

Kevin: Right. That's fair.

Deep: Raised that bar a lot, right. And what are those people going to do? Have a hard time getting a CS job? Probably not, but I'm seeing kids with a master's degree in CS, in AI, from great universities, looking for months and months, you know? And I'm like, wow.

Kevin: So the answer is, then: there are 8 million jobs open right now, give or take, in the US, and you go, where are they?

Well, let's start with nursing: tons. Let's start with plumbing, electrician, HVAC, taking care of old people in nursing homes. I know, a lot of people say, I don't want to do that. And I'm not trying to say that you should or shouldn't. I'm trying to say there are going to be openings all over the place, and they're not necessarily where they used to be.

Deep: Yeah, I think that's right. And I think we actually have an opportunity to push humanity into fields that have higher human touch and humor and impact. This has been an awesome episode. Thanks so much for coming on the show. Really enjoyed having you.

Kevin: Thank you.