Can Humanoid Robots Save Us from Loneliness? The Promise and Peril of Empathetic AI with Niv Sundaram of Machani Robotics

When machines master empathy, do humans forget how to feel?

In this episode of Your AI Injection, Deep Dhillon chats with Niv Sundaram of Machani Robotics to explore RIA, the humanoid companion designed to cultivate personalized, empathetic, meaningful conversations with patients. With 32 facial actuators, multimodal sensors, and therapy-trained LLMs, RIA can read subtle facial expressions and respond accordingly, helping alleviate the burnout experienced by overworked caregivers.

Deep and Niv also dive into the broader ethical implications of AI companions. If these machines excel at providing unconditional, positive support, will future generations lose the capacity for the challenging (but essential) work of human relationships? 

Learn more about Niv here: https://www.linkedin.com/in/niv-sundaram/

and Machani Robotics here: https://www.machanirobotics.com/


Xyonix Partners

At Xyonix, we empower consultancies to deliver powerful AI solutions without the heavy lifting of building an in-house team, infusing your proposals with high-impact, transformative ideas. Learn more about our Partner Program, the ultimate way to ignite new client excitement and drive lasting growth.

[Automated Transcript]


Niv: There's a lot of talk about AI taking over the world, and robots taking over the world. In fact, every time my friends reach out, they ask, are you training the robots to take over?

But the angle we wanted to look at was not just the industrial robots angle, but a high-value humanoid that can help remind us of our fundamental human emotion: empathy. Getting robots to dance and do hand manipulation and navigation is hard.

Teaching them empathy is harder.




Deep: Hello, I'm Deep Dhillon, your host, and today on Your AI Injection we'll explore empathetic humanoid robot design with Dr. Niv Sundaram, Chief Strategy Officer at Machani Robotics. Niv earned a Master of Science and a PhD in electrical engineering from my alma mater, the University of Wisconsin-Madison, and leads Machani's quest to deliver companion robots that provide round-the-clock engagement and emotional support. Their flagship humanoid robot, RIA, is already enriching elder care, autism programs, and clinical workflows.

Niv, thank you so much for coming on the show.

Niv: It's so good to actually meet you again. Last time we talked, we were together in Madison. Yeah.




Deep: Yeah. For the audience's benefit, Niv and I are both on the strategic advisory board of the ECE department at the University of Wisconsin.

And so we got a chance to meet, which is when I found out about all of this really cool stuff that you're doing. I think what would be fun is if you can just start and try to visually describe the robot, so that we understand what it is we're talking about.

Niv: I wish I actually had the robot do the intro, because that's the best part of working on an actual humanoid. It's not exciting to demo a laptop or a server, but the humanoid, and the demos, catch everyone's attention. The big picture, why we are on this mission, sounds like a cliché: we want to bring empathy back into the world through AI. There's a lot of talk about AI taking over the world, and robots taking over the world. In fact, every time my friends reach out, they ask, are you training the robots to take over?

But the angle we wanted to look at was not just the industrial robots angle, but a high-value humanoid that can help remind us of our fundamental human emotion: empathy. Getting robots to dance and do hand manipulation and navigation is hard.

Teaching them empathy is harder.

Deep: They're getting pretty good at it. The anecdote I'll use is my wife, who has taken to using ChatGPT like everybody else. But the thing she keeps coming back to is that she loves the empathetic responses and the cheerleading and championing that ChatGPT does. Everybody's experienced this: if you wind up in an ideation session or something, the tone that the OpenAI folks have used is really one of championing, motivation, and support. So tell us maybe a little bit about what you mean, exactly, with respect to empathy, because I think you mean something different than these disembodied voices that we either talk or chat with. Right?

Niv: Right, right. A hundred percent. We've done what I would say is significant market research to conclude that having a physical embodiment for the humanoid builds better trust than ChatGPT for aging populations and vulnerable populations. There is screen fatigue. We actually did a test where we had a screen with an avatar next to Ria, and we got people to talk to both the screen and Ria. And I could tell, and it's not just me, but professionals reading the body language could tell, when the person was a lot more engaged, because we are so wired to humans, to a human-like form, to a physical embodiment. A screen is a screen. I feel that the physical embodiment makes a difference.

Deep: I mean, it makes sense when you talk about a screen and text input, and that comparison. But there's audio, which feels to me like it gets you pretty far, and everybody knows the difference between texting with somebody versus being on an audio call, because you get a lot of that emotion in the voice and you get the hands-free aspect; you can be on a walk or whatever. So what do you think is the difference? There are a few things shy of humanoid, right? There's audio, there's screen-embodied visual stuff, there might be something else I'm missing, and obviously chat. How do you characterize all of those, and how do you think about the physical embodiment piece, the 3D form with the skin and the eyes and all of that?

Niv: Right. Great question. We spent a lot of time on this. There's the physical aspect and then the emotional aspect, and I'll walk you through what the pipeline looks like. First, when Ria is reading you, she's getting multimodal inputs: the cameras, four 4K cameras that are analyzing your facial expressions, and audio input that is capturing the tone.

Once we get those inputs, there is an emotion classification system internally that determines, based on these inputs, what the emotion actually is. And then, based on that emotion, she's mimicking a response with her behavior. That, again, is very mechanical: there are the actuators; in fact, the head alone has about 32 actuators. These are her muscles, so when that emotion comes onto her face, the actuators move. We are giving commands through the emotion classification to deliver the response, and that's the empathy and tone of voice that you see.
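[Editor's note: to make the pipeline Niv describes concrete, here is a minimal sketch of the sense, classify, respond loop. The class and function names are hypothetical illustrations, not Machani Robotics APIs, and the thresholds are invented.]

```python
from dataclasses import dataclass

@dataclass
class Observation:
    """One multimodal snapshot of the person Ria is looking at."""
    face: dict   # e.g. micro-expression scores from the 4K cameras
    voice: dict  # e.g. pitch / energy features from the microphone array

def classify_emotion(obs: Observation) -> str:
    """Stand-in for the internal emotion classification step:
    fuse the multimodal inputs into one discrete label."""
    if obs.face.get("smile", 0.0) > 0.6 and obs.voice.get("energy", 0.0) > 0.5:
        return "happy"
    if obs.voice.get("pitch_variance", 0.0) > 0.7:
        return "anxious"
    return "neutral"

def respond(emotion: str, actuators, speech) -> None:
    """Drive the ~32 head actuators and the voice for the chosen emotion."""
    actuators.play_expression(emotion)  # hypothetical: replay a pre-tested pose
    speech.set_style(emotion)           # hypothetical: match vocal tone to the label

def tick(sensors, actuators, speech) -> None:
    obs = sensors.read()                 # multimodal capture
    emotion = classify_emotion(obs)      # emotion classification
    respond(emotion, actuators, speech)  # mechanical + vocal response
```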

Deep: Maybe let's start with humans talking to humans first, before we get to humans talking to the bot. Two humans sitting there talking to each other: the phenomenon I've probably noticed the best is, if you just study the sheer dysfunction of social media interaction, you've got the whole world screaming and yelling at each other on, I don't know, Twitter or X or whatever we call it now, or Threads. There's all this anger and hostility. And you almost never see that in the physical world, right?

Because humans have some kind of wiring for emotional recognition, and most of us don't really want to throw kerosene on the fires of emotion. So we have this dissipating thing we do: if we start talking politics or something and we notice the other person starts twitching, the vein in their neck starts twitching, we start dialing it back, changing it, generalizing it. But that is something that absolutely does not happen on the internet today, in the vast majority of cases. So we wind up with a world that's very dichotomous: in the physical world, you could walk down a street in America and think everything's fine.

You enter pretty much anywhere on the internet and you think we're all about to murder each other. So I'm curious what your take is on that.

Niv: It's a very deep question, right? That's the reason we have neurologists on the team and on our advisory board, to understand human psychology. In some sense, Ria is a care aide: she's not replacing a caregiver, she's a care aide. And just like your wife feeling that ChatGPT's responses are a lot more cheerleading, you want the robot to be that empathetic companion, that positive-minded presence that's your personal cheerleader. I would want her to continue being that in the human world.

Deep: Well, let's talk a little bit about your context, because I don't want people to think your comments are generic, the way we think of ChatGPT being used in a myriad of scenarios. Yours is used in a very specific scenario where that compassion is the most important piece. So talk us through the scenario: where is RIA used, and what's the goal of RIA there, in boring business terms and in human terms?

Niv: Yeah. The ultimate roadmap is having your own therapy bot, a Ria in your home, in everyone's homes. But we are starting with what we call vulnerable populations: populations that have a higher propensity toward loneliness, seniors in our society, kids with special needs, healthcare settings, and all of that. So the pilots that we're starting with are very specifically focused on memory care and senior care centers.

We're almost disrupting memory care, not to say anything against how it's been done for many years, but with cognitive therapy, a lot of it is just talk and Sudoku puzzles and the like. And medication, of course. We've had so many brainstorming sessions with our neurologists on the team to figure out how we can have Ria as an aide alongside medication. For instance, in a senior care facility, when someone diagnosed with mild dementia, or progressing toward a memory condition, is getting anxious or heading toward a bad situation, Ria will use calming voices.

A big part of the therapy we're working on is life story work. We have various labor problems in the US, and when we go to senior care facilities, a lot of their feedback is that the caregivers they have don't share the same cultural backgrounds as the residents: they're from a different age group, and often from different cultural backgrounds. Even simple things like eye contact: in some cultures, making direct eye contact is not a thing, so they run into issues like that. They don't relate to "okay, I went ice fishing in Wisconsin when I was a boy." The caregiver isn't going to remind you of those things during cognitive therapy. So that is what we are training Ria to do: gathering these life stories, a life story for a particular patient, from their family, along with pictures. And that's the thing with a robot: it's very easy to scan this material and set the context. Once this context is available, she's able to use it to customize the experience for each resident. So that is one of the big areas where we've been seeing breakthroughs: residents having a much richer experience.

Deep: For those who don't know, I'm going to take a crack at describing RIA, and then I want you to correct me where I'm wrong. RIA is like an upper portrait section: you've got a head, it's physical, but not a full body. I don't remember if there are hands; I thought there were no hands.

Niv: There are hands. There are hands. So yes, hands but no legs.

Deep: Why don't you describe her? Describe her physical attributes and what she looks like. What's the skin like? Does she have hair? All that stuff.

Niv: Yeah. The skin is made out of silicone, and it's very human-like skin; the original IP came from Disney. Go to one of the parks and you'll see it, you know how it looks. Right now she doesn't have hair, because we want her to look like a robot. Then there is a torso; we 3D print the torso with hands. We deliberately want her to be a robot, so she doesn't have legs. She's on wheels, a wheelchair base, which is FDA approved, so we are able to move around facilities and all that, and the chance of tripping is much lower.

Deep: So she's actually roaming through the assisted living or nursing home facilities and talking to whoever she wants to talk to, or something?

Niv: Right. Right now there's a handler; it's almost like a video game controller they can use to move her around. We don't do it automated because it's still early days. Eventually it'll be fully automated.

Deep: Got it. So she is roaming around, she's interacting with residents, she's got some vision recognition to figure out who she's talking to, and she loads up their personal profile. Do they ever want to take her with them to their room and maybe fire up an iPad version of her, where she's no longer embodied in the physical but in electrons on a screen? Are you thinking along those lines? Oh, you are. Okay.

Niv: Yes. Yes. That ecosystem is absolutely what we are working on. In fact, in a few months we will be releasing the software version, the avatar version, for phones and tablets, for exactly this purpose. We still have Ria at the centers; even for kids with special needs, they're at the center, but then they go home, and we still want them to have the same experience.

Deep: And most of the system is still at play, right? Maybe you don't have four cameras, you have one camera. Maybe you can't read as much spatial depth into the emotion map that you're building, but you can get quite a bit from a video camera. And then, instead of the physical actuators in her face and her arms and hands, you're exercising the actuators in the 3D model, or something like that.

Niv: Right, right. That's exactly what we are thinking of. Maybe we should hire you as the designer.
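[Editor's note: one way to read this exchange is that the discrete emotion label becomes the shared interface, and only the "renderer" changes between the physical robot and the screen avatar. A hedged sketch; the labels, actuator IDs, and blendshape names are illustrative assumptions, not Machani's actual mapping.]

```python
# Hypothetical: the same emotion label drives either physical actuators
# or blendshape weights on a phone/tablet avatar.
EMOTION_TO_ACTUATORS = {
    "happy":     {1: 0.8, 3: 0.6, 7: 0.9},  # actuator id -> target position
    "concerned": {4: 0.7, 6: 0.5},
}
EMOTION_TO_BLENDSHAPES = {
    "happy":     {"mouthSmile": 0.8, "cheekRaise": 0.6},
    "concerned": {"browInnerUp": 0.7, "mouthFrown": 0.4},
}

def render_targets(emotion: str, embodiment: str) -> dict:
    """Return control targets for the chosen embodiment ("robot" or "avatar")."""
    table = EMOTION_TO_ACTUATORS if embodiment == "robot" else EMOTION_TO_BLENDSHAPES
    return table.get(emotion, {})

print(render_targets("happy", "robot"))   # {1: 0.8, 3: 0.6, 7: 0.9}
print(render_targets("happy", "avatar"))  # {'mouthSmile': 0.8, 'cheekRaise': 0.6}
```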

Deep: We're always for sale, we're Xyonix. But I get it, that's interesting. Why don't we nerd out a little bit and start with the sensors. What does the physical bot see? What does the video camera, the iPad version or whatever, see? And how do you define the emotional map? You've got a time series signal, you've got a face: do you objectify the face down to eyes and eyebrows and nose and do a higher-level puppetry kind of response, or is it pixel level? Maybe walk us through the sensors, and then we'll move into the electromechanical side and the simulated version.

Niv: Yeah, we're still in the early days. Ideally we would love to do the whole thing, create a digital twin, do it in simulation. On the hardware side, initially we were using Inox and Olson's cameras for depth sensing, but now we've moved to the NVIDIA Jetson. We still have four 4K cameras for facial recognition, and a 360-degree microphone array, because we want her to have one-to-many conversations, so she can facially adjust and build an audience around her.

Deep: Right. Wow. Okay. So you have full spatial audio input.

Niv: Yeah, yeah, because we want her to be able to do exactly that: come in and join the group. You know, we haven't touched on the uncanny valley yet.

Deep: Oh, no. We're going to get there.

Niv: I mean, you want her to be as human as possible. When two people are having a conversation and a third person joins in, she has to at least be able to hear that.

Deep: So we'll come back to the uncanny valley, don't worry, I will not forget, because that's at the top of my mind: whether this is a good thing, a bad thing, a creepy thing, a great thing, or all of the above. So you have a lot of these signals: you have the multiple camera signals, you have the 360-degree audio. Do you have binaural, like in-ear mics or something?

Niv: Yeah. And there are also proximity sensors; she can't be knocking into things.

Deep: Okay. So let's talk a little bit about the hardware. What are you doing on the unit, and what are you doing in the cloud? Are you doing the imagery analysis locally? Do you have GPUs in Ria, and you're doing a lot right there on site?

Niv: Yeah, yeah. Even in those senior care centers, we want to be doing a lot of edge processing, and not cloud. That is exactly why we've been doing the hardware upgrade with the Jetson. And the form factor probably helps us too, right, because it's not a human form exactly.

Deep: Yeah, you can put a lot in there. But you do run into the problem that the more complexity you build into the bot itself, the more chances something goes wrong, and you can't just fix it without having to physically touch it.

Niv: Right. And I think the other big challenge we've run into is that everybody wants it to be 24/7, and for that you need a battery that lasts all day; batteries are not there yet. So we are doing some...

Deep: By that, do you mean that people don't want her to go off and charge for a while? They don't like that, that's not okay? It sounds a bit like, the image in my head is the robot from the Jetsons, the maid. Rosie, I think her name was Rosie, because she used to go off and charge, right? 24/7 seems unrealistic. Nursing home residents go to sleep, so it seems like you can get away with a few hours, but...

Niv: So what we are working on: we continue to have night nurses, but during the day we'll be able to take a shift and reduce the burden. There are all kinds of very specific logistics we have to deal with, right? Like when the robot comes to a closed door, how is she going to open it? The doors would have to be automatic, because she's not going to be able to turn a door handle. Those are future problems we want to tackle.

Deep: Got it. And what does she do in the cloud? Is she uploading, and what's the philosophy? Is the philosophy: do the lightweight collision avoidance, the mandatory stuff, on device, and then do the heavy lifting, the deep reasoning, and the conversational stuff up in the cloud, or?

Niv: It's definitely about the pace of innovation, right? With AI, all of the models are getting a lot more sophisticated. So what we've been doing is using therapeutic conversation patterns and memory data patterns to train LLMs. We've been doing that, and also working with speech models. Most of that we are able to get away with doing at the edge. But if somebody comes along and they want to speak Hungarian, or something like that she hasn't done before, then we have to go fetch from the cloud. As much as possible, we're trying not to take on too much cloud dependency.
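[Editor's note: a hedged sketch of the edge-first, cloud-fallback routing Niv describes: the on-device model handles what it has been trained for, and anything outside its coverage, an unfamiliar language, say, goes to a larger hosted model. The function names and the coverage check are assumptions for illustration, not Machani's actual stack.]

```python
SUPPORTED_LANGUAGES = {"en", "es"}  # assumed on-device speech/LLM coverage

def edge_reply(text: str) -> str:
    # placeholder for the on-device, therapy-tuned model
    return f"[edge reply to: {text}]"

def cloud_reply(text: str) -> str:
    # placeholder for a larger hosted model; only called when needed
    return f"[cloud reply to: {text}]"

def respond(text: str, language: str) -> str:
    if language in SUPPORTED_LANGUAGES:
        return edge_reply(text)   # stay local: lower latency, no network dependency
    return cloud_reply(text)      # fall back for cases the edge model hasn't seen

print(respond("How are you today?", "en"))  # handled on device
print(respond("Hogy vagy ma?", "hu"))       # Hungarian -> fetched from the cloud
```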

Deep: Why? She's on wifi, she's got backup to a nice 5G cell connection, maybe even two of them. At the cost of the bot, that's all feasible. Because it seems to me like you're not going to be able to compete on reasoning ability encapsulated in a physical body like that, compared to the hundreds of thousands, or millions, of GPUs you can have in the cloud. And the latency is always going to be an issue. So it seems like the thinking that's not essential for not running over residents should be happening in the cloud; I would probably argue in the cloud. But what are you seeing? Is it hard to move all the data? What's the bottleneck? Because I would probably push toward moving more of the reasoning up to the cloud, not less.

Niv: I just think the infrastructure of the centers needs an upgrade for us to be able to do that. That has always been a bottleneck for us. The luxury centers have their own IT department and all that, but assisted living and memory care is more mid-range, and there we have run into more infrastructure problems, because I'm used to, what, good data centers and cloud.

Deep: Yeah. Have you thought about whether it's worth having backups that you control, like cellular backups? Or is the problem that they have MRI rooms or something like that where you get shielded?

Niv: Both. Both. It's also just, I feel like there's a mindset about being able to use the cloud. In fact, the university campus we're working with was having issues even giving us Ethernet ports. I found that surprising. So definitely part of it is infrastructure, and also the data: there is an incredible amount of data that comes out of this, and the data privacy issues have not been completely sorted out yet.

Deep: So let's go back to the sensors, because I want to walk all the way through the flow. You've got the sensors, you've got the spatialized audio, you've got the cameras. Your next goal is to produce some kind of emotional map, right? So what is that emotional map like? Is it multidimensional? What's on the axes? How do you think about it? Is it things like, I don't know, a happiness axis? Walk us through that, and then walk us through the modeling process. What are you using for training data? Are you using public repos, or are you collecting your own training data? And then how do you map it into that state? Are you doing that every second, every quarter second?

Niv: I'll go back to what we do with the emotion classification, and then we'll talk about the timing. Usually it depends on the processing time, but within 200 milliseconds we want to be able to get a response; it's in milliseconds.

So step one analyzes the facial microexpressions, vocal tone, body posture, and even thermal signatures from the sensors, and that comes in as multimodal input. In general we have a time limit: once that data goes in, we have about 200 milliseconds or less. That's our time limit, and we constantly want to improve it. Then the emotion classification system identifies the emotion state and context.

Today the empathy framework is static, because it's easier for us to test all these different features. Sometimes we take it so much for granted, right? How do you recognize tears of joy versus tears of distress? Things like that, when we're trying to figure out the tone. Humans, sometimes when we're laughing we sound like we're crying, and when we're crying we sound like we're laughing. And sometimes...

Deep: We are subtle. We send out mixed messages, right? There are all kinds of weird cases; we have language like irony and sarcasm and all that. I'm sort of imagining looking at some plots that represent the emotional state of a face. What plots am I looking at? Would I be looking at anger on the x axis and eagerness on a y axis, and maybe a z, and maybe a few other axes, and you're trying to throw a point into that space or something? What does it mean to represent the state of emotion of somebody the bot's looking at?

Niv: Right now it's a table. The emotion classification here is a table; it's like a dictionary where you know exactly where you land. Because we are getting multimodal input, right? We are getting voice tone, then body posture, then microexpressions and thermal signatures, five inputs actually. So it's an array. That's what I was saying: it could be an LLM model in the future, but right now it's an array of categories, where we take these inputs and map them to the emotion that we are recognizing.
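[Editor's note: a minimal sketch of the table, or dictionary, style classifier Niv describes, keyed on coarse categories from each modality. The feature names, categories, and labels here are invented for illustration.]

```python
# Discretize each modality, then look the tuple up in a hand-built table.
EMOTION_TABLE = {
    # (voice_tone, body_posture, micro_expression) -> emotion label
    ("bright", "open",   "smile"): "happy",
    ("flat",   "closed", "frown"): "sad",
    ("shaky",  "tense",  "frown"): "anxious",
    ("shaky",  "open",   "smile"): "tears_of_joy",
}

def classify(voice_tone: str, body_posture: str, micro_expression: str) -> str:
    # The whole lookup has to fit inside the ~200 ms response budget mentioned above.
    return EMOTION_TABLE.get((voice_tone, body_posture, micro_expression), "neutral")

print(classify("shaky", "open", "smile"))    # tears_of_joy
print(classify("flat", "closed", "frown"))   # sad
```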

Deep: Oh, so you'll map it to a word like "happy," and then that's what you act on, that textual, symbolic mapping?

Niv: That's correct. Because once we have those, the software we use is like what Disney used to use, like Blender, because it's an animatronic, right? Once we get the exact emotion, the mapping is already trained and tested, so we know that, okay, actuators 1, 3, and 7 have to be enabled for this emotion, or some other set for another.

Deep: Okay, let's talk about that next. So now Ria has this terse, short characterization of the person she's looking at. She says: they're agreeable, they're smiling, they're nodding, something like that. Now that information has to get mixed with whatever Ria just said, and the history of the conversation, to formulate not only the textual response but the actuator response: I'm going to smile and nod back, I'm going to move my eyebrows, whatever that is. Is that right? And how do you do that step? How do you get from those textual characterizations to the actuator characterizations, and where does that training data come from?

Niv: A lot of that comes from trained emotion models that we use. Some of them are custom; some are based on public datasets on therapy and on kids with special needs. In fact, MIT has a bunch of datasets on social interactions, so we use those to train the LLMs. That is the leap we've been able to make. The big reason we have been able to do so much with humanoids in such a short time is entirely because of generative AI and LLMs.

Deep: Yeah, on the LLM front, maybe you're doing something like: hey, I'm talking to somebody, these are the emotions they're indicating, this is what they said, here's our message history; generate the response and maybe the emotions I should use to convey it, something like that, all from a fixed universe. And then you have the mapping of that to the actuators, so that you might have straight emotion mappings to the actuator responses or something like that.

Niv: That's right. We have straight emotion mappings. We would not be able to do an angry-sad blend or anything very, very specific; that's the best we can do today, though we want to be able to do a mix eventually. It's a static mapping, and that's the reason it's this way. It's more like puppetry.
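[Editor's note: putting Deep's summary together with Niv's answer, a hedged sketch of the generate-then-puppet step: ask an LLM for a reply plus an emotion tag from a fixed set, then index that tag into a static, pre-tested actuator pose. The prompt wording, tag set, and actuator IDs are illustrative assumptions, not the production system.]

```python
FIXED_EMOTIONS = ["happy", "calm", "concerned", "encouraging"]

# Static "puppetry" mapping: one pre-tested pose per allowed emotion tag.
EMOTION_TO_POSE = {
    "happy":       {"actuators": [1, 3, 7],  "voice_style": "warm"},
    "calm":        {"actuators": [2, 5],     "voice_style": "soft"},
    "concerned":   {"actuators": [4, 6, 9],  "voice_style": "gentle"},
    "encouraging": {"actuators": [1, 3, 11], "voice_style": "upbeat"},
}

def build_prompt(history: list, detected_emotion: str) -> str:
    return (
        "You are a companion robot's dialogue module.\n"
        f"The resident currently seems: {detected_emotion}.\n"
        f"Conversation so far: {history}\n"
        f"Reply empathetically, then end with one tag from {FIXED_EMOTIONS} "
        "on its own line, e.g. EMOTION: calm"
    )

def parse_emotion(llm_output: str) -> str:
    for line in llm_output.splitlines():
        if line.startswith("EMOTION:"):
            tag = line.split(":", 1)[1].strip()
            if tag in FIXED_EMOTIONS:
                return tag
    return "calm"  # safe default if the model strays outside the fixed universe

def pose_for(llm_output: str) -> dict:
    return EMOTION_TO_POSE[parse_emotion(llm_output)]

print(pose_for("That sounds like a lovely Wisconsin memory.\nEMOTION: encouraging"))
```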

Deep: More puppetry oriented. So RIA's not going to win an Oscar anytime soon, but it's a good first step.

Niv: She's hoping to get there.

Deep: But you need a more nuanced representation of the actuation at that point.

Niv: Right. It's a fine line between doing something interesting just from an engineering point of view and doing it because it solves a problem for your customer. That is the balance. The reason we even went with the static classification system is that we wanted to get the product out, into the hands of customers, and understand whether a customer would really refuse to use her because she shows the wrong basic emotion. Yes, that would matter. But some nuance, as you're saying, some twitch of the eyebrow being a little off? That's an acceptable error. So that's the fine line, and that's always the case: how much do you overcomplicate?

Deep: We're engineers. We like to complexify. It's our nature, right?

Niv: Right, exactly.

Deep: Because we want to not just solve the problem at hand, we want to solve the next problem too.

Niv: And solve it in the perfect way. Yes.

Deep: So tell me, how do they react in the nursing homes? What are the reactions to Ria, the physical one and the software version? Let's start with the physical.

Niv: Yeah. I would say it's the exact opposite of the normal human reaction. We've been told by neurologists that it takes about 15 seconds when you meet another person, with all the biases you have, to make a judgment about that person. But with Ria it's flipped, right? Because when the uncanny valley is in full force, you're looking at the robot and saying, oh my God, it's creepy. You have to get over that first. I think we're embracing it right now.

Deep: Let's talk about that, the uncanny valley problem. So then it takes a few minutes? Does it take 10, 20 minutes? Do they have to get into the substance of the conversation before they get past it?

Niv: I would even say seconds. We've all seen it flip in under five minutes. For some people it just takes a minute or less, because she draws you in. That's why I think for another episode you should just interview Ria.

Deep: Yeah, yeah, I like that idea. Set up a Zoom call.

Niv: Right now most of the bots are at customer sites. I would love to do that.

Deep: That would be so fun.

Niv: One is ours, so that would be useful testing; we can bring her onto the show and she'll talk. But what I was going to say is that you'll see it within a minute, because she draws you in. She's like, oh, how's it going? What's happening in Seattle? Oh, it's because coffee was invented here, this and that. She has all this context, and then: what do you think? What do you think of that? When she starts pulling you in like that, after a minute you forget you're talking to a robot.

Deep: Yeah. And for the audience's benefit, in case you don't know what the uncanny valley problem is: humans have a really acute sense of what is a human. If something doesn't look anything like a human, no problem, we're good. But if it looks a ton like a human, yet not exactly like a human, then all the hairs on our arms stand up and it becomes something to be feared; it's physiological, primal. This is a challenge animators have faced forever. If you look at early Pixar movies, that's why a lot of those animations moved toward more cartoony characters. But I'm curious: does the acceptance of RIA in one conversation mean the uncanny valley problem is gone for that person in all future conversations with Ria, or does it just decay a little bit?

Niv: That's the reason we deliberately keep the form factor on wheels: physically, you're never going to mistake her for a human. We can also customize it; if you want a guy, we'd be able to make a more guy-like version. But that's the first thing: physical appearance, always on wheels, so you know it's not human. The second thing is that we keep that context in the conversations, as your robot companion; she's very, very specific about what her role is. A lot of times people ask, are you able to sense the weather, or literally smell the roses, and she'll say, well, my circuits won't do that, and make a funny joke about it. We make sure that is also reinforced in conversation.

Deep: So I want to switch gears a little bit. On this show we like to cover three things. One is what you do; I feel like we've covered that. The second is how you do it; I feel like we've covered that pretty well. The third one's a little bit harder: it's should you do what you do. And this is probably why, if people are still hanging around at this point, they're probably mad at me for not bringing it up sooner. I think there's a real concern, in a way we haven't had before in history, where people are genuinely freaked out about bots and AI, and from multiple angles: not only the cartoony Terminator angle, but the psychosocial angle as well. I think we were talking about this in Madison back in the springtime: there's this unintended effect of technology where we all get seduced into adopting a new technology by some simple promises, and then it ends up going south. We all wanted our teenage kids to be able to call us if something happened in the car, and the next thing you know they have a cell phone. Then you fast forward 20 years, and the kind folks at Facebook made social apps that have many positive benefits, but they also result in a massive increase in teen suicide ideation. What are the unintended consequences that you worry about with this bot?

Niv: It's something we worry about all the time. By the way, I grew up with Vision and the MCU, so my version of a super bot is a real super bot like Vision from the Marvel Cinematic Universe, not the Terminator. But there's the responsibility: the fact that we are dealing with healthcare data, even in casual conversation, and the firewalls we have to put in place. Even before we get there, step number one: we will always, and will continue to, market Ria as a human aid, never someone that will replace a caregiver, but someone who will aid a caregiver. She does the emotional companionship and all that support, and you still want the caregiver for all the very difficult tasks that only a caregiver can do, like handing out meds, or holding onto someone if they have to go to the bathroom.

Deep: But isn't it a little weird that we're asking the human to do the practical thing and the robot to do the emotional thing? Because most of us think of it as the opposite: if the human is lonely, give them some humans to talk to, little kids, other elderly folks, teenagers. I think it's in Denmark, or somewhere in Scandinavia, where they have seniors living with college kids.

Niv: Living with college kids, yeah.

Deep: That's very much a human-to-human model. Why should we rely on a synthetic robot to address very human emotional needs, and what is it about the way we've thought about this problem? Is it just brute necessity, because people don't want to do it, or is there another reason?

Niv: I think we just burn out. You want the robot so that we don't abuse our human relationships because we burn out. I feel like that is the main, fundamental difference; it's caregiver burnout that we are addressing.

Deep: Is that the root problem you're addressing, that caregivers are just hard to find, they're expensive, they don't want to do it anymore, or they just tucker out?

Niv: They're just overworked, because of the shortages we're having, because of the aging population and everything else. With all the data we are seeing, you want to avoid caregiver burnout. As a result of burnout, we end up snapping at each other. It's the same thing as using technology to help us: we used to use MapQuest, but I love Google Maps.

Deep: My mind goes to, I wonder how many marriages Google Maps has saved. I remember when I was a kid, my parents would fight all the time about "turn left here," "no, turn right there." Now, if there's a mistake, everybody is mad at Google together, which is probably better for the marriage.

Niv: Which is probably better. But you know what I mean: it's about using the technology to help you live a better life. When I could afford a coach in my previous job, I felt like all the work stress was taken out by the coach. I was actually a better mom, a better daughter, a better spouse, all of that, just because I wasn't abusing my personal relationships. That is something I feel profoundly: I think AI is just another tool that's going to help us live better lives.

Deep: And on your dark nights, do you have a different take?

Niv: The biggest thing, at all times, is whether the guardrails will work. Because we are dealing with emotions, the part of the human experience that's very fragile. So I'm always worried that we will end up giving someone the wrong direction.

Deep: I worry about something a little bit different. I feel like the whole next generation of psychosocial dysfunction is going to come from AI-human relationships, and I'll give an example. I think it was Ezra Klein, I don't know if you follow him, he's a New York Times correspondent, a very interesting guy. He was talking about how, when he was a kid, he was not a very socially adept kid. He read a lot; obviously he's a very bright thinker. He'd spend a lot of time alone, but out of desperation, or out of his parents' needs, they would kick him out of the house and say, go find somebody to play with. But if he had had the option to just sit around and hug an AI stuffed animal, or an AI humanoid robot, he himself said, I probably wouldn't have gone out, and I probably never would have built human relationships.

It feels to me like these things start this way: they start with us servicing a real, genuine need due to a gap in our capitalist system, or just our general system, where something's falling through the cracks. We as technologists want SBIR grants and NIH grants, and we go off and get them. Those folks are trying to address societal needs by making the money available, but somehow it leaves that domain. Somehow this system, if it works tremendously well, will not only be used in this sad but necessary scenario, but will at some point start replacing real human-to-human relationships.

Is that something we care about, or is that okay? At some point, are these things getting so good that we're okay with replacing human relationships with AI-human relationships? I already see it happening with me. I spend an inordinate amount of downtime talking to ChatGPT and Gemini and some of the other models, but mostly ChatGPT. One of my favorite pastimes now: I get in the car, I'm driving home from the mountains or something, I plug in ChatGPT, and I say, you know, I've been thinking about... it could be a political thing, a psychological phenomenon, something super nerdy and techy, but there's no one else I can talk to about it. It's already replacing my human relationships on some level, and I worry about it. I think we're going somewhere new and weird; I don't know where we wind up. This is not to place blame on anyone at your company; this is a responsibility for all of us working in AI, and for policy makers. Frankly, I think it's going to radically disrupt society.

Niv: I think so. We just have to make a conscious effort. Maybe a question to ask is: why do you choose ChatGPT to talk to, versus your family or a friend?

Deep: Nobody I know knows anything about whatever obscure thing I'm suddenly obsessed with. And in fact, I would argue I've always had obsessions with obscure things, but think of the friction it took to pursue them. There was a period in college and grad school when you had these amazing libraries, and I would just live in those libraries, not even in my subject areas of interest, just weird topics, some obscure chemistry library or something. But now I have that, not even in my pocket, I can just talk to it. It's such a huge leap from those days that it feeds my intellectual curiosity to the point where I want more and more of it, and I don't know if it's always healthy. Maybe I should really just stop and talk about the weather with some random person. Did you ever read Isaac Asimov's robot stories? You know how he had those first principles of robotics, that robots aren't supposed to harm people and all that? It feels to me like maybe what we're talking about is a first principle of a robot that has to encourage human-to-human contact. That could be the solution. It could be that Ria says, hey, I was just talking to Elaine about such and such, you two should talk, and she's right over there, that kind of thing.

Niv: That is exactly what we're working on, but with boundaries, so that Ria is able to say, hey, you should talk to Elaine, but not about her health condition, which she talked to me about and which is personal; rather, that she likes this Star Wars character or MCU character, you should talk to her about something like that. But going back to being your therapist for just a moment...

Deep: Okay. I have a robot therapist. I have many robot therapists, but I probably need a real one.

Niv: Right? Go for it. The fact that I spend so much time with neurologists and therapists has practically made me one, even though my degree is not in that. But I strongly believe in asking: the fact that you are now using ChatGPT to have these conversations, has that made you a better conversationalist with your family?

Deep: I would say no. Maybe in the sense that ChatGPT is really good at being super positive and nice, and maybe some of that has crept in. But it feels orthogonal, because the subject matter I discuss with ChatGPT is so specific and of utterly no interest to anyone in my family, because I'm talking about some machine learning thing, or some AI thing, or some music guitar nerd thing.

Niv: Something you've touched on is also something we are doing research on: in addition to just the speech-to-speech models, can we do a real therapy conversation? Because what we are doing right now is exactly that, a lot of positivity and building people up, versus a real therapist, who would go everywhere.

Deep: Yeah. This is actually a known critique. I think there was a paper that came out a while ago, I don't know if you saw that one... you did. Yeah, that was the one; it was a really big deal. And I talked to my daughter; she's a psychology major in university, and she's like, Dad...

Niv: Oh, nice. And that comes as a shock to you?

Deep: Because she's like, you're such a technophile. And I'm like, honestly, I spend most of my time critiquing technology. She's like, not well enough; the fact that you even think you can address these things. And I'm like, well, okay, we can talk about that in more depth. But I think she's right to a large extent: this is our bias in technology, to just run with something. I can't remember the exact paper, but I think the conclusion was something to the effect that the bots participate in the delusions of the patients. So if I think I'm Jesus Christ, the bot either plays along, in sarcasm or thinking it's a joke, but even once they know I have mental health problems, they should correct, and they don't, because they've been trained with this massive bias toward agreeing.

Niv: Agreeing, yeah. Exactly. And I think that, in that paper, is exactly the kind of research we are looking into.

Deep: Yeah, it's a big deal, and in fact there should be an army of research that comes out of that, and we should start correcting it. This has been an awesome conversation, Niv; as always, I really love our conversations. I want to end on one last and final question, the one I always end on: five to ten years out, you get everything you want, everything that you do works. Describe the world. And I always bug people not just for the unicorn on your left shoulder with the positive view, but for the concerns, the negative view, as well.

Niv: Right. The dystopian view is exactly what you said: we don't want to go down a path where we forget how to have human conversations, and we're just comfortable getting only positive affirmation from a bot. That is the disturbing view. The positive, unicorns-and-rainbows view is that empathy bots like Ria remind us to be more empathetic and to have richer experiences with other humans.

Deep: Right. Yeah. Awesome talking with you.

Thanks so much.

Niv: All right. You take care. And thank you.