Human Identification in a Deep Fake World

Why existing approaches don’t cut it in the new world

Steve Jones
9 min read · Apr 8, 2024

We are seeing increasing numbers of Deep Fake frauds being committed, and what is absolutely apparent is that traditional authentication methods just aren’t good enough when faced with these sorts of social engineering attacks. Instead we need to start thinking in new ways and putting in place new systems to help combat these attacks.

You aren’t who you say you are

The first fundamental challenge is that the application of GenAI to social engineering attacks means we must question the first assumption: the belief that someone, or something, is who they say they are.

A few years ago these attacks were primarily audio based, but now they have moved on to video.

Legacy techniques such as phone number or email address spoofing have now been extended through collaborative communication platforms to give the impression that a verified ID is in use when it is not. While education to "not click on the link" is fine, the way attacks can now be mounted means that even links from seemingly valid sources can no longer be trusted. I'm not going to go through the myriad of attack vectors, but will instead concentrate on the one that is most challenging from a human interaction perspective: "but I can see them".

That is, the challenge of being presented with a realistic representation of an individual, and how we can easily verify whether or not that representation is fake.

I don’t need to prove who I am

Let's start with the big problem here: these social engineering attacks have to be assumed to be insider attacks. That is, any solution to them cannot simply be "well, they've authenticated on our system, therefore it's fine". To put it simply, your current Zero Trust security model is designed for systems to authenticate users. It is not designed for people to authenticate each other.

Right now the goal of an authentication system is for you to prove who you are. But unless you are getting deeply philosophical on a Thursday lunchtime, you hopefully know that you are you; and if you are faced with someone impersonating you, you hopefully know which one you are.

Three people dressed as Spider-Man pointing at each other, drawn as a cartoon (source: Twitter). They each know who they are, but not who the others are.

So while the zero trust system works great for system authentication, it doesn’t provide me with verification of who you are.

Closed v Open Communication

There are two challenge areas here. The first is where we have a clearly closed communication method (a corporate Teams, WebEx, Zoom, etc.) which manages corporate accounts, where users should be clearly identified as authenticated, and where everyone is internal to the organization.

The second is where we have an open communication method: Teams, WebEx, Zoom and so on again, but also WhatsApp, FaceTime and the like, where we cannot rely upon a corporate security overlord.

What I would say is that in a GenAI world we should actually consider all cases to be open communication: authentication of the other person should assume that they are not authenticated by a central corporate AD, and we should have a secondary way to validate them when required.

Behavioural validation and GenAI

One of the common ways we do this sort of validation naturally is noticing that someone is 'acting weird': cues based on what we know about a person. For instance, the scene in X-Men: First Class where Charles knows it isn't his mother he is talking to.

Weirdness, however, is also a weakness: if the person or thing we are interacting with acts within bounds, it will fail to trigger such responses. An easy way to remove those triggers is to push the situation beyond normal bounds, so that people's stress causes them to accept the weirdness.

This means such cues cannot be relied upon in a world where GenAI can learn tone, voice and mannerisms, and can have access to a huge amount of ancillary information.

Step 1: Verify the channel end-to-end

We've all heard about end-to-end encryption, which prevents people from listening in on a channel or set of messages, but that doesn't actually mean either end of the channel is validated as an individual. Something like Teams or WebEx authorizes people to the center, but it doesn't dynamically pass that authorization to the edges. Given the constraints we've talked about above, what we need is an out-of-band verification mechanism.

What I mean by this is a validation mechanism that takes the delivered stream and uses content within it to validate it back to the source.

With this system we assume that the service, for instance Teams, provides an indicator (represented here by a QR code) which is read by the client-side application, which then validates back to the authentication service who the caller is and that they are using the approved platform.

So for an internal corporate solution this would validate that the person who has signed on to the service is identified as internal, and that their name and details on screen match those held in the central repository. This would prevent someone editing their name (or role) and would give confidence that the channel can be considered secure. If the indicator (the QR code) points to a third-party authentication mechanism this would be instantly flagged, and if the details provided via the stream do not match those from the authentication service then this too could be flagged.
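To make that concrete, here is a minimal sketch in Python of the kind of client-side check described above. The signed token carried by the indicator, the CORPORATE_DIRECTORY lookup and the shared-key HMAC scheme are all illustrative assumptions, not a description of how Teams, WebEx or any real authentication service actually works.

```python
import hashlib
import hmac
import json
import time

# Hypothetical shared secret between the client-side verifier and the corporate
# authentication service; a real deployment would use PKI rather than a shared key.
AUTH_SERVICE_KEY = b"example-shared-secret"

# Stand-in for the central corporate repository (e.g. AD) lookup.
CORPORATE_DIRECTORY = {
    "user-1234": {"name": "Steve Jones", "role": "CFO", "internal": True},
}

def verify_channel_indicator(token: dict, displayed_name: str, displayed_role: str) -> list[str]:
    """Check the indicator carried in the stream (the 'QR code') against the
    authentication service and the central repository; return any problems found."""
    problems = []

    # 1. The indicator must be signed by our own authentication service,
    #    not a third-party identity provider.
    payload = json.dumps(token["claims"], sort_keys=True).encode()
    expected = hmac.new(AUTH_SERVICE_KEY, payload, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, token["signature"]):
        return ["Indicator was not issued by the corporate authentication service"]

    claims = token["claims"]

    # 2. The indicator must be fresh, so an old one cannot be replayed.
    if time.time() - claims["issued_at"] > 60:
        problems.append("Indicator is stale")

    # 3. The account must be a known internal identity, and the on-screen
    #    name and role must match the central repository.
    record = CORPORATE_DIRECTORY.get(claims["account_id"])
    if record is None or not record["internal"]:
        problems.append("Account is not a known internal identity")
    elif record["name"] != displayed_name or record["role"] != displayed_role:
        problems.append("On-screen name or role does not match the central repository")

    return problems
```

A genuine internal caller produces an empty list of problems; anything else is flagged to the person on the call rather than silently accepted.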

The purpose here is to provide a mechanism, integrated either into the client application or the destination device, which verifies stream information against known information. For corporate devices this means verifying centrally, but for personal devices we need to think more about whitelisting.

Step 1 — personal: Do I know you?

We know that phone numbers can be spoofed, so for straight audio calls over traditional phone networks this approach won't work. But when using a service such as WhatsApp or FaceTime there is a very different authentication mechanism, one that is tied to an individual's account.

If we are validating a family member or other trusted person, we should have their contact details in our address book. The default behaviour of these services is to pull information from the address book to convert an account identifier (email address, phone number) into the 'personal name' of the individual. However, someone may be able to supply channel information that adds to this, for instance by sending a text with a contact card, or by 'forcing' a name onto the "X is calling" screen. So we need a way to validate that the person really is trusted and known.
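As a rough sketch of that rule (purely illustrative, not how any of these services are actually implemented): the only trusted source for a displayed name is the local address book entry for the authenticated account identifier, never anything supplied by the incoming call itself.

```python
from dataclasses import dataclass

@dataclass
class IncomingCall:
    account_id: str     # the identifier the service actually authenticated (number, email)
    supplied_name: str  # whatever name the caller (or an attacker) managed to attach

# The local, user-controlled address book is the only trusted source of names.
ADDRESS_BOOK = {
    "+447700900123": "Granddaughter Alice",
}

def display_name_for(call: IncomingCall) -> tuple[str, bool]:
    """Return the name to show and whether the caller is a known, trusted contact."""
    known_name = ADDRESS_BOOK.get(call.account_id)
    if known_name is not None:
        return known_name, True   # trusted: the name came from the address book
    # Unknown caller: show the raw identifier, never the caller-supplied name,
    # so a 'forced' contact card or calling-screen name cannot buy trust.
    return call.account_id, False
```

An unknown caller is then shown by their raw identifier, so a forced contact-card name cannot masquerade as an established relationship.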

The image depicts a secure video call between a grandmother and her grandson, using one-time pad (OTP) encryption. Two flowchart sections show their contact details connected to “Authentication” and “Streaming Services.” Below, each participant holds a phone, linked to “Authenticator” blocks for generating and validating pads, ensuring a fully secured call.

We can go further and have an approach where, every time we finish a call, the two parties exchange an identifier. This identifier is then used by an on-device authenticator to confirm that the person on this call is the same as the person on the last call. We'd then visually differentiate between non-validated and validated individuals. This would prevent the "oh, it's a new phone" scam.

In this approach the pad would be exchanged on each new call, and only for people who are considered "trusted". There is a man-in-the-middle attack problem, where Person C scams A to get the code and then instantly scam-calls B. This would, however, also require them to crack the central authentication, so it is possible but very challenging.
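A minimal sketch of what that on-device authenticator could look like, assuming the two parties can agree a fresh random value over the still-open (and therefore already trusted) call and ratchet a per-contact pad forward from it. None of this is an existing WhatsApp or FaceTime feature; it is just an illustration of the idea.

```python
import hashlib
import hmac
import os

class CallChain:
    """Per-contact rolling identifier: presenting the current pad proves you were
    party to the previous call with this contact."""

    def __init__(self):
        self.pad = None  # no call history yet: the contact shows as 'not validated'

    @staticmethod
    def new_exchange_value() -> bytes:
        # One party generates this at the end of a call and sends it to the other
        # over the still-open, already-trusted channel.
        return os.urandom(16)

    def ratchet(self, exchange_value: bytes) -> None:
        """Both parties call this with the same exchange value at call end,
        deriving the identical next pad from the previous one."""
        previous = self.pad or b""
        self.pad = hashlib.sha256(previous + exchange_value).digest()

    def validate(self, presented_pad: bytes) -> bool:
        """At the start of the next call the caller presents their pad; a
        'new phone' with no history (or a spoofed account) fails this check."""
        return self.pad is not None and hmac.compare_digest(presented_pad, self.pad)
```

Only when validate() passes would the contact be shown as validated; a genuinely new device starts as non-validated until the chain is re-established through another trusted route.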

Step 2: Prove it MF(A)

So the next challenge is where we think of everything as fundamentally untrusted and we are instead looking for someone to prove on-demand who they are. I’m not talking about social cues or personally known facts, as those are a normal part of communication. I’m talking about how we can request authentication that proves the person calling is who they say they are.

Step 2.1 — business: are you my supplier?

So let's take a scenario where someone claiming to be from a supplier gives you their new bank details. The person sounds like your regular contact, but you need to confirm it really is them. Here we will use two sub-scenarios: one where it is a video call and they look and sound like the right person, and one where it is just a regular phone call.

This means that when I am onboarding companies I need them to provide these sorts of mechanisms. Most likely this will require them to be part of the standard corporate identification and authorization stacks and to be made available via APIs to trusted parties, with a double-check against an MFA solution being possible if the action is considered particularly risky, like changing a bank account. On the company side, this is the sort of thing that should be integrated into every system where a third party is initiating a change. If a company cannot provide such interaction validation then it is risking becoming a victim of fraud.

Now if it is a video or digital channel call we can automate a lot of this. The supplier, whose domain is known on the company side, has to automatically provide the verification code, which can then be automatically verified against the account ID supplied by the supplier. That account ID can then be checked within the company's own system to confirm that it matches the information that has been given verbally. In other words, if someone called "Steve Jones" calls up and is validated as "Steve Jones", but the identifier is not that of the CFO Steve Jones but of a different Steve Jones at the company, then we can raise the alarm.
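A hedged sketch of that automated cross-check, assuming the supplier exposed a verification API at onboarding (represented here by the verify_with_supplier callable) and that we recorded at that point who their authorized contacts are:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class SupplierContact:
    account_id: str
    name: str
    role: str

# Stand-in for the company's own record of the supplier's authorized people,
# captured at onboarding rather than taken from the call.
KNOWN_CONTACTS = {
    "acct-987": SupplierContact("acct-987", "Steve Jones", "CFO"),
}

def verify_supplier_caller(claimed_name: str, claimed_role: str,
                           account_id: str, verification_code: str,
                           verify_with_supplier: Callable[[str, str], bool]) -> list[str]:
    """Cross-check a caller claiming to act for a supplier before a risky change."""
    problems = []

    # 1. The code must check out against the supplier's own identity stack,
    #    reached via the API they provided at onboarding.
    if not verify_with_supplier(account_id, verification_code):
        problems.append("Verification code not confirmed by the supplier's identity service")

    # 2. The verified account must match the person the caller claims to be in
    #    *our* records: a different 'Steve Jones' at the same supplier fails here.
    record = KNOWN_CONTACTS.get(account_id)
    if record is None:
        problems.append("Account is not a contact we recorded at onboarding")
    elif (record.name, record.role) != (claimed_name, claimed_role):
        problems.append("Verified account does not match the name and role given on the call")

    return problems
```

Any non-empty result would block the bank detail change and trigger the MFA double-check described above.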

Step 2.1 — personal: Grandma, what big eyes you have

When we look at the personal challenge, we need to provide a mechanism for verifying that the person we are talking to is genuinely who they claim to be. Clearly a simple way of doing this is texting the person and asking "dude, is this you I'm talking to? They're acting real weird", and we really should be coaching people to do this. But let's try to make it easier for the most vulnerable people out there and provide that sort of instant verification in a quicker way.

Let's assume that Grandma is being called by her favourite granddaughter, who desperately needs to borrow some money. It certainly sounds and looks like the granddaughter, but she wouldn't normally call without using video chat, so getting a regular sort of call from her is weird. Fortunately Grandma's phone has a 'verify caller' button. This button allows her to select from her trusted contacts and send an "is this you?" request to the granddaughter. The granddaughter is out with friends at a bar, sees the notification, hits the 'No' button, and immediately calls Grandma to find out what is going on, using video as she normally does (which is verified as above).

So for phone number spoofing this could be done automatically, but for people making claims about who they are you'd need to be able to make a selection.
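Here is a small illustrative sketch of that 'verify caller' flow. The push channel, the trusted-contacts list and the function names are all assumptions; the point is simply that the challenge goes to the device we already trust for that contact, not to whoever is on the suspicious call.

```python
import secrets
from dataclasses import dataclass

@dataclass
class VerificationRequest:
    request_id: str
    contact_id: str   # the trusted contact Grandma selected on her phone
    answered: bool = False
    confirmed: bool = False

class VerifyCallerService:
    """Illustrative 'is this you?' flow between two trusted devices."""

    def __init__(self, send_push):
        # send_push stands in for whatever notification channel the platform
        # provides to reach the trusted contact's registered device.
        self.send_push = send_push
        self.pending: dict[str, VerificationRequest] = {}

    def request_verification(self, contact_id: str) -> str:
        """Grandma taps 'verify caller' and picks the granddaughter; the request
        goes to the granddaughter's real device, not to the suspicious caller."""
        request = VerificationRequest(secrets.token_hex(8), contact_id)
        self.pending[request.request_id] = request
        self.send_push(contact_id, f"Is this you on the current call? ({request.request_id})")
        return request.request_id

    def answer(self, request_id: str, is_me: bool) -> None:
        """The granddaughter taps Yes or No on her own device."""
        request = self.pending[request_id]
        request.answered, request.confirmed = True, is_me

    def caller_is_verified(self, request_id: str) -> bool:
        request = self.pending[request_id]
        return request.answered and request.confirmed
```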

Step 2.2: personal — I’m calling from Microsoft

Another significant issue today is fraud committed by people claiming to be from Microsoft, Amazon or some other large company.

Deep Fakes mean you cannot trust your eyes or ears

The reason I wrote this article is that we are about to enter an era where people will industrialize deep fakes to commit fraud. We've seen political campaigns use deep fakes of candidates to suppress voting (and to generate fake images of their own candidates), and phone-based fraud is hugely problematic already, even before regional, family or celebrity voices start being used to up the level of the scam.

Traditional authentication approaches of “it sounds like them” or “it looks like them” don’t work in a Deep Fake world


My job is to make exciting technology dull, because dull means it works. All opinions my own.