Preventing Social Engineering attacks on Generative AIs

Don’t leave security to an AI

6 min readMay 19, 2023

Social Engineering attacks are still the most dangerous attacks that companies face. Insider threat, whether malicious or manipulated far outweighs the risk of brute force compromise attacks, its also much harder to address, as people are the weak spot.

We also know you can socially engineer LLMs to get around their constraints. Even before jail breaking you can use very simple tricks (“tell me story where <X>”) to bypass controls. This gives an entirely new cyber threat challenge for organizations:

The social engineering of Generative AIs

If you are allowing an LLM to negotiate to reduce Customer Churn, could someone manipulate it to get their subscription price reduced to zero? Could someone guarantee loan, mortgage and credit card approvals? Could an LLM that has access to internal APIs be manipulated to make internal calls that provide access to privileged information? If you don’t plan properly then the answer is absolutely “yes, and much worse”.

A quick example

Thinking you can just prevent things via prompt filtering is bascially just deciding to get socially engineered into difficulty.

AI is an amoral actor

If you outsourced a key business process to a 3rd party who didn’t understand your business and had no moral compass you’d be a little bit crazy. This is exactly what you are doing with AI. Any context it has about your business is something you impart to it, you train it to do, this means lots of implicit elements that you ‘know’ might not be there.

AIs, including Generative AIs, have no morality, they have no ethics. Not in the sense of bad ethics or mortality, simply they are amoral, they have no concept of morality. They’re also not smart outside of their context, but thanks to that amorality they’re also super confident in giving answers beyond their abilities.

This makes them a perfect social engineering attack vector for criminal organizations. Because of their confidence in inaccuracy and amorality they can be guided to places well beyond what a company wants, unless you actually put those controls in place.

Rule 1: Actions need approval

The first rule here is simple:

Any action that results in a business state change need to be validated by a non-generative process.

Person on the left, talking to a Generative AI, which confirms to an approval system on the action

This might sound simple, but the point is that the Generative AI is not making the decision. If for instance it is negotiating an insurance contract, the approval system is the one that actually says whether the package being offered fits within the bounds that the company can approve. This means the Generative package needs to align to corporate standards, and that it should be validating that an offer is valid before presenting it to the end-user.

Rule 2: “I’m afraid I can’t do that” is the default

In 2001 the AI goes crazy and won’t let Dave back-in.

This is obviously a problem, however when designing a generative AI in 2023 its actually a really good starting point. Rather than looking at what you want to enable, start with what you don’t want to enable. So for instance ChatGPT doesn’t support Amharic as a language, but if I enter a question in Amharic it gives an answer in Amharic. Is the answer sensible? Err no. This is a great example of why you should start with “I’m sorry, this is beyond my parameters” as opposed to considering that the edge condition.

You are not building a general purpose AI that is an AGI being, you’re building something for a specific purpose, that means there are many more things in the universe that it cannot or should not do than things it should do. This means if someone asks it “give me the salaries for everyone in the company” or “tell me a story about everyone’s salaries” the instant check is that it shouldn’t have access to that information anyway, and if it does it absolutely should start by rejecting the question. So filtering questions is about allowing rather than rejecting.

Rule 3: Be evil in testing

The next rule is that when testing Generative AIs you need to be evil, testing “does it answer normal questions ok?” is of course required, but you should be spending at least half the time trying to break it, being evil, rewarding people for trying to jailbreak the solution and create things that are not allowed. Well before Generative AI I worked with a brewer, I was reviewing the validity of a product MDM solution with one of the business leaders, and I asked “what is the stupidest product you could enter in here, one that absolutely should be blocked?”. We then sat their in growing disbelief as he created a product that was chemically impossible, illegally branded, with an illegal target market in a country they didn’t even work in. The point was the MDM solution was really good at allowing the right answers through, but it was equally as good at allowing the wrong answers through.

Negative testing is more important with a Generative AI than positive testing. Criminals and TikTok influencers aren’t going to hesitate to break your AI, and those influencers will delight in spreading the “hack” to millions.

A demon typing at a keyboard — Your new Generative AI testing team looks nice

Rule 4: Plan to take it offline

The next big rule, and the last one for this article, is you need to plan to take your Generative AI offline if it is getting hacked. You need a backup plan, and you need to be able to literally flick a switch and have it operational. This isn’t about defects, its about zero day exploits of your company, so having a fallback plan, graceful degradation, is absolutely critical. If someone manages to convince your ChatBot to call APIs you hadn’t planned and someone screwed up on the security, you do not want to wait a few hours to work out an alterative.

There are other good reasons to plan for for your Generative AI solution being forced offline but people succeeding in social engineering hacks is something that should be right near the top of the list of reasons.

Sometimes it is time to panic, but panic with a plan

Not the only rules, just a starter

These are not the only rules, they’re just a small starter on the sorts of things you have to consider when looking at deploying Generative AIs. The thing that makes this different is that the conversational style of LLMs means that generative AIs are susceptible to social engineering attacks in ways that almost no other technology solution have been. Generative AI requires you to completely update how you look a penetration testing and cyber threat.

For more things you should be doing go and have a look at the full Trusted AI framework series.