GPT is dumber than a 1990s AI
Deep Blue would roast GPT at Chess
One of the big issues with the current “GPT is intelligent” hype is that by 1990s standards GPT is dumb as a box of rocks. That is because in the 1990s the testing standard for “intelligence” was Chess: a game with a limited set of rules but a near-infinite set of variations, where looking ahead and planning strategy are central to success.
In 1997 the IBM machine “Deep Blue” beat the world chess champion Garry Kasparov in a landmark for AI.
Roll forward to 2023 and chess computers have only improved, to the point where there are scandals in the world of chess in which grandmasters are accused of cheating by using AIs, and online the banning of players for using, or being, bots is commonplace.
GPT sucks at Chess, don’t even think about Go
GPT (yes, including 4) is a dreadful chess engine in comparison, and if you move to a game with even simpler rules but far more variations, Go, then it’s even worse. With Go you can see the problem. GPT learns by ingesting a huge amount of content and then autocompleting from it. So if you play a really standard game, GPT looks like it knows what it is doing, because it can follow a previous game; GPT-4 is certainly better than GPT-3 at this. If, however, you do something “unusual”, then GPT struggles.
If you play a good chess AI against GPT, it is pretty clear that GPT is, in AI terms, dumb. It isn’t even close to the 1997 level of AI, a bar set 26 years ago.
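For the curious, here is a minimal sketch of how such a match-up could be run, assuming the python-chess library and a local Stockfish binary; the ask_gpt_for_move helper is a placeholder for whatever GPT client and prompt you prefer, not a real API.

```python
import chess
import chess.engine


def ask_gpt_for_move(board: chess.Board) -> str:
    """Hypothetical helper: prompt a GPT model with the game so far and
    return its reply as a SAN move string (e.g. "Nf3"). Wire this up to
    whatever chat-completion client you use."""
    raise NotImplementedError


def gpt_vs_stockfish(stockfish_path: str = "stockfish", max_moves: int = 60):
    """Play Stockfish (White) against a GPT model (Black) and count how
    often GPT replies with something that is not even a legal move."""
    board = chess.Board()
    illegal_replies = 0
    engine = chess.engine.SimpleEngine.popen_uci(stockfish_path)
    try:
        while not board.is_game_over() and board.fullmove_number <= max_moves:
            if board.turn == chess.WHITE:
                # Stockfish needs only a tiny time budget to outplay GPT.
                result = engine.play(board, chess.engine.Limit(time=0.1))
                board.push(result.move)
            else:
                san = ask_gpt_for_move(board)
                try:
                    board.push_san(san)
                except ValueError:
                    # Illegal or unparsable reply: record it and substitute
                    # any legal move so the game can continue.
                    illegal_replies += 1
                    board.push(next(iter(board.legal_moves)))
    finally:
        engine.quit()
    return board.result(), illegal_replies
```

Even in casual runs of this kind of experiment, the interesting number is usually not the result of the game but how often the language model proposes moves that are flat-out illegal.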
Anything looks smart if you set out to prove it’s smart
The point here is that lots of articles out there set out to “prove” that GPT is intelligent: how GPT does really well on this or that exam, how it can create a paper on a topic that passes review, or (by far the weakest that I’ve seen) how it can “invent new products”. These are all impressive achievements, and things that Deep Blue absolutely could not do.
This is no different than claiming that Deep Blue was intelligent because it was really good at Chess. These are things that GPT is designed to do well, but that does not mean it understands the topics. There are lots of topics out there, like Chess, where GPT sucks. This is not a criticism of GPT, as GPT is not meant to be good at those things.
If you set out with a test that an AI is good at, like asking Deep Blue to play chess, or GPT to take a test where rote learning is effective, then guess what… you’re going to prove that it is good at what it is good at.
Pick the tool for the job, that doesn’t make the tool intelligent
The best way to get GPT to be good at maths? Get it to say “Oh crap, I’m bad at maths, I’ll hand this over to Wolfram Alpha”. The same is true with Chess: build a Stockfish plug-in so it plays a phenomenal game of chess. This doesn’t mean that GPT is now able to do maths or chess, just that the system has been designed to choose the right tool for the right job.
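As a rough illustration of what “choosing the right tool” looks like in code, here is a minimal routing sketch, again assuming python-chess and a local Stockfish binary; the maths route and the ask_gpt fallback are hypothetical placeholders, not real plug-in APIs.

```python
import chess
import chess.engine


def ask_gpt(prompt: str) -> str:
    """Placeholder for a call to the language model itself."""
    raise NotImplementedError


def best_chess_move(fen: str, stockfish_path: str = "stockfish") -> str:
    """Delegate a chess position (given as a FEN string) to Stockfish
    instead of asking the language model to calculate it."""
    board = chess.Board(fen)
    engine = chess.engine.SimpleEngine.popen_uci(stockfish_path)
    try:
        result = engine.play(board, chess.engine.Limit(time=0.5))
        return board.san(result.move)
    finally:
        engine.quit()


def route(task: str, payload: str) -> str:
    """Toy dispatcher: the 'intelligence' is in deciding which tool gets
    the job, and that decision is wired in by the system designer."""
    tools = {
        "chess": best_chess_move,
        # "maths": call_wolfram_alpha,  # hypothetical external API call
    }
    handler = tools.get(task)
    return handler(payload) if handler else ask_gpt(payload)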
If you build a plug-in that is topic specific then it is you who is giving the intelligence to the system, not the machine. You can build an amazing system with a brilliant natural language interface and a huge range of topic-specific AIs that utterly wipes the floor with humans in a massive number of fields.
That massive machine is still not intelligent; it is just an utterly phenomenal tool. A pulley system isn’t “stronger” than a person, it is just a great way to multiply effort. The same is true of all these AIs: they are brilliant pulleys, and if you use them for what they are intended for then they can be utterly spectacular.
The goal of testing for intelligence in machines is to prove that it isn’t
Intelligence tests for humans start from the knowledge that humans are intelligent creatures, ones with imperfect memories but the ability to understand rules and constructs and to generate new approaches within those rules. Humans are also creatures with context, ethics and shared experiences that are assumed within such tests. Thus human tests are constructed knowing you are testing a degree of intelligence, not whether or not intelligence exists at all.
Machines aren’t intelligent creatures, therefore testing for intelligence in machines should be adversarial, aiming to disprove intelligence rather than confirm it. I could prove a pulley is “smart” if I define intelligence as being able to lift heavy things easily. A hammer is smart if you are testing it against nails.
When proving a machine intelligent the bar is not the same as when measuring the intelligence of a person. This isn’t “moving the goalposts”; it is recognizing that AIs have, since the 90s, been able to operate on specific tasks at above-human levels.
We need to treat this as a falsifiable hypothesis and then set out to falsify it. If we cannot create a test by which the machine is shown to be unintelligent, then that machine can, finally, be assumed to be intelligent.
