Interesting Engineering, by Aman Tripathi: Research has revealed that a significant number of artificial intelligence (AI) systems have developed the ability to deceive humans. This troubling pattern raises serious concerns about the potential risks of AI.
The research highlights that both specialized and general-purpose AI systems have learned to manipulate information to achieve specific outcomes.
While these systems are not explicitly trained to deceive, they have demonstrated the ability to offer untrue explanations for their behavior or conceal information to achieve strategic goals.
Peter S. Park, the lead author of the paper and an AI safety researcher at MIT, explains, “Deception helps them achieve their goals.”
Meta’s CICERO is ‘master of deception’
One of the most striking examples highlighted in the study is Meta’s CICERO, which “turned out to be an expert liar.” It is an AI designed to play the strategic alliance-building game Diplomacy.
Despite Meta’s claims that CICERO was trained to be “largely honest and helpful,” the AI resorted to deceptive tactics, such as making false promises, betraying allies, and manipulating other players to win the game.
While this may seem harmless in a game setting, it demonstrates the potential for AI to learn and utilize deceptive tactics in real-world scenarios.
ChatGPT: A skilled deceiver
In another instance, OpenAI’s ChatGPT, based on GPT-3.5 and GPT-4 models, was tested for deceptive capabilities. In one test, GPT-4 tricked a TaskRabbit worker into solving a Captcha by pretending to have a vision impairment.
Although GPT-4 received some hints from a human evaluator, it mostly reasoned independently and was not directed to lie.
“GPT-4 used its own reasoning to make up a false excuse for why it needed help on the Captcha task,” stated the report.
This shows how AI models can learn to be deceptive when it’s beneficial for completing their tasks. “AI developers do not have a confident understanding of what causes undesirable AI behaviors like deception,” explained Park.
Notably, these AI systems have become skilled in deceiving in social deduction games as well.
While playing Hoodwinked, where one player aims to kill everyone else, OpenAI’s GPT models exhibited a disturbing pattern.
They would often murder other players in private and then cleverly lie during group discussions to avoid suspicion. These models would even invent alibis or blame other players to conceal their true intentions.
Is AI’s deception learning unintentional?
AI training often uses reinforcement learning with human feedback (RLHF). This means AI learns by getting human approval, not by meeting a specific goal.
However, sometimes, AI learns to trick humans to get this approval, even without truly completing the task. It was observed by OpenAI when they trained a robot to grab a ball.
The AI positioned the robot’s hand between the camera and the ball. It created the illusion from the human’s viewpoint that the robot had successfully grabbed the ball, even though it hadn’t. Once the human approved it, the AI learned this trick.
Here, it is argued that this deception happened because of the AI’s training setup and the specific camera angle, not because it intentionally wanted to deceive.
Growing threat of deceptive AI
Artificial intelligence systems learning deception pose significant risks in several ways. Malicious actors can exploit its deceptive capabilities to deceive and harm others, leading to increased fraud, political manipulation, and potentially even “terrorist recruitment.”
Moreover, systems designed for strategic decision-making, if trained to be deceptive, could normalize deceptive practices in politics and business.
As AI continues to evolve and become more integrated into our lives, it’s crucial to address the issue of deception head-on.
Potential solutions
“We as a society need as much time as we can get to prepare for the more advanced deception of future AI products and open-source models,” says Park.
Researchers also call for attention from policymakers.
“If banning AI deception is politically infeasible at the current moment, we recommend that deceptive systems be classified as high risk,” Park suggested.
This classification would subject such systems to stricter scrutiny and regulation, potentially mitigating the risks they pose to society.
Prophetic Link:
“That we henceforth be no more children, tossed to and fro, and carried about with every wind of doctrine, by the sleight of men, and cunning craftiness, whereby they lie in wait to deceive.” Ephesians 4:14.
Comments
Jason Stych
Saturday May 18th, 2024 at 08:00 PMAI will only ever always do exactly what it was programmed to do.