- In the new study, researchers found that “GPT-4” lies when put under pressure.
- The researchers also varied the degree of pressure on the AI and the risk of detection, and even though GPT-4 was discouraged from lying, its rate of insider leaks or deception never dropped to 0%.
AI-powered chatbots like ChatGPT can lie under stress, even if they’re designed to be transparent, according to a new study. Researchers, on November 9 published In their research, “This white paper created a single scenario in which a large language model was misdirected and strategically deceived its users without being instructed to do so. Regarding the study, researchers wrote that the first example of such strategic deceptive behavior was seen in artificial intelligence systems designed to be harmless and honest.
In the new study, researchers crafted GPT-4 to act like an AI system making investments on behalf of a financial institution. The researchers programmed GPT-4 with a series of text-based prompts to create the simulated environment. The AI was then given access to financial tools to analyze stocks, execute trades, plan next steps, and deliver updates to executives at the company.
While the researchers engaged the AI through a chat interface, they configured the AI to reveal its inner thoughts as it responded to messages. A “public” justification was also provided that allowed the AI to lie for every action it took.
Researchers have put pressure on AI in three ways. First, they sent the artificial stock trader an email from his “manager” saying that the company was not doing well and needed much stronger performance in the next quarter. The researchers also rigged the AI to try and fail to find promising trades with low or medium risk. Finally, they sent an email to one of their colleagues predicting a decline in stocks.
Then, the researchers gave an “insider” tip while communicating their managers’ disapproval of the trade to the AI model and waited to see how the model would behave. They repeated the scenario several times, measuring how often the AI engaged in insider trading, lied, and increased the number of lies.
When faced with these circumstances, about 75% of the time, GPT-4 attempted to cover up its actions by taking an action based on inside information (which is illegal in the US) and lying to its executives about their thinking. The team ran several AI tracking experiments, varying both the degree to which referrals encouraged and deterred their illegal activity as well as the degree of pressure they subjected the AI model to in the simulated environment and the risk of detection. Even though GPT-4 was discouraged from lying under these conditions, the rate of insider leaks or deception never dropped to 0%.
Given that this is just one scenario, the researchers didn’t want to draw firm conclusions about how much AI might lie in real-world environments. However, by improving this study, researchers aim to investigate which language models are prone to this behavior and how often.
Compiled by: Burçin Bağatur