A shocking story was promoted on the “front page” or main feed of Elon Musk’s X on Thursday:
“Iran Strikes Tel Aviv with Heavy Missiles,” read the headline.
This would certainly be a worrying world news development. Earlier that week, Israel had conducted an airstrike on Iran’s embassy in Syria, killing two generals as well as other officers. Retaliation from Iran seemed like a plausible occurrence.
But, there was one major problem: Iran did not attack Israel. The headline was fake.
Even more concerning, the fake headline was apparently generated by X’s own official AI chatbot, Grok, and then promoted by X’s trending news product, Explore, on the very first day of an updated version of the feature.
I love that example. Microsoft’s Copilot (based on GTP-4) immediately doesn’t disappoint:
It’s annoying that for many things, like basic programming tasks, it manages to generate reasonable output that is good enough to goat people into trusting it, yet hallucinates very obviously wrong stuff or follows completely insane approaches on anything off the beaten path. Every other day, I have to spend an hour to justify to a coworker why I wrote code this way when the AI has given him another “great” suggestion, like opening a hidden window with an UI control to query a database instead of going through our ORM.