
THE MESSY, ANTISOCIAL FUTURE OF AGENTIC AI

By Berit Anderson
Why Read: Agentic AI has become ubiquitous. Meanwhile, the hallucination rates of "reasoning models" have never been higher. In this week's issue, we explore the future implications of widespread agentic AI deployments across security, personal, and financial platforms.

_______

Society is facing a bizarre technological reality: billions of dollars of sunk investment are currently pushing an inherently broken (albeit extremely helpful, in limited capacities) new tool into the vast majority of technology systems driving our businesses, our finances, our infrastructure, our personal lives, and the global economy itself.

There are real potential benefits to the widespread use of agentic large language models - chief among them being the ability to eliminate a huge amount of busy work currently undertaken by humans. Doctors, for example, no longer need to physically type up notes on each patient in an already saturated schedule. Entrepreneurs can suddenly launch low- or no-code apps and websites and create unlimited marketing materials without ever hiring a team. Mid-level managers can answer twice as many questions and provide unlimited guidance to direct reports by reviewing LLM-written emails rather than drafting them themselves.

This report is not about those benefits. They are already at the center of the conversation about AI. It is about the challenges and issues we can expect to arise from the blind, widespread adoption of hallucinatory agentic reasoning systems and other forms of LLMs. They are here. They are headed for widespread adoption. It's time we start discussing the real, granular impacts of that.
The Hallucination-Adoption Paradox

There's a core paradox at the center of our shared future: as hallucinations rise, LLMs are being deployed more broadly with less human oversight.

As the business world gears up for widespread LLM adoption, OpenAI and others have turned to so-called "reasoning models" - theoretically, in an attempt to reduce hallucinations. There's a widespread claim among those in the industry (and among general users) that one way to get around the hallucination issue is to ask an LLM to check itself, even several times. Here's a test I ran using Anthropic:

As you can see in the exchange above, this does work. But it's only really useful if you, the human agent, know enough to be suspicious of what it tells you, or already know the answer, and therefore which version to trust.

Companies including OpenAI, Google, and DeepSeek have tried to build that process into the models themselves. But while these added "reasoning" steps have improved some mathematical and coding skills, they've also increased hallucinations. Significantly. As Cade Metz and Karen Weise reported last week for the New York Times, in one recent report by OpenAI on its own hallucination rates:

[T]he company found that o3 - its most powerful system - hallucinated 33 percent of the time when running its PersonQA benchmark test, which involves answering questions about public figures. That is more than twice the hallucination rate of OpenAI's previous reasoning system, called o1. The new o4-mini hallucinated at an even higher rate: 48 percent. When running another test called SimpleQA, which asks more general questions, the hallucination rates for o3 and o4-mini were 51 percent and 79 percent. The previous system, o1, hallucinated 44 percent of the time.

Between 51% and 79% wrong? Oopsie.

Why are these models getting so much worse? From the same article:

"The way these systems are trained, they will start focusing on one task - and start forgetting about others," said Laura Perez-Beltrachini, a researcher at the University of Edinburgh who is among a team closely examining the hallucination problem.

Another issue is that reasoning models are designed to spend time "thinking" through complex problems before settling on an answer. As they try to tackle a problem step by step, they run the risk of hallucinating at each step. The errors can compound as they spend more time thinking.

Basically, by removing the human from the reasoning equation, you magnify the risk of error. It's almost as if human knowledge - and its associated guidance - were the most important part of that guardrail, not the LLMs' reasoning capability.

Despite that fact, the LLM business ecosystem is moving full speed ahead on agents - essentially autonomous versions of LLMs and other reasoning models. As described on the AWS website:

[. . .] an AI agent is a software program that can interact with its environment, collect data, and use the data to perform self-determined tasks to meet predetermined goals. Humans set goals, but an AI agent independently chooses the best actions it needs to perform to achieve those goals.

The promise of removing humans - and human error - from the loop is a strong pull for Wall Street and the Fortune 500, which have been paying a premium for coding and management talent for the last 10 to 15 years. As reported by ZDNet:

Tech executives are moving swiftly to embrace AI agents, according to the latest Technology Pulse Poll from accounting firm Ernst & Young (EY).
The poll, which surveyed more than 500 tech leaders in April, found about half (48%) of respondents have at least begun deploying agentic AI within their organizations. Slightly more (around 50%) said most of their companies' internal AI operations will be fully autonomous within the next two years, further indicating a movement toward agentic systems.

Right. Because the best thing to do when something is wrong up to 79% of the time is to make it autonomous, throw it into the mix with a bunch of other hallucinating systems, and then put it everywhere.

WHAT COULD GO WRONG?
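For readers who want to reproduce the self-check experiment described above, here is a minimal sketch using the Anthropic Python SDK. The model alias and the sample question are placeholders, not the ones from my test; the pattern, not the specifics, is the point.

```python
# A minimal sketch of the "ask the model to check itself" pattern, using the
# Anthropic Python SDK (pip install anthropic). Model alias and question are
# illustrative placeholders.
import anthropic

client = anthropic.Anthropic()  # expects ANTHROPIC_API_KEY in the environment
MODEL = "claude-3-5-sonnet-latest"  # placeholder; use whatever model you have access to

question = "Which public figure was the first person to win two Nobel Prizes?"

# First pass: ask the question.
first = client.messages.create(
    model=MODEL,
    max_tokens=300,
    messages=[{"role": "user", "content": question}],
)
answer = first.content[0].text

# Second pass: feed the answer back and ask the model to audit it.
check = client.messages.create(
    model=MODEL,
    max_tokens=300,
    messages=[
        {"role": "user", "content": question},
        {"role": "assistant", "content": answer},
        {"role": "user", "content": "Double-check that answer. Flag anything you are not certain of and correct it if necessary."},
    ],
)

print("First answer:\n", answer)
print("Self-check:\n", check.content[0].text)
```

Note that the second pass can just as easily "correct" a right answer into a wrong one; as argued above, the check is only useful if the human knows enough to be suspicious of the result.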
The Era of Agentic Lone Rangers

The vast majority of users are still at the very beginning of the agentic revolution - what I call the Era of the Disconnected Nodes. In this stage, early adopters use a series of disconnected agents to increase their efficiency in core tasks that previously took a lot of time but weren't the best use of human capabilities. In the Era of Disconnected Nodes, hallucinations can create legal, cybersecurity, and PR issues. But with a human user in the loop, runaway-train scenarios are less likely.

To explain what I mean by this, let's look at coding - one of the agentic use cases most hyped among the Fortune 500 today. In less than five minutes, I - a human with no coding experience - can generate the framework, code, and directions to deploy a basic news website using GitHub Copilot. However, in order to deploy, I still must execute the code myself. This is a key liability distinction, protecting Microsoft from legal risk related to the deployment of code and its impact on the outside world. That's especially important when the whole world is relying on your tool to create code - and some non-zero portion of it is guaranteed to be completely made up.

According to Computerworld: A GitHub survey of 2,000 developers in Brazil, Germany, India, and the US found that 97% were using AI coding tools by mid-2024. And according to a HackerRank survey of more than 13,000 developers across 102 countries released in March, AI now generates, on average, 29% of all code.

Cursor is the most popular of these agentic coding tools, providing developers with an agent that can review, propose, and then deploy changes to the code base. Importantly, the developer is always in the loop, and each step and its reasoning are documented in the chat window for the human in charge. This is an effective approach.

Dan Shiebler is head of machine learning at Abnormal AI, a cybersecurity firm I ran into at RSA last summer. He and his team are using a range of agentic tools to increase their efficiency. "Bolt, v0, and Lovable are three tools in this category," Shiebler told Computerworld in an article published earlier this month. "I personally like Lovable, but we've seen a lot of success with v0 for interface design, where it's taken the place of Figma in a lot of user workflows."

Agents are least risky in scenarios in which the agent is used internally, and a human is the authority - operating from a position of skepticism, as one would with an intern. However, even in such scenarios, without sufficient human authority and experience, expensive and complicated challenges do emerge, such as:
There's an entire subclass of problems that arise when the agent is operating as an authority to an external user of a product. For example:
While mainstream tools such as GitHub Copilot, v0, and Cursor may call themselves "agentic," it's a bit of a misnomer. There is little real autonomy here, because a human is always directing and being apprised of what each "agent" is doing. But there's good news: we're now leaving that boring world of limited risk behind. It's time for exponential risk!
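Before moving on, here is what that "boring" guardrail looks like in the abstract: a minimal, illustrative sketch of the agent-proposes, human-approves pattern. None of the names below correspond to any real product's API - propose_change() is a hypothetical stand-in for whatever model call a tool like Cursor or Copilot actually makes.

```python
# Illustrative sketch of the "human is always in the loop" pattern described
# above: an agent may propose changes, but nothing is applied without an
# explicit human approval. propose_change() is a hypothetical placeholder
# for an LLM call; it does not represent any real tool's internals.
from dataclasses import dataclass

@dataclass
class ProposedChange:
    file: str
    diff: str
    reasoning: str  # the agent's documented rationale, shown to the reviewer

def propose_change(task: str) -> ProposedChange:
    # Placeholder for a model call; a real tool would generate this.
    return ProposedChange(
        file="site/index.html",
        diff="- <h1>Hello</h1>\n+ <h1>Latest Headlines</h1>",
        reasoning=f"Requested task: {task}. Updated the page heading.",
    )

def review_and_apply(change: ProposedChange) -> bool:
    """Show the proposal and its reasoning; apply only on explicit approval."""
    print(f"File: {change.file}\nReasoning: {change.reasoning}\n{change.diff}")
    verdict = input("Apply this change? [y/N] ").strip().lower()
    if verdict != "y":
        print("Rejected; nothing was changed.")
        return False
    # Applying the diff would happen here; deployment stays a human decision.
    print("Approved; change applied.")
    return True

if __name__ == "__main__":
    review_and_apply(propose_change("Rename the homepage heading"))
```

The design point is simply that the side effect - applying the change - sits behind an explicit human decision, which is where the liability distinction discussed above comes from.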
Super Agents & Adversarial Agent Use

There is a growing body of tools, like Genspark and Manus, that will coordinate the creation of research, content, and materials from multiple agents and carry out a limited but growing menu of key tasks for you - mostly around planning vacations, booking things, etc. So what if Genspark books your flight wrong and you wind up stranded in Bali for an extra week, requiring thousands of dollars in hotel rooms and change fees? Its user agreement limits its liability to the total amount you've paid the company in the last year - or $100.

Some (including Genspark) call these tools "super agents." I find that terminology generous. However, these tools, and others like them not in the public domain, enable a new scale of adversarial complications. So far, we've discussed only the unintended consequences of hallucinations and the mostly prosocial use of agentic AI. But the intended consequences of agentic warfare are way scarier. Imagine, for instance:
And those are just a few off the top of my head.
The Chaotic Era of Networked Agents

Most of today's super agents still use a top-down structure: a single, human-directed agent issues instructions across a range of domains. But the eventual vision being sown by leaders at Microsoft, OpenAI, Anthropic, and beyond is a network of agents that communicate and collaborate to meet a human-directed goal.

Given the current technical limitations of LLMs - and the challenges already emerging from generative models - I predict that early adoptions of this paradigm will be disastrous. Remember, from the studies above, how sharply hallucinations increase when human common sense and critical thinking are removed from just one agent? Now imagine connecting tens, or even hundreds, of agents, removing the human guardrails and allowing the network to direct your business decisions, your finances, your healthcare - to select and register your child in the best preschool. With even the low end of current hallucination rates (e.g., 33%), all kinds of chaos would break out.

And here we are, preparing for a future agentic architecture that will likely exponentially increase those hallucinations and amplify other antisocial tendencies of human behavior. Keep in mind that agentic models are trained on the entire range of human expression, including narcissistic and antisocial personality disorders - disorders that express themselves in a lack of empathy, transactional personal relationships, delusions of grandeur, seeing people as expendable, and other traits that undermine community and relationships but often ultimately help those who have them amass money and power. Without human interference, these traits are liable to emerge at much higher rates in the ways networked agents carry out their instructions - and in the new interpretations of those instructions they will no doubt invent.

This is the world toward which our sunk investment is currently propelling us at full throttle - a religion of networked agents. As Sam Altman wrote in a 2019 blog post: "A big secret is that you can bend the world to your will a surprising percentage of the time - most people don't even try, and just accept that things are the way that they are." "The most successful founders do not set out to create companies," he wrote. "They are on a mission to create something closer to a religion, and at some point it turns out that forming a company is the easiest way to do so."
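To put a rough number on the compounding worry: the sketch below treats each agent in an unchecked chain as an independent draw with some per-agent hallucination probability. That independence assumption is a deliberate simplification, and the 33% rate is borrowed loosely from the benchmark figures quoted earlier; nothing here models any real product.

```python
# Back-of-the-envelope illustration of how per-agent hallucination rates
# compound across a chain of agents that hand work to one another without
# a human check in between. Rates and chain lengths are illustrative.
import random

def chain_is_clean(per_agent_error: float, n_agents: int) -> bool:
    """Return True if no agent in the chain hallucinates on this run."""
    return all(random.random() >= per_agent_error for _ in range(n_agents))

def simulate(per_agent_error: float, n_agents: int, trials: int = 100_000) -> float:
    """Estimate the fraction of runs containing at least one hallucination."""
    clean = sum(chain_is_clean(per_agent_error, n_agents) for _ in range(trials))
    return 1 - clean / trials

for n in (1, 3, 5, 10):
    # 0.33 roughly mirrors the low-end PersonQA figure quoted above.
    print(f"{n:>2} chained agents: ~{simulate(0.33, n):.0%} of runs contain an error")
```

Analytically, the chance of a clean run is (1 - 0.33)^n: roughly 30% for three chained agents and under 2% for ten - which is the arithmetic behind the prediction that early networked-agent deployments will be chaotic.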
Your comments are always welcome.
Sincerely, Berit Anderson
DISCLAIMER: NOT INVESTMENT ADVICE Information and material presented in the SNS Global Report should not be construed as legal, tax, investment, financial, or other advice. Nothing contained in this publication constitutes a solicitation, recommendation, endorsement, or offer by Strategic News Service or any third-party service provider to buy or sell any securities or other financial instruments. This publication is not intended to be a solicitation, offering, or recommendation of any security, commodity, derivative, investment management service, or advisory service and is not commodity trading advice. Strategic News Service does not represent that the securities, products, or services discussed in this publication are suitable or appropriate for any or all investors.
We encourage you to forward your favorite issues of SNS to a friend(s) or colleague(s) 1 time per recipient, provided that you cc info@strategicnewsservice.com and that sharing does not result in the publication of the SNS Global Report or its contents in any form except as provided in the SNS Terms of Service (linked below). To arrange for a speech or consultation by Mark Anderson on subjects in technology and economics, or to schedule a strategic review of your company, email mark@stratnews.com. For inquiries about Partnership or Sponsorship Opportunities and/or SNS Events, please contact Berit Anderson, SNS COO, at berit@stratnews.com.
Note: Some letters may be republished to include subsequent replies.
Subject: Re: China and Apple

Hi Evan,

Don't know if you've noticed this story - basically the thesis is that "China wouldn't be China without Apple," but the details are interesting.

https://www.nytimes.com/2025/05/15/books/review/apple-in-china-patrick-mcgee.html

I hope you're good! Cheers,
John Payne
Ecologist | Data Scientist
John, Definitely interesting, thank you for passing this along. I agree that Apple is in awfully deep, although I think it's a remediable situation given time (most situations are :) ). As for "building an army" in China, it is in many ways true, and potentially problematic. I'm doing great, looking forward to seeing you soon at FiRe!
Evan Anderson
Subject: "SNS: BRACING FOR IMPACT: How to Prevent and Prepare for an H5N1 Pandemic" Well done again Evan, thank you. As to investments, have you explored Anduril?
Scott Biddle Roche Harbor, WA
Scott, Thank you! As for Anduril, I've definitely had an eye on them for some time. It's a private company still, but Palmer Luckey has said he thinks they should IPO soon. I would want to see at what price and more financials, but its current work on drones does look very promising.
Subject: SNS Special Alert: China at Cliff Edge

Mark, Evan, Berit,

Agreeing to 30-10 would reinforce your alert message, yes? Unfortunately it's not still 145%.
Paul Shoemaker Executive Director, Carnation Farms
Paul, Absolutely, I think they came to make a deal based on the desperation they are probably feeling about the domestic situation in China. As for the 145% disappearing, my take this morning was that they seem to have gotten the Chinese to drop restrictions on rare earths, etc., which they have a stranglehold on. This would make more sense of the sudden willingness to drastically lower tariffs.
Evan Anderson
Evan, The Chinese actions are pretty dramatic and desperate.
Paul Shoemaker
* On June 8-11, Mark will be speaking on a variety of subjects, and hoping to see many of our SNS members in person, at the FiRe 2025 conference.
Copyright 2025 Strategic News Service LLC "Strategic News Service," "SNS," "Future in Review," "FiRe," "INVNT/IP," and "SNS Project Inkwell" are all registered service marks of Strategic News Service LLC. ISSN 1093-8494