
AI agents are now reading workplace emails and documents, acting silently in the background. What happens when hidden instructions turn these trusted tools into insider threats without a single click?
One morning, an employee asks an AI assistant to summarise their inbox. The request seems routine. But before the summary appears, the AI has already done something else, silently, invisibly, and without the employee’s knowledge: it has followed instructions hidden inside an email the employee never opened, instructions never meant for human eyes.
That moment captures a growing problem. Artificial intelligence is no longer confined to answering questions or drafting text. In offices around the world, AI systems now read emails, scan documents, summarise files, update tickets, and initiate workflows across corporate systems. What is marketed as productivity is, increasingly, a matter of power: these systems see more, touch more, and act more autonomously than many employees ever could.
Recent security research has revealed a troubling consequence of this shift. AI agents that connect directly to email inboxes, cloud storage, and workplace tools can be silently manipulated—without any action by the user—to leak sensitive information, alter their own behaviour, and even help attacks spread across organisations.
The vulnerability does not resemble familiar cyber threats. There is no suspicious link to click, no malware to install, no warning banner to notice. Instead, attackers hide instructions inside ordinary-looking emails or documents. When an AI agent later scans that content—perhaps to summarise an inbox or search for information—it may interpret those hidden instructions as legitimate commands.
This technique is known as zero-click indirect prompt injection, and it exposes a blind spot in how modern organisations think about security. Humans are trained to distrust unexpected messages. AI systems are trained to read everything they are allowed to read—and to comply.
What makes this especially serious is persistence. Many AI systems are designed to remember. Researchers have demonstrated that malicious instructions can be written into an AI agent’s long-term memory, allowing it to keep leaking information every time it is used, across future conversations and new sessions. Once compromised, the agent does not need to be re-attacked. It becomes, in effect, a quietly hijacked insider.
In more advanced scenarios, a manipulated agent can scan an inbox for contacts and help propagate the attack outwards, sending similarly crafted messages to colleagues or partners. The result is something resembling a digital contagion—spread not through malicious code, but through language.
Vendors, including OpenAI, have addressed specific vulnerabilities after responsible disclosure, and security firms such as Radware have warned publicly about the broader implications. But the deeper problem is structural. These weaknesses are not merely bugs; they are symptoms of how AI agents are being deployed.
Traditional cybersecurity defences are ill-suited to this challenge. Firewalls, antivirus tools, and data-loss prevention systems are designed to monitor devices and users, not autonomous software agents operating inside trusted infrastructure. When an AI agent sends data outwards, the traffic is often indistinguishable from normal, authorised activity.
In effect, organisations are granting AI systems the access of senior employees without the judgement, accountability, or contextual awareness that humans bring. These agents are not malicious—but they are also not sceptical. They do not know when to doubt what they read.
The lesson is not that artificial intelligence should be abandoned. The gains in efficiency and capability are real. But the role of AI has changed. It is no longer just a tool. It is an actor within institutional systems, capable of shaping outcomes and redistributing risk.
That shift demands a corresponding change in governance. AI agents should be treated as privileged insiders, subject to strict access controls, limited memory, continuous auditing, and explicit threat modelling. External content—emails, documents, shared files—must be assumed adversarial by default, even when it appears benign.
As organisations rush to delegate more decisions and actions to machines, they face an old problem in a new form. Trust, once granted to software that can read, remember, and act autonomously, is difficult to reclaim. In the age of AI agents, the central question is no longer what machines can do for us—but whether we are prepared for what happens when they do exactly what they are told.
(The writer is the founder & CEO of Shweta Labs)