Prompt Manipulation: Pandora's Box Has Been Opened

AI is becoming increasingly important for organisations: it helps revolutionise the efficiency of business operations, streamline workflows and accelerate decision-making processes. But wherever there is light, there is also shadow. In the wrong hands these tools can lead to significant risks, as the latest trend among cybercriminals demonstrates — so-called prompt hacking.

The Cyber Security Awareness Month serves as a reminder to remain vigilant in the face of rapidly evolving AI-assisted attack methods. The newest attacker trend targets the manipulation of AI prompts. Using prompt hacking or prompt injection, malicious actors exploit natural language to cause harm — and without requiring advanced programming skills, maximum damage can be achieved.

History repeats itself here, because this approach is not entirely new. Comparable to the SQL injection attacks of the early 2000s, prompt hacking exploits the way systems interpret user inputs. Today, a malicious actor can craft sophisticated prompts that cause an AI to reveal sensitive information, execute unauthorised actions, or disrupt operations.

Because AI systems are now embedded in processes such as customer communication, payroll and data management, the consequences of such prompt manipulation can be severe. A single well-crafted prompt attack can lead to financial losses, service disruptions or even reputational damage.

This type of input manipulation comes with a very low barrier to entry. Prompt hacking requires no advanced technical knowledge — the manipulation occurs in natural language, not programming code. Because these attacks are based on words rather than technical vulnerabilities, they are difficult to detect at first glance. For example, setting the text colour of commands to white — invisible to the human eye — is a simple way to circumvent a classic defensive mechanism. This accessibility expands the pool of potential attackers and simultaneously increases the demands placed on effective defensive strategies.

Raising security standards

By exploiting the power of language, there is no single attack vector or single outcome in prompt hacking — a reality that security teams must internalise. They must now contend with harmful "prompts" rather than malicious "code." This is where the Zero Trust principle can demonstrate its strength in defence.

Adopting a Zero Trust security framework offers one way to address this problem. The approach is based on the premise that no user, no system and no interaction is trusted by default. Zero Trust places the emphasis on continuous verification, meaning organisations can monitor all prompts and inputs for unusual behaviour, regardless of their source or perceived legitimacy.

Security practices must therefore be adapted to recognise the particular risks posed by language manipulation. Rather than technical "code problems," organisations must now anticipate malicious input patterns or "problem prompts" that attempt to circumvent protective measures. By embedding security into every part of AI operations — from system architecture to workflow permissions — organisations can face an AI-powered future with greater confidence.

Prompt hacking opens Pandora's box

Prompt hacking is comparable to Pandora's box. Continuously shifting prompts operating in the shadows represent a challenge that organisations deploying AI cannot afford to ignore. As AI systems become increasingly integrated into core business functions, the risks associated with prompt manipulation continue to grow.

This is no longer a theoretical problem — action is required. Organisations bear the responsibility, alongside their adoption of AI tools, of securing them properly. Updating policies, systems and training is necessary at every level.

AI is transforming industries at speed. Competitive advantages will accrue to those organisations that recognise and harness the potential of AI. But to use that potential safely, a proactive defence strategy must be built in from the start. Robust safeguards are essential to protect sensitive data and processes — and to preserve customer trust.

Originally published on SDxCentral · 31 October 2025. Also published in Forbes France and across European security publications.