Cybercriminals are increasingly using large language models (LLMs) to generate content for large-scale phishing and scam attacks, Kaspersky’s AI Research Center experts have discovered.

As threat actors attempt to generate fraudulent websites in high volumes, they often leave behind distinctive artifacts – such as AI-specific phrases – that set these sites apart from those created manually.

So far, most phishing examples observed by Kaspersky target users of cryptocurrency exchanges and wallets.

Kaspersky experts analysed a sample of resources, identifying key characteristics that help distinguish and detect cases where AI was used to generate content or even entire phishing and scam websites.

One of the prominent signs of LLM-generated text is the presence of disclaimers and refusals to carry out instructions, including phrases such as “As an AI language model …”

Another distinctive indicator of language model usage is the presence of concessive clauses, such as: “While I can’t do exactly what you want, I can try something similar.” In other examples, targeting users of the Gemini exchange and the Exodus wallet, the LLM declines to provide detailed login instructions.
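To illustrate how such phrase-level artifacts could be flagged in practice, here is a minimal Python sketch. The phrase list, sample text and function name are hypothetical, chosen for the example only; this is not a description of Kaspersky’s own detection logic.

```python
# Hypothetical list of refusal/disclaimer phrases that tend to appear when
# raw LLM output is pasted into a page unedited (illustrative only).
AI_ARTIFACT_PHRASES = [
    "as an ai language model",
    "i'm sorry, but i can't",
    "i cannot assist with",
    "i can try something similar",
]

def find_ai_artifacts(page_text: str) -> list[str]:
    """Return which known LLM refusal/disclaimer phrases occur in the page text."""
    lowered = page_text.lower()
    return [phrase for phrase in AI_ARTIFACT_PHRASES if phrase in lowered]

sample = "As an AI language model, I cannot provide step-by-step login instructions."
print(find_ai_artifacts(sample))  # ['as an ai language model']
```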

“With LLMs, attackers can automate the creation of dozens or even hundreds of phishing and scam web pages with unique, high-quality content,” explains Vladislav Tushkanov, research development group manager at Kaspersky. “Previously, this required manual effort, but now AI can help threat actors generate such content automatically.”

LLMs can be used to create not just text blocks but entire web pages, with artifacts appearing both in the text itself and in areas like meta tags: snippets of text that describe a web page’s content and appear in its HTML code.
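A similar check can be applied to the markup rather than the visible text. The sketch below, using only Python’s standard library and an invented page fragment, collects the content of every meta tag so it can be screened for the same tell-tale phrases; it is an assumption-laden illustration, not a real scanner.

```python
from html.parser import HTMLParser

class MetaTagCollector(HTMLParser):
    """Collect the content attribute of every <meta> tag for later inspection."""
    def __init__(self):
        super().__init__()
        self.meta_contents = []

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            content = dict(attrs).get("content")
            if content:
                self.meta_contents.append(content)

# Invented page fragment for demonstration purposes.
html_doc = """<html><head>
<meta name="description" content="I'm sorry, but as an AI language model I cannot write this description.">
</head><body>Restore your wallet access now.</body></html>"""

collector = MetaTagCollector()
collector.feed(html_doc)
flagged = [c for c in collector.meta_contents if "as an ai language model" in c.lower()]
print(flagged)  # the generated meta description is reported
```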

There are other indicators of AI usage in creating fraudulent sites. Some models, for instance, tend to overuse specific phrases such as “delve”, “in the ever-evolving landscape”, and “in the ever-changing world”. On their own these terms are not strong indicators of AI-generated content, but they can serve as supporting signals.

Another feature of LLM-generated text is a reference to the model’s knowledge cut-off, the point up to which its knowledge of the world extends. The model typically articulates this limitation with phrases such as “according to my last update in January 2023”.
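Taken together, cut-off mentions and the overused phrases from the previous paragraph could feed a simple scoring heuristic. The sketch below is a toy example with invented weights and pattern names, not a production classifier.

```python
import re

# Hypothetical heuristics: knowledge cut-off mentions are treated as a fairly
# strong signal, while overused words like "delve" are only weak, supporting ones.
CUTOFF_RE = re.compile(
    r"(?:according to|as of) my (?:last|latest) (?:update|training)", re.IGNORECASE
)
WEAK_SIGNALS = ["delve", "ever-evolving landscape", "ever-changing world"]

def suspicion_score(text: str) -> int:
    """Toy scoring scheme: 2 points for a cut-off mention, 1 per weak phrase."""
    lowered = text.lower()
    score = 2 if CUTOFF_RE.search(text) else 0
    return score + sum(1 for phrase in WEAK_SIGNALS if phrase in lowered)

page = "According to my last update in January 2023, this platform lets traders delve into crypto."
print(suspicion_score(page))  # 3
```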

LLM-generated text is often combined with tactics that make phishing page detection more complicated for cybersecurity tools. For instance, attackers may use non-standard Unicode symbols, such as those with diacritics or from mathematical notation, to obfuscate text and prevent matching by rule-based detection systems.
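One way defenders can counter this kind of obfuscation, assuming the substitutions are simple compatibility homoglyphs such as mathematical-alphabet letters or accented characters, is to normalize the text before any rule-based matching. A minimal sketch:

```python
import unicodedata

def normalize_for_matching(text: str) -> str:
    """Fold mathematical-alphabet and accented characters back to plain
    letters, so rule-based matching sees 'verify' instead of a look-alike."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

obfuscated = "𝖵𝖾𝗋𝗂𝖿𝗒 𝗒𝗈𝗎𝗋 wállet to restore access"
print(normalize_for_matching(obfuscated))  # "Verify your wallet to restore access"
```

NFKD compatibility decomposition maps the styled letters to their plain ASCII counterparts and splits off diacritics, which are then discarded; more elaborate homoglyph substitutions would need a dedicated mapping table.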

“Large language models are improving, and cybercriminals are exploring ways to apply this technology for nefarious purposes,” says Tushkanov. “However, occasional errors provide insights into their use of such tools, particularly into the growing extent of automation.

“With future advancements, distinguishing AI-generated content from human-written text may become more challenging, making it crucial to use advanced security solutions that analyse textual information along with metadata, and other fraud indicators.”