Google Blocks Massive Model Extraction Campaign Targeting Gemini AI with 100,000+ Malicious Prompts

Google has revealed it detected and blocked a sophisticated campaign involving more than 100,000 prompts designed to extract the proprietary reasoning capabilities of its Gemini AI model, according to the Google Threat Intelligence Group’s latest quarterly threat report.

The Growing Threat of Model Extraction

The coordinated attack represents what security researchers call model extraction — an attack in which adversaries query a model at scale and use its responses for knowledge distillation, a machine-learning technique that replicates the essential capabilities of a large AI model in a smaller one. Google’s real-time detection systems identified and blocked the prompts, protecting the internal reasoning traces that make Gemini’s capabilities unique.

“Model extraction and subsequent knowledge distillation enable an attacker to accelerate AI model development quickly and at a significantly lower cost,” Google stated in the report. “This activity effectively represents a form of intellectual property theft.”
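Knowledge distillation itself is a standard, legitimate training technique; what the report describes is applying it to responses harvested from someone else’s model. A minimal sketch of the classic distillation objective (the soft-label KL divergence, as in Hinton et al.’s original formulation — illustrative only, not Google’s detection logic) shows the mechanism: a student model is penalized for diverging from the teacher’s output distribution.

```python
import math

def softmax(logits, temperature=1.0):
    """Convert raw logits to a probability distribution at a given temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL divergence from the teacher's softened distribution to the student's.

    In knowledge distillation, the student is trained to minimize this loss,
    so its outputs come to mimic the teacher's behavior."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# A student that matches the teacher incurs (near-)zero loss;
# a mismatched student incurs a positive loss.
teacher = [3.0, 1.0, 0.2]
aligned = distillation_loss(teacher, [3.0, 1.0, 0.2])
mismatch = distillation_loss(teacher, [0.2, 1.0, 3.0])
```

Scaled across 100,000+ prompts and many task domains, harvested prompt–response pairs become exactly the kind of training signal this loss consumes — which is why Google characterizes the campaign as intellectual property theft rather than ordinary API use.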

Attack Methodology Revealed

The attackers employed sophisticated techniques, instructing Gemini to maintain “the language used in the thinking content strictly consistent with the main language of the user input.” This approach was specifically designed to extract Gemini’s reasoning processes across multiple languages, suggesting an attempt to clone the model’s capabilities for non-English markets.

Google noted that the breadth of questions in the campaign indicated a systematic effort to replicate Gemini’s reasoning abilities across a wide variety of tasks and languages — a hallmark of well-funded operations, whether state-backed or corporate espionage.

DeepSeek Accusations Intensify Concerns

The disclosure comes amid growing industry concern about AI intellectual property theft. Separately, OpenAI told US lawmakers that Chinese AI firm DeepSeek has deployed “new, obfuscated methods” to extract results from leading American AI models. OpenAI accused DeepSeek of attempting to “free-ride on the capabilities developed by OpenAI and other US frontier labs,” highlighting how model theft has become a critical concern for companies that have invested billions in AI development.

Nation-State APT Groups Weaponize Gemini

Beyond extraction attempts, Google documented how government-backed threat actors from multiple nations integrated Gemini into their attack operations in late 2025:

  • Chinese APT31 and UNC795: Automated vulnerability analysis, malware debugging, and exploitation technique research
  • Iranian APT42: Crafted targeted social engineering campaigns using AI-generated biographical details to build trust with victims
  • North Korean UNC2970: Gathered intelligence on defense contractors and cybersecurity firms to support phishing campaigns

Google confirmed it disabled accounts and assets associated with these threat groups and used the insights to strengthen defenses against AI misuse.

Malware Now Embeds Gemini APIs

The report also identified a new malware family called HONESTCUE that integrates Gemini’s API directly into its operations. The malware sends seemingly benign prompts to generate working code, which it then compiles and executes in memory — a technique designed to bypass Gemini’s safety filters through prompt fragmentation.

“Integration of public AI models like Google Gemini into malware grants threat actors instant access to powerful LLM capabilities without needing to build or train anything themselves,” noted Pete Luban, Field CISO at AttackIQ. “Malware capabilities have advanced exponentially, allowing for faster lateral movement, stealthier attack campaigns, and more convincing mimicry of typical company operations.”

Defensive Recommendations

For organizations providing AI models as services, Google recommends:

  • Monitor API access patterns for signs of systematic extraction
  • Implement strict governance over AI systems and data flows
  • Deploy response filtering and output controls to limit what attackers can infer about model behavior
  • Conduct continuous testing against realistic adversary behavior

The campaign underscores a fundamental shift in the threat landscape: AI models themselves have become high-value targets for nation-state actors and competitors seeking to shortcut billions of dollars in R&D investment.

Source: CSO Online