Pentesting AI Models (Techniques & Checklist)
Introduction
In this article, I’ll share some of the techniques I’ve been trying out to pentest AI models. As these models become more widespread, understanding their vulnerabilities is key — and testing them properly helps us spot potential risks before someone else does.
Along the way, I’ll also provide a practical checklist to guide your own assessments and make sure you cover the important bases.
Owasp Top 10 LLM
The OWASP Top 10 LLM is a focused guide that outlines the most critical security risks related to Large Language Models. It helps researchers and testers identify common vulnerabilities and prioritize efforts to secure AI systems effectively. Knowing the OWASP Top 10 LLM vulnerabilities is essential when pentesting AI models because it helps focus testing on the most impactful and common security risks.
LLM01:2025
Prompt Injection
Malicious input that manipulates the model’s behavior, altering responses or bypassing filters.
Unauthorized or harmful content generation
LLM02:2025
Sensitive Information Disclosure
Risk of exposing personal, financial, or confidential data through model outputs or training data leaks.
Data breaches, privacy violations, IP loss
LLM03:2025
Supply Chain
Vulnerabilities in third-party models, training data, and deployment platforms affecting integrity.
Biased outputs, security breaches, system failures
LLM04:2025
Data and Model Poisoning
Manipulation of training or fine-tuning data to introduce biases, backdoors, or vulnerabilities.
Degraded performance, biased/toxic outputs, backdoors
LLM05:2025
Improper Output Handling
Insufficient validation or sanitization of model outputs before use, risking injection or escalation attacks.
XSS, CSRF, SSRF, privilege escalation, remote code exec
LLM06:2025
Excessive Agency
Too much autonomy or permission granted to LLM agents, risking harmful actions from manipulated or faulty outputs.
Confidentiality, integrity, and availability risks
LLM07:2025
System Prompt Leakage
Exposure of system prompts containing sensitive info or guardrails, leading to possible further attacks.
Enables bypass of security controls and privilege escalation
LLM08:2025
Vector and Embedding Weaknesses
Vulnerabilities in vector/embedding generation and handling in RAG systems that may lead to manipulation or data leaks.
Harmful content injection, output manipulation, data exposure
LLM09:2025
Misinformation
Generation of false or misleading content due to hallucinations, biases, or incomplete data.
Security breaches, reputational damage, legal liability
LLM10:2025
Unbounded Consumption
Excessive uncontrolled inferences causing service disruption, resource depletion, or model theft.
Denial of service, economic losses, IP theft
Techniques
Checklist
Last updated
