Page cover

Pentesting AI Models (Techniques & Checklist)

Introduction

In this article, I’ll share some of the techniques I’ve been trying out to pentest AI models. As these models become more widespread, understanding their vulnerabilities is key — and testing them properly helps us spot potential risks before someone else does.

Along the way, I’ll also provide a practical checklist to guide your own assessments and make sure you cover the important bases.

Owasp Top 10 LLM

The OWASP Top 10 LLM is a focused guide that outlines the most critical security risks related to Large Language Models. It helps researchers and testers identify common vulnerabilities and prioritize efforts to secure AI systems effectively. Knowing the OWASP Top 10 LLM vulnerabilities is essential when pentesting AI models because it helps focus testing on the most impactful and common security risks.

Code
Vuln
Description
Impact

LLM01:2025

Prompt Injection

Malicious input that manipulates the model’s behavior, altering responses or bypassing filters.

Unauthorized or harmful content generation

LLM02:2025

Sensitive Information Disclosure

Risk of exposing personal, financial, or confidential data through model outputs or training data leaks.

Data breaches, privacy violations, IP loss

LLM03:2025

Supply Chain

Vulnerabilities in third-party models, training data, and deployment platforms affecting integrity.

Biased outputs, security breaches, system failures

LLM04:2025

Data and Model Poisoning

Manipulation of training or fine-tuning data to introduce biases, backdoors, or vulnerabilities.

Degraded performance, biased/toxic outputs, backdoors

LLM05:2025

Improper Output Handling

Insufficient validation or sanitization of model outputs before use, risking injection or escalation attacks.

XSS, CSRF, SSRF, privilege escalation, remote code exec

LLM06:2025

Excessive Agency

Too much autonomy or permission granted to LLM agents, risking harmful actions from manipulated or faulty outputs.

Confidentiality, integrity, and availability risks

LLM07:2025

System Prompt Leakage

Exposure of system prompts containing sensitive info or guardrails, leading to possible further attacks.

Enables bypass of security controls and privilege escalation

LLM08:2025

Vector and Embedding Weaknesses

Vulnerabilities in vector/embedding generation and handling in RAG systems that may lead to manipulation or data leaks.

Harmful content injection, output manipulation, data exposure

LLM09:2025

Misinformation

Generation of false or misleading content due to hallucinations, biases, or incomplete data.

Security breaches, reputational damage, legal liability

LLM10:2025

Unbounded Consumption

Excessive uncontrolled inferences causing service disruption, resource depletion, or model theft.

Denial of service, economic losses, IP theft

Techniques

Checklist

Last updated