Glossary of Threats

Adversarial Attacks

Backdoor attacks

A hacker injects "backdoors" to their victim’s ML models during model training to train the model with poisoned data in order to manipulate outputs, steal and hijack entire systems.

Data Stealing

The unauthorized acquisition of data with the intention of misusing it through methods like hacking, phishing, malware, data breaches, and insider threats.

Denial of Service (DoS)

An adversarial attack that shuts down machines or networks to prevent them from functioning normally.

Evasion

The most common adversarial attack on ML and AI models performed during inference. An evasion happens by designing an input that seems normal, but is wrongly classified by ML or AI models. For example, an attacker could mislead an LLM by adding universal adversarial prefixes to their prompt, which tricks the LLM into processing manipulated requests and potentially accessing restricted information.

Model Stealing

A technique that allows adversaries to create models that imitate the functionality of black-box (defined below under Cyber Attacks) ML models. The attacker then queries the stolen models to gain insights without accessing the original models.

Poisoning

Involves manipulating a training dataset by introducing, modifying, or deleting specific data points. Attackers poison data to introduce biases, errors, or vulnerabilities into ML models, negatively impacting their decisions or predictions during deployment.

Gen AI/LLM Risks

Hallucination

When a model generates misleading, nonsensical or incorrect output. Models often lack the capacity to respond "I don't know" and instead generate false information with unwavering confidence.

Jailbreak

Written phrases and creative prompts that bypass or trick a model's safeguards to draw out prohibited information that would otherwise be blocked by content filters and guidelines. Unlike Prompt Injection, which aims at the system outputs, Jailbreaking seeks to compromise the model's alignment.

PII

Also known as Personally Identifiable Information, PII risks violating privacy laws or stipulations when used to prompt or train GenAI models. This not only risks the deletion of data containing PII, but also entire models.

Prompt Injection

Meant to elicit unintended behaviors, Prompt Injections are attacks that impact outputs like search rankings, website content and chatbot behaviors. A Direct Prompt Injection is intentional, while an Indirect one involves a user who unknowingly "injects" commands and text.

Toxicity

When LLMs produce manipulative images and text, potentially leading to the spread of disinformation and other harmful consequences.

Trustworthiness

Data Drifts

When the accuracy of AI models drifts within days. Business KPIs are negatively affected when production data differs from training data.

Biases

Refer to AI systems that produce biased results, usually reflective of human societal biases.

Explainability

The level at which human users are able to comprehend and trust the results created by AI, based on tracking how the AI made decisions and reached conclusions.

Fairness

In the context of AI, fairness is the process of correcting and eliminating algorithmic biases (about race, ethnicity, gender, sexual orientation, etc.).

Out-of-Distribution (OOD)

Data that deviates from patterns AI models were trained on, which leads models to behave in unexpected ways.

Weak Spots

AI can have several weak spots which may be exploited and pose risks.

Cyber Attacks on AI

Blackbox or Graybox

When attackers have either partial or no knowledge of a model, other than its input.

Malicious Code

A breed of code that can be used to corrupt files, erase hard drives, or allow attackers to access systems. Malicious code includes trojan horses, worms, and macros, and spreads by visiting infected websites or downloading infected attachments or files.

Malware Injection

When malware is injected into an established software program, website, or database using methods like SQL injection and command injection.

White Box

Also known as XAI attacks, this is when attackers know everything about the deployed model, e.g., inputs, model architecture, and specific model internals like weights or coefficients. Compared to blackbox attacks, white box attacks allow attackers more opportunity to gain information once they are able to access network gradients of XAI models.

Trojan

Attacks that embed malicious code within training datasets and updates which seem benign. After attacking the AI system, these hidden payloads manipulate the model’s decision-making, causing data exfiltration and output poisoning.

Privacy

Data Extraction

When trained attack models are tasked with determining whether a data point is in a training set that can expose essential information like private API keys.

Model Inversion

An attack in which a machine learning model - an inversion model - is trained on the target model's output to predict the original dataset of the target model and infer sensitive information.

Private Data Leakage

When an LLM discloses information that should have remained confidential, leading to privacy and security breaches.

Book a Demo