What is data poisoning?
Data poisoning is an attack on machine learning and other data-driven systems in which malicious or incorrect data is intentionally inserted into a dataset to corrupt the training process or skew results. The goal is to manipulate the behavior of the system, either by degrading its overall performance or by making it behave in a specific, undesirable way.
There are two common types:
- Availability Attacks: The attacker poisons the data to make the model perform poorly across the board (e.g., making it unreliable or inaccurate).
- Integrity Attacks (a.k.a. Backdoor Attacks): The attacker injects specific examples into the training data so that the model behaves normally most of the time, but fails (or outputs a specific result) when it sees certain triggers.
Examples:
- In a spam filter, a data poisoning attack could involve labeling spam emails as “not spam” in the training data, so the model starts allowing real spam through.
- In facial recognition, an attacker could try to poison the system so that it misidentifies them as someone else when they wear certain glasses or accessories.
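To make the availability case concrete, here is a minimal sketch (assuming scikit-learn and NumPy, with synthetic data and an illustrative 30% flip rate that is not drawn from any real system) showing how simply flipping a fraction of training labels degrades a classifier's accuracy:

```python
# Minimal sketch of an availability-style label-flipping attack on synthetic data.
# Library choice (scikit-learn) and the 30% flip rate are illustrative assumptions.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

def train_and_score(labels):
    clf = LogisticRegression(max_iter=1000).fit(X_train, labels)
    return clf.score(X_test, y_test)

# Attacker flips the labels of 30% of the training rows.
rng = np.random.default_rng(0)
flip_idx = rng.choice(len(y_train), size=int(0.3 * len(y_train)), replace=False)
y_poisoned = y_train.copy()
y_poisoned[flip_idx] = 1 - y_poisoned[flip_idx]

print("accuracy on clean labels:   ", train_and_score(y_train))
print("accuracy on poisoned labels:", train_and_score(y_poisoned))
```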
Why It Matters:
Data poisoning is especially concerning in situations where:
- Training data is crowdsourced or publicly available.
- Systems retrain automatically over time with incoming data.
- The model is used in high-stakes environments like healthcare, finance, or security.
What is the difference between model poisoning and data poisoning?
Model poisoning and data poisoning are closely related, but they happen at different stages and in different ways within a machine learning pipeline.
Data Poisoning
- Where it happens: During the training data preparation phase.
- How it works: An attacker inserts malicious or mislabeled data into the training dataset.
- Goal: To influence the behavior of the model when it’s trained on that tampered data.
Example:
In a sentiment analysis model, an attacker might add fake reviews where negative comments are labeled as “positive.” The model then learns incorrect associations, leading to faulty predictions.
Model Poisoning
- Where it happens: During model training, especially in federated learning or distributed training setups.
- How it works: A participant in the training process (e.g., a device or server) intentionally submits harmful model updates.
- Goal: To corrupt the global model directly — either to degrade performance or insert a backdoor.
Example:
In federated learning across mobile devices, one compromised device could send deliberately poisoned model updates (weights) that skew the final global model.
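As a rough illustration of how a single participant can skew naive aggregation, here is a sketch in plain NumPy (no real federated learning framework; the update values and scaling factor are made up for illustration):

```python
# Sketch: one malicious client skewing a naive federated-averaging step.
# Plain NumPy; the honest updates and the attacker's scaling factor are illustrative.
import numpy as np

def federated_average(client_updates):
    """Naive aggregation: element-wise mean of the clients' weight updates."""
    return np.mean(client_updates, axis=0)

honest_update = np.array([0.10, -0.05, 0.20])
honest_updates = [honest_update for _ in range(9)]

# A compromised device submits a large, inverted update.
malicious_update = -50.0 * honest_update

global_update = federated_average(honest_updates + [malicious_update])
print("honest update:    ", honest_update)
print("aggregated update:", global_update)   # dragged far from what honest clients sent
```

Robust aggregation rules, such as a coordinate-wise median or trimmed mean, are common mitigations for exactly this failure mode.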
What is an example of data poisoning?
Scenario: Phishing Email Filter
Let’s say you’re using a machine learning model to filter phishing emails. The model is trained on a dataset of emails labeled as either “phishing” or “not phishing.”
Data Poisoning Attack
An attacker finds a way to inject poisoned data into the training set — perhaps by submitting feedback or training data to an open-source phishing filter platform.
They do something like this:
- Submit many phishing emails that are intentionally labeled as “not phishing.”
- These emails might contain common phishing characteristics: shady links, sketchy language (“Congratulations! You won $$$”), etc.
Result
When the model retrains on this poisoned dataset:
- It starts associating typical phishing patterns with safe emails.
- Phishing emails with those same characteristics may now slip past the filter.
- The model’s overall performance declines, and phishing emails reach users’ inboxes.
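A toy end-to-end sketch of this scenario (scikit-learn, with a tiny made-up corpus; real filters train on far more data) shows how attacker-labeled samples can flip the filter's verdict on an obvious phishing message:

```python
# Sketch: phishing emails submitted with the label "not phishing" poison a toy filter.
# The corpus, labels, and repetition factor are illustrative, not from a real system.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

clean_data = [
    ("Congratulations! You won $$$, click this link now", "phishing"),
    ("Verify your account password immediately at this link", "phishing"),
    ("Your invoice is overdue, pay via this link", "phishing"),
    ("Meeting moved to 3pm, see updated agenda", "not phishing"),
    ("Lunch tomorrow? Let me know what works", "not phishing"),
    ("Quarterly report attached for review", "not phishing"),
]

# Attacker-submitted feedback: phishing text deliberately labeled "not phishing".
poisoned_data = clean_data + [
    ("Congratulations! You won $$$, claim your prize at this link", "not phishing"),
    ("Click this link now to verify your password", "not phishing"),
] * 3   # repetition stands in for flooding the feedback channel

def train(dataset):
    texts, labels = zip(*dataset)
    return make_pipeline(CountVectorizer(), MultinomialNB()).fit(texts, labels)

test_email = ["Congratulations! You won $$$, click the link to claim your prize"]
print("clean model:   ", train(clean_data).predict(test_email))     # likely "phishing"
print("poisoned model:", train(poisoned_data).predict(test_email))  # likely slips through
```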
Real-World Relevance
- In open-source or crowd-sourced platforms, poisoning is a real risk.
- It’s also relevant in social media moderation, recommendation engines, or any system where user-submitted content is used to train or retrain the model.
How can I prevent data poisoning?
Preventing data poisoning is critical, especially in systems where data is collected from external sources or retraining happens automatically. Here’s a breakdown of practical strategies to prevent or mitigate data poisoning attacks:
1. Data Validation & Cleaning
- What to do: Rigorously check incoming data for outliers, duplicates, mislabeled entries, or patterns that don’t make sense.
- Tools: Use anomaly detection algorithms, manual review for high-impact data, and data consistency checks.
- Why it helps: Crude poisoning often shows up as anomalous or inconsistent records, so catching it early reduces risk; subtler attacks, however, can evade basic checks.
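For instance, a minimal sketch of this validation step might score each incoming batch against a trusted historical sample and quarantine outliers for manual review (IsolationForest is just one possible detector; the data and contamination rate below are synthetic assumptions):

```python
# Sketch: flag anomalous incoming records before they reach the training set.
# IsolationForest is one possible detector; the data and threshold are illustrative.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(42)
trusted_batch = rng.normal(loc=0.0, scale=1.0, size=(500, 4))   # vetted historical data
incoming_batch = np.vstack([
    rng.normal(loc=0.0, scale=1.0, size=(95, 4)),               # looks like normal traffic
    rng.normal(loc=8.0, scale=0.5, size=(5, 4)),                # out-of-distribution submissions
])

detector = IsolationForest(contamination=0.05, random_state=0).fit(trusted_batch)
flags = detector.predict(incoming_batch)    # -1 = anomaly, 1 = inlier

accepted = incoming_batch[flags == 1]
quarantined = incoming_batch[flags == -1]
print(f"accepted {len(accepted)} records, quarantined {len(quarantined)} for manual review")
```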
2. Monitor for Label Noise
- What to do: Watch for inconsistencies between the input data and its label (e.g., a clearly spammy message labeled “not spam”).
- How: Use automated tools to flag suspicious samples and run secondary checks.
- Why it helps: Poisoning often relies on incorrect labels to trick the model.
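One simple way to surface label noise, sketched below with scikit-learn on synthetic data (the 5% flip rate is an assumption), is to compare each sample's stored label against an out-of-fold prediction and send disagreements for secondary review:

```python
# Sketch: flag samples whose stored label disagrees with a cross-validated prediction.
# Disagreement is only a signal, not proof of poisoning; flagged rows go to review.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict

X, y_true = make_classification(n_samples=1000, n_features=20, random_state=1)

# Simulate poisoned submissions by flipping 5% of the labels.
rng = np.random.default_rng(1)
flipped = rng.choice(len(y_true), size=50, replace=False)
y_observed = y_true.copy()
y_observed[flipped] = 1 - y_observed[flipped]

# Out-of-fold predictions avoid judging a sample with a model that trained on it.
predicted = cross_val_predict(LogisticRegression(max_iter=1000), X, y_observed, cv=5)
suspects = np.where(predicted != y_observed)[0]
print(f"{len(suspects)} samples flagged for secondary label review")
```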
3. Limit User-Contributed Training Data
- What to do: Avoid automatically trusting user-generated input for retraining unless verified.
- Why it helps: Attackers can easily poison systems that retrain on unchecked user feedback (e.g., thumbs-up/down data, product reviews).
4. Access Control
- What to do: Restrict who can upload or influence training data. Use authentication, rate limiting, and logging.
- Why it helps: It reduces the risk of an attacker flooding the system with bad data.
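A minimal sketch of the intake side, assuming a simple per-contributor sliding window (the limits below are placeholder policy values, not recommendations):

```python
# Sketch: per-contributor rate limiting and logging for a training-data intake endpoint.
# WINDOW_SECONDS and MAX_SUBMISSIONS are placeholder policy values.
import logging
import time
from collections import defaultdict, deque

logging.basicConfig(level=logging.INFO)
WINDOW_SECONDS = 3600
MAX_SUBMISSIONS = 20
recent = defaultdict(deque)   # contributor id -> timestamps of recent submissions

def accept_submission(contributor_id: str, record: dict) -> bool:
    """Accept or reject a contributed training record based on recent volume."""
    now = time.time()
    window = recent[contributor_id]
    while window and now - window[0] > WINDOW_SECONDS:
        window.popleft()
    if len(window) >= MAX_SUBMISSIONS:
        logging.warning("rate limit hit for %s; submission rejected", contributor_id)
        return False
    window.append(now)
    logging.info("accepted record from %s", contributor_id)
    return True
```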
5. Robust Training Techniques
- Differential Privacy: Limits the impact of individual data points.
- Robust Statistics: Use training algorithms less sensitive to outliers or small changes in data.
- Noise-Tolerant Models: Some models are designed to be more resilient to mislabeled data.
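As a small illustration of the robust-statistics point (synthetic data; the Huber estimator is just one example of a loss that is less sensitive to extreme targets):

```python
# Sketch: a robust loss (Huber) limiting the pull of a few poisoned targets,
# compared with ordinary least squares on the same data. All values are synthetic.
import numpy as np
from sklearn.linear_model import HuberRegressor, LinearRegression

rng = np.random.default_rng(7)
X = rng.uniform(0, 10, size=(200, 1))
y = 3.0 * X.ravel() + rng.normal(scale=1.0, size=200)   # true slope is about 3

# Poison: ten points near one end of the range with wildly wrong targets.
X_poisoned = np.vstack([X, rng.uniform(9, 10, size=(10, 1))])
y_poisoned = np.concatenate([y, np.full(10, -100.0)])

ols = LinearRegression().fit(X_poisoned, y_poisoned)
robust = HuberRegressor().fit(X_poisoned, y_poisoned)
print("OLS slope:  ", ols.coef_[0])     # pulled far from 3 by the poisoned points
print("Huber slope:", robust.coef_[0])  # typically stays much closer to 3
```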
6. Data Provenance & Versioning
- What to do: Track where each piece of training data comes from and log all changes.
- Why it helps: If something goes wrong, you can trace it back to the source and fix it.
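A bare-bones sketch of what a provenance record might capture for each ingested training example (the field names and sources below are made up):

```python
# Sketch: record a provenance entry (content hash, source, timestamp) per training
# record so a bad batch can later be traced back and rolled out of the dataset.
import hashlib
import json
import time

provenance_log = []   # in practice, an append-only store or a dataset-versioning tool

def register_record(record: dict, source: str) -> str:
    payload = json.dumps(record, sort_keys=True).encode()
    digest = hashlib.sha256(payload).hexdigest()
    provenance_log.append({"sha256": digest, "source": source, "ingested_at": time.time()})
    return digest

register_record({"text": "Meeting moved to 3pm", "label": "not phishing"},
                source="internal-mail-archive")
register_record({"text": "You won $$$, click this link", "label": "not phishing"},
                source="public-feedback-form")

# If the model later misbehaves, filter the log by source to find suspect batches.
suspect = [e for e in provenance_log if e["source"] == "public-feedback-form"]
print(f"{len(suspect)} record(s) came from the untrusted feedback form")
```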
7. Model Testing & Auditing
- What to do: Continuously test models on known benchmarks and stress-test inputs.
- Why it helps: It helps you detect unexpected behavior that might come from poisoning.
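One way to operationalize this, sketched below with a held-out trusted benchmark and an illustrative accuracy threshold (both are assumptions, not a prescribed process), is to gate every retrained model before rollout:

```python
# Sketch: gate retrained models on a trusted benchmark before deployment.
# The benchmark data, models, and 2% threshold are illustrative assumptions.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=3)
X_train, X_bench, y_train, y_bench = train_test_split(X, y, test_size=0.25, random_state=3)

current_model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
baseline = current_model.score(X_bench, y_bench)

def audit(candidate, max_drop=0.02):
    """Reject a retrained model whose benchmark accuracy drops more than max_drop."""
    score = candidate.score(X_bench, y_bench)
    if score < baseline - max_drop:
        raise RuntimeError(f"candidate failed audit: {score:.3f} vs baseline {baseline:.3f}")
    return score

# After any automatic retrain, the candidate must pass the audit before rollout.
candidate_model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("candidate benchmark accuracy:", audit(candidate_model))
```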
8. Use Human-in-the-Loop Systems
- For high-risk or critical systems, include human review steps before training or deployment.
What is an AI poisoning attack?
An AI poisoning attack (most commonly a data poisoning attack) is when an attacker intentionally manipulates the training data of a machine learning model to cause it to behave incorrectly or maliciously once deployed.
How It Works:
AI and ML models learn by finding patterns in training data. If you feed them bad or biased data, they learn bad or biased behavior.
In a poisoning attack, an adversary:
- Injects malicious, misleading, or corrupted data into the training dataset.
- The model learns from this tampered data.
- When used in the real world, the model may:
- Make incorrect predictions
- Fail at specific tasks
- Be vulnerable to exploitation
Types of AI Poisoning Attacks:
1. Targeted Poisoning
- Designed to make the model fail on specific inputs.
- Example: Poison facial recognition so it misidentifies one person as another.
2. Indiscriminate Poisoning
- Causes general degradation of the model’s performance.
- Example: Lowering spam filter accuracy by injecting mislabeled emails.
3. Backdoor Attacks
- The model behaves normally, except when triggered by a specific input (like a keyword or image).
- Example: A self-driving car ignores stop signs with a certain sticker pattern.
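To show what “triggered” training data can look like, here is a NumPy-only sketch that stamps a small patch onto a few images and relabels them with the attacker’s target class (the trigger shape, poison rate, and classes are all invented for illustration):

```python
# Sketch: crafting backdoor training samples by stamping a trigger patch onto images
# and relabeling them with the attacker's target class. NumPy only; values invented.
import numpy as np

rng = np.random.default_rng(5)
images = rng.random((100, 32, 32))       # stand-in for real training images
labels = rng.integers(0, 10, size=100)   # stand-in class labels
TARGET_CLASS = 7                         # the class the trigger should force

def add_trigger(img):
    """Stamp a 3x3 bright patch in the bottom-right corner."""
    patched = img.copy()
    patched[-3:, -3:] = 1.0
    return patched

# Poison 5% of the training set: triggered images, all labeled as the target class.
poison_idx = rng.choice(len(images), size=5, replace=False)
poisoned_images, poisoned_labels = images.copy(), labels.copy()
for i in poison_idx:
    poisoned_images[i] = add_trigger(poisoned_images[i])
    poisoned_labels[i] = TARGET_CLASS

# A model trained on (poisoned_images, poisoned_labels) can behave normally on clean
# inputs yet tend to predict TARGET_CLASS whenever the patch is present.
```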
Why It Matters:
- AI poisoning can compromise:
- Security systems (e.g., malware detection, facial recognition)
- Healthcare diagnostics
- Financial fraud detection
- Autonomous vehicles
- It’s often hard to detect, because poisoned data can look legitimate.
Defense Against It:
- Data validation and sanitization
- Robust learning algorithms
- Model auditing and explainability tools
- Provenance tracking for datasets