Find Shadow AI Usage with Snyk AI-BOM Scanner: Try it Now

Note: This is an experimental feature, subject to breaking changes without notice.

Starting today, Snyk AI-BOM is available for Snyk users to try. This tutorial will walk you through how to access it using the ai-bom-scan GitHub repository. This repository features a Python script that uses the Snyk AI-BOM API to scan every Snyk target repository in your organization and find any that mention deepseek (or any other AI model, library, or keyword you choose). We will build it piece by piece, explaining each API call along the way.

What is an AI-BOM?

An AI Bill of Materials (AI-BOM) is a comprehensive inventory of AI components used in a software project. Similar to traditional software BOMs that list dependencies and libraries, AI-BOMs specifically focus on:

AI frameworks (PyTorch, TensorFlow, etc.)
Large Language Models (OpenAI GPT, Anthropic Claude, etc.)
AI-specific libraries and tools

With Snyk’s AI-BOM, teams get the visibility they need to track and manage these AI components.

How the AI-BOM scanner works

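The scanner works in five steps: it checks your credentials, lists the targets in your organization, requests an AI-BOM for each compatible target, polls until each job completes, and searches the result for your keywords.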

Step 1: Authentication and setup

The tool begins by reading your Snyk credentials and organization ID from environment variables and exiting early if either is missing:

import os
import sys

# Environment validation
SNYK_API_URL = os.getenv("SNYK_API_URL", "https://api.snyk.io")
SNYK_ORG_ID = os.getenv("SNYK_ORG_ID")
SNYK_TOKEN = os.getenv("SNYK_TOKEN")

# Both values are required; the API URL falls back to the default above
if not all([SNYK_ORG_ID, SNYK_TOKEN]):
    print("Error: Please set SNYK_ORG_ID and SNYK_TOKEN environment variables.")
    sys.exit(1)

This check only confirms that the credentials are set. Whether the token actually has permission to access your organization's data is determined by the first call to the Snyk API.
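The later snippets refer to self.headers, self.api_url, and self.api_version without showing where they come from. The sketch below shows one way such settings might be assembled. The ScannerConfig class and the load_config and build_headers helpers are hypothetical names, not taken from the ai-bom-scan repository, and the "Authorization: token ..." scheme and JSON:API content type are assumptions based on how the Snyk REST API is generally called; check the API documentation for the exact headers and version string the AI-BOM endpoint expects.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class ScannerConfig:
    """Hypothetical container for the settings the scanner needs."""
    api_url: str
    org_id: str
    token: str
    api_version: str


def load_config(env: Mapping[str, str], api_version: str) -> ScannerConfig:
    """Build a config from an environment-like mapping, or raise ValueError."""
    missing = [name for name in ("SNYK_ORG_ID", "SNYK_TOKEN") if not env.get(name)]
    if missing:
        raise ValueError(f"Missing environment variables: {', '.join(missing)}")
    return ScannerConfig(
        # Strip a trailing slash so URL paths can be appended safely
        api_url=env.get("SNYK_API_URL", "https://api.snyk.io").rstrip("/"),
        org_id=env["SNYK_ORG_ID"],
        token=env["SNYK_TOKEN"],
        api_version=api_version,  # caller supplies the documented version date
    )


def build_headers(config: ScannerConfig) -> dict:
    """Assumed header shape for Snyk REST API calls."""
    return {
        "Authorization": f"token {config.token}",
        "Content-Type": "application/vnd.api+json",
    }
```

Passing os.environ as env reproduces the behavior of the snippet above, while raising an exception instead of calling sys.exit makes the check easier to reuse and test.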

Step 2: Target discovery and filtering

The scanner fetches all targets (repositories) from your Snyk organization using pagination to handle large organizations:

import requests

def get_all_targets(self):
    """Return every target in the organization, following pagination links."""
    targets = []
    url = f"{self.api_url}/rest/orgs/{self.org_id}/targets?version={self.api_version}&limit=100"

    while url:
        response = requests.get(url, headers=self.headers)
        response.raise_for_status()  # stop on HTTP errors instead of parsing an error body
        data = response.json()
        targets.extend(data.get('data', []))

        # Handle pagination: the 'next' link is treated as a path relative to the API host
        next_link = data.get('links', {}).get('next')
        url = f"{self.api_url}{next_link}" if next_link else None

    return targets

The tool then filters the targets down to those that come from supported source code management integrations, such as GitHub and GitLab (see full list here). Container images and manual uploads are skipped automatically, since AI-BOMs are most relevant for source code repositories.
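The filtering step is not shown in the snippets, so here is a minimal sketch of what it could look like. It assumes each target exposes its integration type at relationships.integration.data.attributes.integration_type, and the SUPPORTED_INTEGRATIONS set is only an illustrative subset, not the full supported list. Both the helper names and the field path are assumptions to verify against a real targets response.

```python
# Illustrative subset only; consult the supported-integrations list for real values
SUPPORTED_INTEGRATIONS = {"github", "gitlab"}


def integration_type(target: dict) -> str:
    """Return the target's integration type, or '' when the field is absent.

    The field path used here is an assumption about the targets response shape.
    """
    integration = (target.get("relationships") or {}).get("integration") or {}
    attributes = (integration.get("data") or {}).get("attributes") or {}
    return attributes.get("integration_type") or ""


def filter_scm_targets(targets, supported=SUPPORTED_INTEGRATIONS):
    """Keep only targets whose integration type is in the supported set."""
    return [t for t in targets if integration_type(t).lower() in supported]


if __name__ == "__main__":
    sample = [
        {"id": "a", "relationships": {"integration": {"data": {
            "attributes": {"integration_type": "github"}}}}},
        {"id": "b", "relationships": {"integration": {"data": {
            "attributes": {"integration_type": "docker-hub"}}}}},
        {"id": "c"},  # no integration info at all, so it is skipped
    ]
    print([t["id"] for t in filter_scm_targets(sample)])
```

Targets with a missing or unrecognized integration type are dropped rather than raising an error, which keeps one malformed record from aborting the whole scan.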

Step 3: AI-BOM generation

For each compatible target, the scanner creates an AI-BOM generation job:

def process_target(self, search_keyword, target):
    # Assumes the target resource carries its identifier in the standard "id" field
    target_id = target["id"]

    # 1. Create the AI-BOM job
    post_url = f"{self.api_url}/rest/orgs/{self.org_id}/ai_boms?version={self.api_version}"
    payload = {
        "data": {
            "type": "ai_bom_scm_bundle",
            "attributes": {"target_id": target_id}
        }
    }
    response = requests.post(post_url, headers=self.headers, json=payload)

What happens during AI-BOM generation:

Snyk analyzes the repository's source code
Identifies AI-related dependencies and imports
Catalogs model references and AI framework usage
Creates a comprehensive inventory of AI components

Step 4: Job polling and completion

AI-BOM generation is an asynchronous process, so the scanner polls for job completion:

# 2. Poll for job completion (job_url refers to the job created above)
status = None  # unknown until the first poll
while status not in ["finished", "errored"]:
    time.sleep(2)  # Be respectful to the API
    response = requests.get(job_url, headers=self.headers, params={'version': self.api_version})
    response_data = response.json()
    status = response_data['data']['attributes']['status']

Why polling is necessary: Generating an AI-BOM requires analyzing the entire repository, which can take time depending on the repository size and complexity. The scanner waits for completion before proceeding.
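The loop above runs until the job reports "finished" or "errored", so a job that never reaches either state would block the scan indefinitely. A hedged sketch of the same polling pattern with an upper time limit follows. The wait_for_job name and its fetch_status, interval, timeout, sleep, and clock parameters are illustrative and not part of the scanner; fetch_status stands in for the GET request and status extraction shown above.

```python
import time
from typing import Callable

TERMINAL_STATES = ("finished", "errored")  # the two end states used by the loop above


def wait_for_job(fetch_status: Callable[[], str],
                 interval: float = 2.0,
                 timeout: float = 300.0,
                 sleep: Callable[[float], None] = time.sleep,
                 clock: Callable[[], float] = time.monotonic) -> str:
    """Poll fetch_status until a terminal state is seen or the timeout expires."""
    deadline = clock() + timeout
    while True:
        status = fetch_status()
        if status in TERMINAL_STATES:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"job still '{status}' after {timeout} seconds")
        sleep(interval)  # pause between polls to avoid hammering the API


if __name__ == "__main__":
    # Simulated job that finishes on the third poll; no real API call is made
    statuses = iter(["processing", "processing", "finished"])
    print(wait_for_job(lambda: next(statuses), interval=0.01))
```

Injecting sleep and clock keeps the function testable without real delays; in the scanner they would simply stay at their defaults.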

Step 5: Keyword search and analysis

Once the AI-BOM is ready, the scanner searches for your specified keywords:

# 3. Get the final AI-BOM and search for the keyword
final_response = requests.get(job_url, headers=self.headers, params={'version': self.api_version})
bom_content = final_response.text

# Split search terms by comma, dropping empty entries (an empty string would match everything)
search_terms = [term.strip().lower() for term in search_keyword.split(',') if term.strip()]
bom_content_lower = bom_content.lower()

# Collect every term that appears anywhere in the AI-BOM text
matched_terms = [term for term in search_terms if term in bom_content_lower]

Search capabilities:

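Multiple terms: Use comma-separated values for OR logic; a target matches if any term is found
Case-insensitive: Both the search terms and the AI-BOM content are lowercased before comparison
Partial matching: Terms are matched as substrings anywhere in the AI-BOM content, so short terms can also match inside longer words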
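These three behaviors can be seen in isolation with a small standalone function. This is a sketch of the matching logic only, under the assumption that the AI-BOM is treated as plain text; the match_terms name and the sample strings are illustrative.

```python
def match_terms(bom_content: str, search_keyword: str) -> list:
    """Return the comma-separated terms that occur in bom_content.

    Matching is case-insensitive and substring-based. Empty terms are
    ignored because an empty string is a substring of any text.
    """
    haystack = bom_content.lower()
    terms = [term.strip().lower() for term in search_keyword.split(",")]
    return [term for term in terms if term and term in haystack]


if __name__ == "__main__":
    sample_bom = '{"components": [{"name": "OpenAI"}, {"name": "transformers"}]}'
    print(match_terms(sample_bom, "DeepSeek, openai,"))  # ['openai']
    print(match_terms(sample_bom, "former"))             # ['former']
```

The second call shows the cost of partial matching: "former" matches inside "transformers". If that is too permissive for your audit, prefer longer, more specific terms.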

Usage example

Search for targets using deepseek, openai or anthropic models:

> ai-bom-scan "deepseek,openai,anthropic"
Starting scan to find targets using any of: 'deepseek', 'openai', 'anthropic'...
Found 45 total targets in the organization.

Scan Complete
==================================================
✅ Found matches in 8 targets:
  • my-org/ml-project (openai)
  • my-org/chatbot-service (openai,anthropic)
  • my-org/ai-experiments (deepseek)
  • my-org/content-generator (openai)
  • my-org/voice-assistant (anthropic)
  • my-org/smart-recommendations (openai,deepseek)
  • my-org/language-tools (anthropic)
  • my-org/research-prototype (deepseek,openai)
==================================================

Common use cases

1. AI framework auditing

Identify which projects use specific AI frameworks:

ai-bom-scan "pytorch,tensorflow,keras"

Why this matters: Understanding framework distribution helps with:

License compliance
Security vulnerability management
Technology standardization efforts

2. LLM provider discovery

Find repositories using specific language model providers:

ai-bom-scan "openai,anthropic,cohere,huggingface"

Why this matters: Knowing which language model providers are in use helps you:

Track API usage and costs
Ensure compliance with usage policies
Plan migration strategies

3. Security scanning

Identify potentially vulnerable AI components:

ai-bom-scan "langchain,llamaindex,transformers"

Why this matters: Knowing which AI components are in use supports security work:

Some AI libraries have known vulnerabilities, and you can only patch what you know you run
AI-BOMs help you track and remediate security issues per repository
An up-to-date inventory enables proactive rather than reactive security management

Conclusion

The Snyk AI-BOM Scanner uses the AI-BOM API to provide visibility into AI component usage across your organization. By automating the discovery and cataloging of AI dependencies, it enables better security, compliance, and technology management.

Whether you're conducting security audits, ensuring license compliance, or planning technology migrations, the Snyk AI-BOM Scanner provides the insights you need to make informed decisions about your organization's AI landscape.

Snyk users can access the Snyk AI-BOM Scanner here.

And if you’d like to get updates on future incubations from Snyk or become a design partner, please submit your info here!