Poltergeist
Poltergeist is Ghost Security Agent's secret scanner. It scans source code for leaked API keys, tokens, certificates, and credentials using a dual-engine architecture that combines speed with precision.
Architecture
Dual-engine design
Poltergeist uses two regex engines and selects the best one automatically:
Hyperscan engine -- a high-performance multi-pattern matcher. It evaluates all rules simultaneously in a single pass over the file content, so scan time stays consistent as the rule count grows. With 100 rules, Hyperscan scans the Linux kernel source tree (1.4 GB) in about 8 seconds.
Go regex engine -- a fallback engine for environments where Hyperscan isn't available, and the default for single-pattern scans. Performance scales linearly with rule count.
In auto mode (the default), Poltergeist uses Hyperscan for multi-pattern scans when available, and Go regex for single patterns or when Hyperscan isn't installed.
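The selection rule described above can be sketched as a small function. This is only an illustration of the documented behavior, not Poltergeist's actual code: the function name, its parameters, and the error raised when Hyperscan is forced but missing are all assumptions.

```python
def choose_engine(mode: str, pattern_count: int, hyperscan_available: bool) -> str:
    """Pick a regex engine following the documented auto-mode rule.

    Hypothetical helper: names and the error behavior are assumptions.
    """
    if mode == "go":
        return "go"
    if mode == "hyperscan":
        if not hyperscan_available:
            # Assumption: forcing an unavailable engine is treated as an error.
            raise RuntimeError("Hyperscan requested but not available")
        return "hyperscan"
    if mode != "auto":
        raise ValueError(f"unknown engine mode: {mode}")
    # Auto mode: Hyperscan only pays off for multi-pattern scans.
    if hyperscan_available and pattern_count > 1:
        return "hyperscan"
    return "go"
```

With the 100 built-in rules loaded and Hyperscan installed, this sketch returns "hyperscan"; with a single inline pattern, or without Hyperscan, it returns "go".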
Entropy analysis
Every match is evaluated for Shannon entropy (a measure of randomness). Each rule defines a minimum entropy threshold tuned to its specific pattern. Matches below the threshold are filtered out by default.
For example:
- A generic password variable (ghost.generic.3) has a threshold of 3.5 bits, because passwords can be relatively short
- An AWS session token (ghost.aws.2) has a threshold of 5.5 bits, because these tokens are long base64 strings with high randomness
- An OpenAI API key (ghost.openai.1) has a threshold of 5.1 bits
The -low-entropy flag also shows matches that fall below their rule's threshold, which is useful for debugging rules or investigating potential issues.
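To make the entropy check concrete, here is a minimal sketch of Shannon entropy measured in bits per character, plus a threshold comparison. It assumes entropy is computed over character frequencies of the matched string and that a match exactly at the threshold is kept; Poltergeist's own implementation may differ in both details, and the function names are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Shannon entropy of text in bits per character."""
    if not text:
        return 0.0
    total = len(text)
    counts = Counter(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def meets_threshold(candidate: str, threshold: float) -> bool:
    """Assumption: matches strictly below the threshold are filtered out."""
    return shannon_entropy(candidate) >= threshold
```

A string of one repeated character scores 0 bits, while a string of four distinct characters scores exactly 2 bits. A dictionary word such as "password" scores 2.75 bits under this sketch, which is below the 3.5-bit threshold quoted for the generic password rule.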
Automatic redaction
Poltergeist redacts secrets in its output by default. Each rule defines how much of a match to reveal (prefix and suffix character counts), with the middle replaced by asterisks:
sk-proj-0JdlOY****hDvSYA (OpenAI key: 13 prefix, 4 suffix)
Bu/9****KBBJ (AWS secret: 4 prefix, 4 suffix)
Scan output is safe to share, log, or include in reports without exposing actual credential values.
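The prefix/suffix scheme can be sketched as follows. This is an illustrative function, not Poltergeist's code: the name, the fixed four-asterisk mask, and the handling of values too short to redact are assumptions, and the sample secret in the usage note is made up.

```python
def redact(secret: str, prefix: int, suffix: int, mask: str = "****") -> str:
    """Keep `prefix` leading and `suffix` trailing characters, mask the rest.

    Hypothetical helper; the fixed-width mask is an assumption based on the
    examples shown above.
    """
    if prefix < 0 or suffix < 0:
        raise ValueError("prefix and suffix must be non-negative")
    if prefix + suffix >= len(secret):
        # Assumption: reveal nothing when the value is too short to mask safely.
        return mask
    return secret[:prefix] + mask + secret[len(secret) - suffix:]
```

For example, redact("Bu/9abcdefghKBBJ", 4, 4) produces Bu/9****KBBJ, the same shape as the AWS example above.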
Key features
- 100 built-in rules covering 50+ services across cloud providers, AI services, git platforms, CI/CD, communication tools, databases, payment processors, and more
- Dual-engine architecture with automatic engine selection
- Entropy filtering to reduce false positives from low-randomness matches
- Automatic redaction of secrets in all output formats
- Multiple output formats -- text (colored), JSON (machine-readable), Markdown (reports)
- Custom rules -- extend with your own YAML rule files
- Binary file detection -- automatically skips binary files, archives, and media
- Embedded rules -- rules compiled into the binary, no external files needed at runtime
CLI reference
Usage
poltergeist [options] <path> [pattern1] [pattern2] ...
Scans the file or directory at <path>. Optionally provide one or more regex patterns to match in addition to (or instead of) built-in rules.
Flags
| Flag | Default | Description |
|---|---|---|
| -engine | auto | Pattern matching engine: auto, go, or hyperscan |
| -rules | -- | Path to YAML rule file or directory of rule files |
| -format | text | Output format: text, json, or md |
| -output | -- | Write output to file (auto-detects format from .json or .md extension) |
| -dnr | false | Do not redact: show full secret values |
| -low-entropy | false | Show matches below entropy threshold |
| -no-color | false | Disable colored text output |
| -version | -- | Show version information |
| -help | -- | Show usage information |
Examples
# Scan a directory with default rules
poltergeist /path/to/code
# JSON output for CI/CD integration
poltergeist -format json /path/to/code
# Markdown report to file
poltergeist -output report.md /path/to/code
# Custom rules
poltergeist -rules ./my-rules.yaml /path/to/code
# Force Hyperscan engine
poltergeist -engine hyperscan /path/to/code
# Show low-entropy matches for investigation
poltergeist -low-entropy /path/to/code
# Combine custom rules with inline patterns
poltergeist -rules ./rules /path/to/code "api[_-]?key\s*[:=]\s*['\"]([^'\"]+)"
Output formats
Text (default) -- colored, human-readable output grouped by file:
SCAN SUMMARY
Files scanned: 1,247
Total content: 48 MB
Secrets found: 3
src/config/api.ts
Line 15: OpenAI API Key
sk-proj-0JdlOY****hDvSYA
ID: ghost.openai.1
Entropy: 5.2 | Threshold: 5.1 | Met: Yes
Duration: 0.8s
JSON -- structured output for programmatic consumption:
{
  "summary": {
    "files_scanned": 1247,
    "files_skipped": 23,
    "total_bytes": 50331648,
    "matches_found": 3,
    "high_entropy_matches": 3,
    "low_entropy_matches": 0
  },
  "results": [
    {
      "file_path": "src/config/api.ts",
      "line_number": 15,
      "redacted": "sk-proj-0JdlOY****hDvSYA",
      "rule_name": "OpenAI API Key",
      "rule_id": "ghost.openai.1",
      "entropy": 5.2,
      "rule_entropy_threshold": 5.1,
      "rule_entropy_threshold_met": true
    }
  ]
}
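For programmatic consumption, a script might read this JSON roughly as follows. The field names come from the sample above; the function name and the one-line-per-finding format are assumptions made for illustration.

```python
import json


def list_findings(report_text: str, include_low_entropy: bool = False) -> list[str]:
    """Return one 'path:line rule_id redacted' string per reported match.

    Hypothetical helper built on the field names in the sample report.
    """
    report = json.loads(report_text)
    findings = []
    for result in report.get("results", []):
        # Skip matches that did not meet their rule's entropy threshold
        # unless the caller asks for them.
        met = result.get("rule_entropy_threshold_met", True)
        if not met and not include_low_entropy:
            continue
        findings.append(
            f'{result["file_path"]}:{result["line_number"]} '
            f'{result["rule_id"]} {result["redacted"]}'
        )
    return findings
```

Because the report carries only the redacted value, output from a helper like this can be logged without exposing the credential.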
Markdown -- report format with tables and findings sections, suitable for documentation or issue tracking.
Exit codes
- 0 -- scan completed, no secrets found
- 1 -- scan completed, secrets were found (or output is JSON/Markdown)
Rule format
Rules are defined in YAML. You can create custom rules to detect organization-specific secrets or patterns:
rules:
  - name: Internal API Key
    id: custom.internal.1
    description: Internal service API key format.
    tags: [api, internal]
    pattern: |
      (?x)
      \b(int-[a-zA-Z0-9]{32})\b
    entropy: 4.0
    redact: [8, 4]
    tests:
      assert:
        - "int-aB3cD4eF5gH6iJ7kL8mN9oP0qR1sT2uV"
      assert_not:
        - "int-test"
    history:
      - 2024-01-15 Initial rule
Rule fields:
| Field | Required | Description |
|---|---|---|
| name | Yes | Human-readable rule name |
| id | Yes | Unique identifier (e.g., ghost.openai.1) |
| description | Yes | User-facing explanation |
| tags | Yes | Categories for filtering and organization |
| pattern | Yes | Regex pattern (supports extended (?x) syntax with comments) |
| entropy | Yes | Minimum Shannon entropy threshold |
| redact | Yes | [prefix_chars, suffix_chars] for output redaction |
| tests | Yes | assert (should match) and assert_not (should not match) test cases |
| history | Yes | Changelog entries (at least one required) |
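Before shipping a custom rule, it can help to check its tests cases against its pattern. The sketch below does this with Python's re module as a stand-in for Poltergeist's engines, so regex dialect differences are possible. The rule is written as a plain dict rather than loaded from YAML, the function name is hypothetical, and the sample value is chosen to have exactly 32 characters after the int- prefix, as the pattern requires.

```python
import re

# Hypothetical in-memory form of a custom rule (normally loaded from YAML).
RULE = {
    "id": "custom.internal.1",
    "pattern": "(?x)\n\\b(int-[a-zA-Z0-9]{32})\\b\n",
    "tests": {
        "assert": ["int-aB3cD4eF5gH6iJ7kL8mN9oP0qR1sT2uV"],
        "assert_not": ["int-test"],
    },
}


def check_rule_tests(rule: dict) -> list[str]:
    """Return a list of failure messages; an empty list means all tests pass."""
    regex = re.compile(rule["pattern"])
    failures = []
    for sample in rule["tests"].get("assert", []):
        if regex.search(sample) is None:
            failures.append(f"{rule['id']}: expected a match for {sample!r}")
    for sample in rule["tests"].get("assert_not", []):
        if regex.search(sample) is not None:
            failures.append(f"{rule['id']}: unexpected match for {sample!r}")
    return failures
```

A check like this catches off-by-one mistakes in quantifiers: a sample with only 31 characters after the prefix would be reported as a failed assert case.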
Skill integration
When used through the scan-secrets skill, Poltergeist's JSON output feeds into AI context assessment. The skill parses matches into candidates, assesses each one (real vs. placeholder, hardcoded vs. environment variable, production vs. test), and writes confirmed findings with severity assessments and remediation guidance. See Secret scanning for details.