Poltergeist

Poltergeist is Ghost Security Agent's secret scanner. It scans source code for leaked API keys, tokens, certificates, and credentials using a dual-engine architecture that combines speed with precision.

Architecture

Dual-engine design

Poltergeist uses two regex engines and selects the best one automatically:

Hyperscan engine -- a high-performance multi-pattern matcher. It evaluates all rules simultaneously in a single pass over the file content, maintaining consistent scan times regardless of rule count. With 100 rules, Hyperscan scans the Linux kernel (1.4 GB) in about 8 seconds.

Go regex engine -- a fallback engine for environments where Hyperscan isn't available, and the default for single-pattern scans. Performance scales linearly with rule count.

In auto mode (the default), Poltergeist uses Hyperscan for multi-pattern scans when available, and Go regex for single patterns or when Hyperscan isn't installed.
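The auto-mode rule above can be sketched as a small decision function. This is a minimal illustration, not Poltergeist's code. `chooseEngine` and its parameters are hypothetical, and what happens when `-engine hyperscan` is forced on a machine without Hyperscan is not described here.

```go
package main

import "fmt"

// chooseEngine is a hypothetical sketch of the engine-selection rule.
// mode is the -engine value ("auto", "go", or "hyperscan"), patternCount is
// the number of patterns to match, and hyperscanAvailable reports whether
// Hyperscan is installed.
func chooseEngine(mode string, patternCount int, hyperscanAvailable bool) string {
	switch mode {
	case "go", "hyperscan":
		// An explicit -engine value is used as-is in this sketch.
		return mode
	}
	// Auto mode: Hyperscan for multi-pattern scans when it is installed,
	// Go regex for single patterns or when Hyperscan is missing.
	if hyperscanAvailable && patternCount > 1 {
		return "hyperscan"
	}
	return "go"
}

func main() {
	fmt.Println(chooseEngine("auto", 100, true))  // hyperscan
	fmt.Println(chooseEngine("auto", 1, true))    // go
	fmt.Println(chooseEngine("auto", 100, false)) // go
}
```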

Entropy analysis

Poltergeist computes the Shannon entropy (a measure of randomness, expressed in bits per character) of every match. Each rule defines a minimum entropy threshold tuned to its specific pattern, and matches below that threshold are filtered out by default.

For example:

A generic password variable (ghost.generic.3) has a threshold of 3.5 bits, because passwords can be relatively short
An AWS session token (ghost.aws.2) has a threshold of 5.5 bits, because these tokens are long base64 strings with high randomness
An OpenAI API key (ghost.openai.1) has a threshold of 5.1 bits

The -low-entropy flag also reports matches below the threshold, which is useful for debugging rules or investigating potential issues.

Automatic redaction

Poltergeist redacts secrets in its output by default. Each rule defines how much of a match to reveal (prefix and suffix character counts), with the middle replaced by asterisks:

```text
sk-proj-0JdlOY****hDvSYA    (OpenAI key: 13 prefix, 4 suffix)
Bu/9****KBBJ                  (AWS secret: 4 prefix, 4 suffix)
```

Unless redaction is disabled with -dnr, scan output is safe to share, log, or include in reports without exposing actual credential values.
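Prefix/suffix redaction can be sketched in a few lines. This assumes a fixed four-asterisk mask, as in the examples above; `redact` is a hypothetical helper, and how Poltergeist handles secrets shorter than prefix plus suffix is not documented, so this sketch simply masks them entirely.

```go
package main

import "fmt"

// redact keeps the first prefix and last suffix characters of secret and
// replaces everything in between with a fixed "****" mask.
func redact(secret string, prefix, suffix int) string {
	r := []rune(secret)
	if prefix+suffix >= len(r) {
		// Too short to reveal anything without exposing the whole value.
		return "****"
	}
	return string(r[:prefix]) + "****" + string(r[len(r)-suffix:])
}

func main() {
	fmt.Println(redact("abcd1234efgh5678ijkl", 4, 4)) // abcd****ijkl
	fmt.Println(redact("short", 4, 4))                // ****
}
```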

Key features

100 built-in rules covering 50+ services across cloud providers, AI services, git platforms, CI/CD, communication tools, databases, payment processors, and more
Dual-engine architecture with automatic engine selection
Entropy filtering to reduce false positives from low-randomness matches
Automatic redaction of secrets in all output formats
Multiple output formats -- text (colored), JSON (machine-readable), Markdown (reports)
Custom rules -- extend with your own YAML rule files
Binary file detection -- automatically skips binary files, archives, and media
Embedded rules -- rules compiled into the binary, no external files needed at runtime
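Binary-file skipping is commonly done with two cheap checks: an extension list for archives and media, and a NUL-byte probe of the file's first few kilobytes. The sketch below shows that common heuristic; it is an assumption, not Poltergeist's documented method, and `looksBinary` and `skipExtensions` (a deliberately incomplete list) are hypothetical.

```go
package main

import (
	"bytes"
	"fmt"
	"path/filepath"
	"strings"
)

// skipExtensions is a hypothetical, incomplete list of archive and media types.
var skipExtensions = map[string]bool{
	".zip": true, ".gz": true, ".tar": true, ".jar": true,
	".png": true, ".jpg": true, ".gif": true, ".mp4": true, ".pdf": true,
}

// looksBinary reports whether a file should be skipped: either its extension
// is on the skip list, or a NUL byte appears in the first 8000 bytes of head.
func looksBinary(path string, head []byte) bool {
	if skipExtensions[strings.ToLower(filepath.Ext(path))] {
		return true
	}
	if len(head) > 8000 {
		head = head[:8000]
	}
	return bytes.IndexByte(head, 0) >= 0
}

func main() {
	fmt.Println(looksBinary("logo.PNG", nil))                     // true
	fmt.Println(looksBinary("api.ts", []byte("const k = 1\n")))   // false
	fmt.Println(looksBinary("blob.bin", []byte{0x7f, 'E', 0, 1})) // true
}
```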

CLI reference

Usage

```text
poltergeist [options] <path> [pattern1] [pattern2] ...
```

Scans the file or directory at <path>. Optionally provide one or more regex patterns to match in addition to (or instead of) built-in rules.
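To illustrate how an inline pattern becomes line-numbered findings, here is a minimal sketch using Go's standard `regexp` package: each match's byte offset is converted to a 1-based line number by counting the newlines before it. `scanContent` and `Finding` are hypothetical names, and recounting newlines per match is chosen for clarity, not speed.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Finding is a hypothetical result type for this sketch.
type Finding struct {
	Line  int
	Match string
}

// scanContent runs one pattern over content and reports the 1-based line
// number of each match, derived from the newlines preceding its offset.
func scanContent(content, pattern string) ([]Finding, error) {
	re, err := regexp.Compile(pattern)
	if err != nil {
		return nil, err
	}
	var out []Finding
	for _, loc := range re.FindAllStringIndex(content, -1) {
		line := strings.Count(content[:loc[0]], "\n") + 1
		out = append(out, Finding{Line: line, Match: content[loc[0]:loc[1]]})
	}
	return out, nil
}

func main() {
	src := "const a = 1\napi_key = 'abc123'\n"
	found, err := scanContent(src, `api[_-]?key\s*[:=]\s*['"]([^'"]+)`)
	if err != nil {
		panic(err)
	}
	for _, f := range found {
		fmt.Printf("Line %d: %s\n", f.Line, f.Match) // Line 2: api_key = 'abc123
	}
}
```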

Flags

| Flag | Default | Description |
| --- | --- | --- |
| `-engine` | `auto` | Pattern matching engine: `auto`, `go`, or `hyperscan` |
| `-rules` | -- | Path to YAML rule file or directory of rule files |
| `-format` | `text` | Output format: `text`, `json`, or `md` |
| `-output` | -- | Write output to file (auto-detects format from `.json` or `.md` extension) |
| `-dnr` | `false` | Do not redact: show full secret values |
| `-low-entropy` | `false` | Show matches below entropy threshold |
| `-no-color` | `false` | Disable colored text output |
| `-version` | -- | Show version information |
| `-help` | -- | Show usage information |
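The `-output` extension detection can be sketched as follows. `resolveFormat` is a hypothetical helper, and letting the file extension take precedence over an explicit `-format` value is an assumption; the table does not say which wins when they disagree.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// resolveFormat picks the output format. An -output path ending in .json or
// .md selects that format (assumed to override -format); otherwise the
// -format value is used unchanged.
func resolveFormat(formatFlag, outputPath string) string {
	switch strings.ToLower(filepath.Ext(outputPath)) {
	case ".json":
		return "json"
	case ".md":
		return "md"
	}
	return formatFlag
}

func main() {
	fmt.Println(resolveFormat("text", "report.md")) // md
	fmt.Println(resolveFormat("text", ""))          // text
	fmt.Println(resolveFormat("json", "out.txt"))   // json
}
```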

Examples

```bash
# Scan a directory with default rules
poltergeist /path/to/code

# JSON output for CI/CD integration
poltergeist -format json /path/to/code

# Markdown report to file
poltergeist -output report.md /path/to/code

# Custom rules
poltergeist -rules ./my-rules.yaml /path/to/code

# Force Hyperscan engine
poltergeist -engine hyperscan /path/to/code

# Show low-entropy matches for investigation
poltergeist -low-entropy /path/to/code

# Combine custom rules with inline patterns
poltergeist -rules ./rules /path/to/code "api[_-]?key\s*[:=]\s*['\"]([^'\"]+)"
```

Output formats

Text (default) -- colored, human-readable output grouped by file:

```text
SCAN SUMMARY
Files scanned: 1,247
Total content: 48 MB
Secrets found: 3

src/config/api.ts
  Line 15: OpenAI API Key
    sk-proj-0JdlOY****hDvSYA
    ID: ghost.openai.1
    Entropy: 5.2 | Threshold: 5.1 | Met: Yes

Duration: 0.8s
```

JSON -- structured output for programmatic consumption:

```json
{
  "summary": {
    "files_scanned": 1247,
    "files_skipped": 23,
    "total_bytes": 50331648,
    "matches_found": 3,
    "high_entropy_matches": 3,
    "low_entropy_matches": 0
  },
  "results": [
    {
      "file_path": "src/config/api.ts",
      "line_number": 15,
      "redacted": "sk-proj-0JdlOY****hDvSYA",
      "rule_name": "OpenAI API Key",
      "rule_id": "ghost.openai.1",
      "entropy": 5.2,
      "rule_entropy_threshold": 5.1,
      "rule_entropy_threshold_met": true
    }
  ]
}
```
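A Go consumer can decode this output with the standard `encoding/json` package. The `Report` type below mirrors only the fields shown in the sample; the real output may contain more, and `Report` and `parseReport` are hypothetical names.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Report mirrors the fields in the sample JSON output above.
type Report struct {
	Summary struct {
		FilesScanned       int   `json:"files_scanned"`
		FilesSkipped       int   `json:"files_skipped"`
		TotalBytes         int64 `json:"total_bytes"`
		MatchesFound       int   `json:"matches_found"`
		HighEntropyMatches int   `json:"high_entropy_matches"`
		LowEntropyMatches  int   `json:"low_entropy_matches"`
	} `json:"summary"`
	Results []struct {
		FilePath     string  `json:"file_path"`
		LineNumber   int     `json:"line_number"`
		Redacted     string  `json:"redacted"`
		RuleName     string  `json:"rule_name"`
		RuleID       string  `json:"rule_id"`
		Entropy      float64 `json:"entropy"`
		Threshold    float64 `json:"rule_entropy_threshold"`
		ThresholdMet bool    `json:"rule_entropy_threshold_met"`
	} `json:"results"`
}

// parseReport decodes Poltergeist JSON output into a Report.
func parseReport(data []byte) (Report, error) {
	var r Report
	err := json.Unmarshal(data, &r)
	return r, err
}

func main() {
	data := []byte(`{"summary":{"files_scanned":1247,"matches_found":1},
	"results":[{"file_path":"src/config/api.ts","line_number":15,
	"rule_id":"ghost.openai.1","entropy":5.2,"rule_entropy_threshold":5.1,
	"rule_entropy_threshold_met":true}]}`)
	r, err := parseReport(data)
	if err != nil {
		panic(err)
	}
	// Print only matches that met their rule's entropy threshold.
	for _, m := range r.Results {
		if m.ThresholdMet {
			fmt.Printf("%s:%d %s\n", m.FilePath, m.LineNumber, m.RuleID)
		}
	}
}
```

Running it prints `src/config/api.ts:15 ghost.openai.1`.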

Markdown -- report format with tables and findings sections, suitable for documentation or issue tracking.

Exit codes

0 -- scan completed, no secrets found
1 -- scan completed and secrets were found; also returned when the output format is JSON or Markdown

Rule format

Rules are defined in YAML. You can create custom rules to detect organization-specific secrets or patterns:

```yaml
rules:
  - name: Internal API Key
    id: custom.internal.1
    description: Internal service API key format.
    tags: [api, internal]
    pattern: |
      (?x)
      \b(int-[a-zA-Z0-9]{32})\b
    entropy: 4.0
    redact: [8, 4]
    tests:
      assert:
        - "int-aB3cD4eF5gH6iJ7kL8mN9oP0qR1sT2uV"
      assert_not:
        - "int-test"
    history:
      - 2024-01-15 Initial rule
```

Rule fields:

| Field | Required | Description |
| --- | --- | --- |
| `name` | Yes | Human-readable rule name |
| `id` | Yes | Unique identifier (e.g., `ghost.openai.1`) |
| `description` | Yes | User-facing explanation |
| `tags` | Yes | Categories for filtering and organization |
| `pattern` | Yes | Regex pattern (supports extended `(?x)` syntax with comments) |
| `entropy` | Yes | Minimum Shannon entropy threshold |
| `redact` | Yes | `[prefix_chars, suffix_chars]` for output redaction |
| `tests` | Yes | `assert` (should match) and `assert_not` (should not match) test cases |
| `history` | Yes | Changelog entries (at least one required) |
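The "Required" column can be enforced with a simple validation pass. This is a hypothetical sketch: the `Rule` struct and `validate` method are illustrative, a zero `entropy` is treated as missing (a simplification), and requiring at least one `assert` case is an assumption; only the "at least one" rule for `history` comes from the table.

```go
package main

import (
	"fmt"
	"strings"
)

// Rule is a hypothetical in-memory form of a YAML rule, one field per table row.
type Rule struct {
	Name, ID, Description, Pattern string
	Tags                           []string
	Entropy                        float64
	Redact                         []int
	Assert, AssertNot              []string
	History                        []string
}

// validate collects every missing or malformed required field.
func (r Rule) validate() error {
	var problems []string
	if r.Name == "" {
		problems = append(problems, "name is required")
	}
	if r.ID == "" {
		problems = append(problems, "id is required")
	}
	if r.Description == "" {
		problems = append(problems, "description is required")
	}
	if len(r.Tags) == 0 {
		problems = append(problems, "tags is required")
	}
	if r.Pattern == "" {
		problems = append(problems, "pattern is required")
	}
	if r.Entropy <= 0 { // zero treated as missing in this sketch
		problems = append(problems, "entropy must be a positive threshold")
	}
	if len(r.Redact) != 2 {
		problems = append(problems, "redact must be [prefix_chars, suffix_chars]")
	}
	if len(r.Assert) == 0 {
		problems = append(problems, "tests need at least one assert case")
	}
	if len(r.History) == 0 {
		problems = append(problems, "history needs at least one entry")
	}
	if len(problems) > 0 {
		return fmt.Errorf("rule %q: %s", r.ID, strings.Join(problems, "; "))
	}
	return nil
}

func main() {
	r := Rule{
		Name: "Internal API Key", ID: "custom.internal.1",
		Description: "Internal service API key format.",
		Tags:        []string{"api", "internal"},
		Pattern:     `\b(int-[a-zA-Z0-9]{32})\b`,
		Entropy:     4.0, Redact: []int{8, 4},
		Assert:    []string{"int-aB3cD4eF5gH6iJ7kL8mN9oP0qR1sT2uV"},
		AssertNot: []string{"int-test"},
		History:   []string{"2024-01-15 Initial rule"},
	}
	fmt.Println(r.validate())                        // <nil>
	fmt.Println(Rule{ID: "custom.bad.1"}.validate()) // lists every missing field
}
```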

Skill integration

When used through the scan-secrets skill, Poltergeist's JSON output feeds into AI context assessment. The skill parses matches into candidates, assesses each one (real vs. placeholder, hardcoded vs. environment variable, production vs. test), and writes confirmed findings with severity assessments and remediation guidance. See Secret scanning for details.