Poltergeist

Poltergeist is Ghost Security Agent's secret scanner. It scans source code for leaked API keys, tokens, certificates, and credentials using a dual-engine architecture that combines speed with precision.


Architecture

Dual-engine design

Poltergeist uses two regex engines and selects the best one automatically:

Hyperscan engine -- a high-performance multi-pattern matcher. It evaluates all rules simultaneously in a single pass over the file content, so scan time stays nearly constant as the rule count grows. With 100 rules, Hyperscan scans the Linux kernel (1.4 GB) in about 8 seconds.

Go regex engine -- a fallback engine for environments where Hyperscan isn't available, and the default for single-pattern scans. Scan time grows linearly with the number of rules.

In auto mode (the default), Poltergeist uses Hyperscan for multi-pattern scans when available, and Go regex for single patterns or when Hyperscan isn't installed.
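
The selection rule described above can be sketched as a small decision function. This is an illustrative Python sketch, not Poltergeist's actual code: `select_engine`, `pattern_count`, and `hyperscan_available` are hypothetical names, and raising an error when Hyperscan is forced but missing is an assumption.

```python
def select_engine(mode: str, pattern_count: int, hyperscan_available: bool) -> str:
    """Return "hyperscan" or "go", following the auto-mode description."""
    if mode == "go":
        return "go"
    if mode == "hyperscan":
        # Assumption: forcing an unavailable engine is treated as an error.
        if not hyperscan_available:
            raise RuntimeError("hyperscan engine requested but not available")
        return "hyperscan"
    if mode != "auto":
        raise ValueError(f"unknown engine mode: {mode!r}")
    # auto: Hyperscan for multi-pattern scans when it is available,
    # Go regex for a single pattern or when Hyperscan is missing.
    if hyperscan_available and pattern_count > 1:
        return "hyperscan"
    return "go"
```

For example, `select_engine("auto", 100, True)` picks Hyperscan, while the same call with a single pattern or with Hyperscan unavailable falls back to Go regex.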

Entropy analysis

Poltergeist computes the Shannon entropy (a measure of randomness) of every match. Each rule defines a minimum entropy threshold tuned to its pattern, and matches that fall below the threshold are filtered out by default.

For example:

  • A generic password variable (ghost.generic.3) has a threshold of 3.5 bits, because passwords can be relatively short
  • An AWS session token (ghost.aws.2) has a threshold of 5.5 bits, because these tokens are long base64 strings with high randomness
  • An OpenAI API key (ghost.openai.1) has a threshold of 5.1 bits

The -low-entropy flag also shows matches below the threshold, which is useful when debugging rules or investigating matches that were filtered out.
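
To make the threshold check concrete, here is a minimal Python sketch of Shannon entropy over the characters of a match. It assumes entropy is measured in bits per character across the matched string; the function names are hypothetical, and Poltergeist's own calculation may differ in detail.

```python
import math
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Shannon entropy of `text` in bits per character."""
    if not text:
        return 0.0
    total = len(text)
    # Sum of p * log2(1/p) over the frequency p of each distinct character.
    return sum(
        (count / total) * math.log2(total / count)
        for count in Counter(text).values()
    )


def meets_threshold(match: str, threshold: float) -> bool:
    """Keep a match only if its entropy is not below the rule's threshold."""
    return shannon_entropy(match) >= threshold
```

Under this definition a string with k distinct characters scores at most log2(k) bits, so a short or repetitive match cannot reach a high threshold, while a long random token can.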

Automatic redaction

Poltergeist redacts secrets in its output by default. Each rule defines how much of a match to reveal (prefix and suffix character counts), with the middle replaced by asterisks:

sk-proj-0JdlOY****hDvSYA    (OpenAI key: 13 prefix, 4 suffix)
Bu/9****KBBJ                  (AWS secret: 4 prefix, 4 suffix)

Scan output is safe to share, log, or include in reports without exposing actual credential values.
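
The prefix/suffix scheme can be sketched in a few lines of Python. This is an illustration under assumptions, not the tool's implementation: the `redact` helper is hypothetical, the fixed four-asterisk mask is inferred from the examples above, and fully masking secrets that are too short is an assumption.

```python
def redact(secret: str, prefix: int, suffix: int, mask: str = "****") -> str:
    """Keep `prefix` leading and `suffix` trailing characters; mask the rest."""
    # Assumption: if nothing would be hidden, mask the whole value instead.
    if prefix + suffix >= len(secret):
        return mask
    tail = secret[-suffix:] if suffix > 0 else ""
    return secret[:prefix] + mask + tail
```

With a hypothetical 26-character value, `redact("ABCDEFGHIJKLMNOPQRSTUVWXYZ", 4, 4)` keeps four characters on each side of the mask.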


Key features

  • 100 built-in rules covering 50+ services across cloud providers, AI services, git platforms, CI/CD, communication tools, databases, payment processors, and more
  • Dual-engine architecture with automatic engine selection
  • Entropy filtering to reduce false positives from low-randomness matches
  • Automatic redaction of secrets in all output formats
  • Multiple output formats -- text (colored), JSON (machine-readable), Markdown (reports)
  • Custom rules -- extend with your own YAML rule files
  • Binary file detection -- automatically skips binary files, archives, and media
  • Embedded rules -- rules compiled into the binary, no external files needed at runtime

CLI reference

Usage

poltergeist [options] <path> [pattern1] [pattern2] ...

Scans the file or directory at <path>. Optionally provide one or more regex patterns to match in addition to (or instead of) built-in rules.

Flags

Flag          Default  Description
-engine       auto     Pattern matching engine: auto, go, or hyperscan
-rules        --       Path to YAML rule file or directory of rule files
-format       text     Output format: text, json, or md
-output       --       Write output to file (auto-detects format from .json or .md extension)
-dnr          false    Do not redact: show full secret values
-low-entropy  false    Show matches below entropy threshold
-no-color     false    Disable colored text output
-version      --       Show version information
-help         --       Show usage information

Examples

# Scan a directory with default rules
poltergeist /path/to/code

# JSON output for CI/CD integration
poltergeist -format json /path/to/code

# Markdown report to file
poltergeist -output report.md /path/to/code

# Custom rules
poltergeist -rules ./my-rules.yaml /path/to/code

# Force Hyperscan engine
poltergeist -engine hyperscan /path/to/code

# Show low-entropy matches for investigation
poltergeist -low-entropy /path/to/code

# Combine custom rules with inline patterns
poltergeist -rules ./rules /path/to/code "api[_-]?key\s*[:=]\s*['\"]([^'\"]+)"
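
To see what the inline pattern in the last example captures, the sketch below runs the same expression through Python's `re` module once the shell quoting is removed. Python's regex engine is only a stand-in here, so syntax details may differ from Poltergeist's engines; `find_values` and the sample text are hypothetical.

```python
import re

# The inline pattern after the shell strips the outer double quotes and
# the backslashes that escape the inner double quotes.
PATTERN = re.compile(r"""api[_-]?key\s*[:=]\s*['"]([^'"]+)""")


def find_values(text: str) -> list[str]:
    """Return the captured group (the quoted value) for each match."""
    return [m.group(1) for m in PATTERN.finditer(text)]


sample = '''
api_key = "abc123"
api-key: 'def456'
apikey="ghi789"
API_KEY = "upper"
token = "xyz"
'''

# The first three lines match; the last two do not, because the pattern
# is case-sensitive and requires the "api...key" prefix.
print(find_values(sample))
```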

Output formats

Text (default) -- colored, human-readable output grouped by file:

SCAN SUMMARY
Files scanned: 1,247
Total content: 48 MB
Secrets found: 3

src/config/api.ts
  Line 15: OpenAI API Key
    sk-proj-0JdlOY****hDvSYA
    ID: ghost.openai.1
    Entropy: 5.2 | Threshold: 5.1 | Met: Yes

Duration: 0.8s

JSON -- structured output for programmatic consumption:

{
  "summary": {
    "files_scanned": 1247,
    "files_skipped": 23,
    "total_bytes": 50331648,
    "matches_found": 3,
    "high_entropy_matches": 3,
    "low_entropy_matches": 0
  },
  "results": [
    {
      "file_path": "src/config/api.ts",
      "line_number": 15,
      "redacted": "sk-proj-0JdlOY****hDvSYA",
      "rule_name": "OpenAI API Key",
      "rule_id": "ghost.openai.1",
      "entropy": 5.2,
      "rule_entropy_threshold": 5.1,
      "rule_entropy_threshold_met": true
    }
  ]
}
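
A consumer of this format might look like the following Python sketch. It relies only on the field names shown in the sample above; the `summarize` helper is hypothetical, and the embedded report is a trimmed copy of that sample rather than real scan output.

```python
import json


def summarize(report_text: str) -> tuple[int, list[str]]:
    """Return (matches_found, one line per result that met its threshold)."""
    report = json.loads(report_text)
    lines = [
        f'{r["file_path"]}:{r["line_number"]} {r["rule_id"]} {r["redacted"]}'
        for r in report.get("results", [])
        if r.get("rule_entropy_threshold_met")
    ]
    return report["summary"]["matches_found"], lines


# Trimmed copy of the sample report shown above.
SAMPLE = """
{
  "summary": {"files_scanned": 1247, "matches_found": 3},
  "results": [
    {
      "file_path": "src/config/api.ts",
      "line_number": 15,
      "redacted": "sk-proj-0JdlOY****hDvSYA",
      "rule_name": "OpenAI API Key",
      "rule_id": "ghost.openai.1",
      "entropy": 5.2,
      "rule_entropy_threshold": 5.1,
      "rule_entropy_threshold_met": true
    }
  ]
}
"""

count, findings = summarize(SAMPLE)
for line in findings:
    print(line)
```

In a pipeline, the same function could read a report written with `-output report.json` and fail the build when the returned count is greater than zero.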

Markdown -- report format with tables and findings sections, suitable for documentation or issue tracking.

Exit codes

  • 0 -- scan completed, no secrets found
  • 1 -- scan completed, secrets were found (or output is JSON/Markdown)

Rule format

Rules are defined in YAML. You can create custom rules to detect organization-specific secrets or patterns:

rules:
  - name: Internal API Key
    id: custom.internal.1
    description: Internal service API key format.
    tags: [api, internal]
    pattern: |
      (?x)
      \b(int-[a-zA-Z0-9]{32})\b
    entropy: 4.0
    redact: [8, 4]
    tests:
      assert:
        - "int-aB3cD4eF5gH6iJ7kL8mN9oP0qR1sT2uV"
      assert_not:
        - "int-test"
    history:
      - 2024-01-15 Initial rule

Rule fields:

Field        Required  Description
name         Yes       Human-readable rule name
id           Yes       Unique identifier (e.g., ghost.openai.1)
description  Yes       User-facing explanation
tags         Yes       Categories for filtering and organization
pattern      Yes       Regex pattern (supports extended (?x) syntax with comments)
entropy      Yes       Minimum Shannon entropy threshold
redact       Yes       [prefix_chars, suffix_chars] for output redaction
tests        Yes       assert (should match) and assert_not (should not match) test cases
history      Yes       Changelog entries (at least one required)
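
Before shipping a custom rule, it can help to check its pattern against its own `assert` and `assert_not` cases. The Python sketch below does this for a rule shaped like the example, written as a dict so no YAML parser is needed. `check_rule` is a hypothetical helper, the sample value is made up, and Python's `re` module stands in for Poltergeist's engines, so this is not how the tool itself validates rules.

```python
import re

# A rule shaped like the example above; the assert value is a made-up
# string with exactly 32 characters after "int-".
rule = {
    "id": "custom.internal.1",
    "pattern": "(?x)\n\\b(int-[a-zA-Z0-9]{32})\\b\n",
    "tests": {
        "assert": ["int-aB3cD4eF5gH6iJ7kL8mN9oP0qR1sT2uV"],
        "assert_not": ["int-test"],
    },
}


def check_rule(rule: dict) -> list[str]:
    """Return failure messages; an empty list means every test case passed."""
    regex = re.compile(rule["pattern"])
    failures = []
    for sample in rule["tests"].get("assert", []):
        if not regex.search(sample):
            failures.append(f"{rule['id']}: expected a match for {sample!r}")
    for sample in rule["tests"].get("assert_not", []):
        if regex.search(sample):
            failures.append(f"{rule['id']}: unexpected match for {sample!r}")
    return failures


print(check_rule(rule))
```

Because the pattern requires exactly 32 characters after `int-`, an `assert` sample that is one character short is reported as a failure, which makes off-by-one mistakes in test data easy to spot.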

Skill integration

When used through the scan-secrets skill, Poltergeist's JSON output feeds into AI context assessment. The skill parses matches into candidates, assesses each one (real vs. placeholder, hardcoded vs. environment variable, production vs. test), and writes confirmed findings with severity assessments and remediation guidance. See Secret scanning for details.
