Python · Detection Engineering

SigmaForge — Sigma Rule Writer, Validator, and Multi-Backend Converter

A CLI tool that wraps the pySigma ecosystem to validate, inspect, and convert Sigma detection rules to SIEM query languages during the authoring loop


SigmaForge

Sigma rule writer, validator, and multi-backend converter — for detection engineers who want to actually understand what their rules do.


The Goal

I wanted a faster inner loop when authoring Sigma rules. The official sigma-cli is solid, but it's heavier than what I need when I'm iterating on a detection and just want a quick "does this parse, and what does it look like in Splunk?" answer.

SigmaForge does three things and only three things:

  1. Validate — parse a rule against the official Sigma spec and surface syntax or structural errors before they hit a pipeline or a SIEM
  2. Convert — translate a rule to a target query language (Splunk SPL and Elastic Lucene in v0.1)
  3. Inspect — print a human-readable summary of what a rule actually does without hand-reading the YAML

The exit codes (0 valid, 1 invalid, 2 usage error) are intentional — the tool is meant to sit in a pre-commit hook or a CI step that validates every rule file in a detection repo.
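As a rough illustration of that exit-code contract, here is a minimal sketch of a batch check that a CI step could run over a set of rule files. The check_rules name, the injected is_valid callable, and the constant names are assumptions made for this example, not SigmaForge's actual code.

```python
from pathlib import Path
from typing import Callable, Iterable

# Exit codes as described above: 0 valid, 1 invalid, 2 usage error.
EXIT_OK, EXIT_INVALID, EXIT_USAGE = 0, 1, 2


def check_rules(paths: Iterable[str], is_valid: Callable[[Path], bool]) -> int:
    """Return the exit code for a batch of rule files.

    is_valid is a hypothetical stand-in for the real validator, passed in
    so this sketch stays independent of pySigma.
    """
    files = [Path(p) for p in paths]
    # No input, or a path that is not a file, is a usage problem rather
    # than an invalid rule.
    if not files or any(not f.is_file() for f in files):
        return EXIT_USAGE
    failures = [f for f in files if not is_valid(f)]
    for f in failures:
        print(f"INVALID  {f}")
    return EXIT_INVALID if failures else EXIT_OK
```

A wrapper script would pass the result to sys.exit(), so a pre-commit hook or CI job fails as soon as one rule in the batch is invalid.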


How It Works

The core of the tool is a thin layer on top of pySigma. The two main modules are validator.py and converter.py.

Validation

validate_rule() loads a rule file through SigmaCollection.from_yaml(), which is pySigma's own parser. If that raises a SigmaError or a YAML parse error, the rule is invalid. If it passes, a second pass with yaml.safe_load extracts the metadata — title, ID, status, logsource, detection summary — and packages it into a ValidationResult dataclass for the CLI to render.

The logsource and detection summaries are assembled from the raw YAML keys rather than pySigma's object model. This keeps the inspection output stable even if pySigma changes its internal representations.
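The metadata pass could look roughly like the sketch below. It takes the plain dict that yaml.safe_load would return for a single rule, so the YAML loading and the pySigma check are left out. The field names on ValidationResult and the summary formats are assumptions for illustration; the real dataclass may differ.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ValidationResult:
    # Field names are illustrative assumptions, not the tool's real schema.
    valid: bool
    title: str = ""
    rule_id: str = ""
    status: str = ""
    logsource: str = ""
    detection: str = ""
    errors: list[str] = field(default_factory=list)


def summarize(raw: dict[str, Any]) -> ValidationResult:
    """Build a summary from raw YAML keys instead of pySigma objects."""
    logsource = raw.get("logsource") or {}
    detection = raw.get("detection") or {}
    # Every key under detection except "condition" names a selection.
    selections = [name for name in detection if name != "condition"]
    return ValidationResult(
        valid=True,
        title=str(raw.get("title", "")),
        rule_id=str(raw.get("id", "")),
        status=str(raw.get("status", "")),
        logsource=", ".join(f"{k}={v}" for k, v in logsource.items()),
        detection=(
            f"{', '.join(selections)} | "
            f"condition: {detection.get('condition', '')}"
        ),
    )
```

Because the summary only reads top-level keys, it keeps working as long as the rule file itself follows the Sigma layout, whatever pySigma does internally.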

Conversion

convert_rule() is simpler: load the file into a SigmaCollection, pick a backend (SplunkBackend or LuceneBackend), call backend.convert(), and return the list of query strings. Sigma supports multi-document YAML, so a single file can contain multiple rules. The converter handles that transparently and returns the queries for every rule in the file, normally one per rule.
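A minimal sketch of that flow follows, assuming pySigma and the Splunk and Elasticsearch backend packages are installed. The target names ("splunk", "lucene"), the BACKENDS registry, load_backend, and the lazy imports are choices made for this example, and the convert_rule signature shown here is a guess rather than the real one.

```python
from importlib import import_module
from pathlib import Path

# Target name -> (module path, backend class). The target names are assumed.
BACKENDS = {
    "splunk": ("sigma.backends.splunk", "SplunkBackend"),
    "lucene": ("sigma.backends.elasticsearch", "LuceneBackend"),
}


def load_backend(target: str):
    """Return an instantiated pySigma backend for a known target name."""
    if target not in BACKENDS:
        raise ValueError(
            f"unknown target {target!r}; choose from {sorted(BACKENDS)}"
        )
    module_name, class_name = BACKENDS[target]
    # Imported lazily so a bad target fails before any backend is loaded.
    backend_cls = getattr(import_module(module_name), class_name)
    return backend_cls()


def convert_rule(path: str, target: str = "splunk") -> list[str]:
    """Convert every rule in a Sigma YAML file to the target language."""
    backend = load_backend(target)
    from sigma.collection import SigmaCollection  # requires pySigma

    text = Path(path).read_text(encoding="utf-8")
    # from_yaml accepts multi-document YAML, so one file may hold many rules.
    collection = SigmaCollection.from_yaml(text)
    return list(backend.convert(collection))
```

Calling convert_rule("ssh_failed_password.yml", "lucene") would then return the Lucene queries for that file. Adding a new target in this layout is one more entry in the registry.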

CLI

The CLI is built with Click. All three subcommands (validate, convert, inspect) take a rule file path, and convert also accepts an optional --target. Output is designed to be readable at a glance in a terminal and easy for a reviewer to scan in a CI log.
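The command shape can be sketched with Click as below. This is not the real cli.py: looks_valid is a hypothetical placeholder for the validator, the target names are assumed, and inspect is omitted for brevity. One detail worth showing is that Click already exits with code 2 on usage errors such as a missing file or an unknown --target value, so only the "invalid rule" case needs an explicit exit.

```python
import sys

import click

TARGETS = ["splunk", "lucene"]  # assumed target names


def looks_valid(text: str) -> bool:
    """Hypothetical placeholder; the real tool validates with pySigma."""
    return "detection:" in text


@click.group()
def cli() -> None:
    """Sketch of a validate/convert command group."""


@cli.command()
@click.argument("rule_file", type=click.Path(exists=True, dir_okay=False))
def validate(rule_file: str) -> None:
    """Exit 0 if the rule passes the check, 1 if it does not."""
    with open(rule_file, encoding="utf-8") as fh:
        ok = looks_valid(fh.read())
    click.echo(f"{'PASS' if ok else 'FAIL'}  {rule_file}")
    if not ok:
        sys.exit(1)


@cli.command()
@click.argument("rule_file", type=click.Path(exists=True, dir_okay=False))
@click.option("--target", type=click.Choice(TARGETS), default="splunk",
              show_default=True, help="Query language to emit.")
def convert(rule_file: str, target: str) -> None:
    """Placeholder body; the real command would print converted queries."""
    click.echo(f"would convert {rule_file} to {target}")


if __name__ == "__main__":
    cli()
```

Keeping the command bodies this thin is what lets the library functions be tested directly, with click.testing.CliRunner reserved for checking exit codes and argument handling.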


Capabilities

  • Validate a single Sigma rule and get a pass/fail with structured error output
  • See a rule's logsource, detection conditions, status, and ID without reading YAML
  • Convert to Splunk SPL or Elastic Lucene query language
  • CI-friendly exit codes for integration into pre-commit hooks or GitHub Actions
  • Multi-document YAML support (one file, multiple rules)

Tech & Tools

  • Python 3.10+
  • pySigma: official Sigma parsing and backend framework, with the pySigma-backend-splunk and pySigma-backend-elasticsearch packages
  • Click: CLI argument parsing and subcommand dispatch
  • PyYAML: secondary parse pass for metadata extraction
  • pytest: unit tests covering validation, conversion, and CLI behavior

What I Learned

On the Sigma ecosystem:

  • SigmaCollection.from_yaml() is the authoritative validator — it catches structural errors that a plain YAML parser misses entirely, like a missing condition key or an invalid logsource field
  • The pySigma backend model separates parsing from output: the same collection object goes into any backend, which makes adding new targets straightforward
  • Multi-document Sigma YAML is real and shows up in some rule repos; treating single-rule files as the only case would silently break on them

On CLI design for security tooling:

  • Exit codes matter more than output format when the primary consumer is a CI pipeline — getting 0/1/2 right from the start makes integration painless
  • Separation between the library layer (validator.py, converter.py) and the CLI layer (cli.py) made it easy to write unit tests without spawning subprocesses or mocking Click internals

On building a detection portfolio:

  • This tool sits at the end of a chain: LogHound identifies anomaly patterns in raw logs, and SigmaForge is where those patterns get expressed as portable, reviewable, version-controlled detection logic — the format that survives a SIEM migration
  • Writing the example rules (ssh_failed_password.yml, suspicious_sudo.yml, web_scanner_useragent.yml) to mirror real detections from my other projects made the tool feel immediately useful rather than theoretical