Python · Detection Engineering

LogHound — CLI Log Anomaly Detection

A Python CLI that parses auth and web server logs to surface brute force attacks, credential stuffing, privilege escalation, and scanner behavior before they become incidents.

python log-analysis detection-engineering ssh brute-force

Built this because I kept wishing I had a fast way to drop a log file in and immediately know if something bad happened — without spinning up a full SIEM.

The Goal

I wanted a tool that would:

Parse raw auth.log and nginx/apache access logs without any setup overhead
Apply real detection logic — not just grep — to surface patterns that matter
Map findings to MITRE ATT&CK so the output is useful beyond just the terminal
Stay CLI-native: pipe-friendly, scriptable, JSON-exportable

The target use case was quick triage on a box I'd just gotten access to, or validating that a sample log file actually contained what I thought it did.

How It Works

LogHound takes a log file and a log type (auth or web), runs it through the appropriate parser, then passes the normalized events through detection logic that looks for specific patterns.

The architecture is deliberately simple. There's no database, no daemon, no config file. You point it at a file and it tells you what it found. The --since flag lets you scope the window (e.g. --since 24h) so you're not wading through months of noise when you only care about the last day.
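As a rough illustration of the CLI surface described above, here is a minimal argparse sketch. It is not LogHound's actual code: the `--type` flag name, the `parse_since` helper, and the `m`/`d` units are assumptions (only `--since 24h` and `--no-color` appear in this write-up).

```python
import argparse
import re
from datetime import timedelta

# Assumed unit suffixes; only the "24h" form is confirmed by the write-up.
_UNITS = {"m": "minutes", "h": "hours", "d": "days"}


def parse_since(value: str) -> timedelta:
    """Turn a string like '24h' into a timedelta (hypothetical helper)."""
    match = re.fullmatch(r"(\d+)([mhd])", value.strip().lower())
    if not match:
        raise argparse.ArgumentTypeError(
            f"invalid duration {value!r} (expected e.g. 30m, 24h, 7d)"
        )
    amount, unit = match.groups()
    return timedelta(**{_UNITS[unit]: int(amount)})


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="loghound")
    parser.add_argument("logfile", help="path to the log file to analyze")
    # Flag name is an assumption; the tool takes a log type of auth or web.
    parser.add_argument("--type", choices=["auth", "web"], required=True)
    parser.add_argument("--since", type=parse_since, default=None)
    parser.add_argument("--no-color", action="store_true")
    return parser


args = build_parser().parse_args(["auth.log", "--type", "auth", "--since", "24h"])
print(args.since)  # 1 day, 0:00:00
```

Using `type=parse_since` lets argparse reject a malformed window with a normal usage error instead of failing later inside a detector.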

For auth logs, the parser extracts structured events (failed logins, successful logins, sudo invocations, user creation) and the detectors correlate across them. Brute force detection counts failed attempts per source IP over the full window, then cross-references them against successful logins from the same IP that come after those failures. For example, a login after 8 failures from the same address (above the default threshold of 5) comes out CRITICAL with type brute_force_success, not just a generic alert.
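To make the "structured events" idea concrete, here is a minimal sketch of regex-based parsing for two common sshd line shapes, followed by a per-IP failure count. The `AuthEvent` type, the `parse_auth_line` function, the event kind names, and the explicit `year` argument (syslog timestamps carry no year) are all assumptions for illustration, not LogHound's real internals.

```python
import re
from collections import Counter
from datetime import datetime
from typing import NamedTuple, Optional


class AuthEvent(NamedTuple):
    timestamp: datetime
    kind: str  # "failed_login" or "successful_login" (hypothetical names)
    user: str
    ip: str


# Matches typical OpenSSH "Failed password" / "Accepted password" lines.
_SSH_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) \S+ sshd\[\d+\]: "
    r"(?P<result>Failed|Accepted) \S+ for (?:invalid user )?(?P<user>\S+) "
    r"from (?P<ip>\S+) port \d+"
)


def parse_auth_line(line: str, year: int) -> Optional[AuthEvent]:
    """Return a normalized event, or None for lines this sketch ignores."""
    m = _SSH_RE.match(line)
    if not m:
        return None
    ts = datetime.strptime(f"{year} {m['ts']}", "%Y %b %d %H:%M:%S")
    kind = "failed_login" if m["result"] == "Failed" else "successful_login"
    return AuthEvent(ts, kind, m["user"], m["ip"])


lines = [
    "Jan 10 03:14:07 web01 sshd[1234]: Failed password for invalid user admin from 203.0.113.5 port 51234 ssh2",
    "Jan 10 03:14:09 web01 sshd[1234]: Failed password for root from 203.0.113.5 port 51236 ssh2",
    "Jan 10 03:15:00 web01 sshd[1240]: Accepted password for root from 203.0.113.5 port 51240 ssh2",
    "Jan 10 03:15:01 web01 CRON[1300]: pam_unix(cron:session): session opened for user root",
]
events = [e for e in (parse_auth_line(l, 2024) for l in lines) if e]
failures = Counter(e.ip for e in events if e.kind == "failed_login")
print(failures)  # Counter({'203.0.113.5': 2})
```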

For web logs, the parser handles Combined Log Format (nginx and Apache both use it), extracting IP, method, path, status code, and User-Agent. Detection runs regex against UA strings for known scanner tools and checks paths against a list of sensitive targets like .env, .git/config, and wp-config.php.
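The web path can be sketched the same way. This is an illustrative take on parsing one Combined Log Format line and applying the two checks described above; the regexes, the function names, and the finding labels (`scanner_user_agent`, `sensitive_file_probe`) are assumptions, and the scanner and path lists are only the examples named in this post.

```python
import re
from typing import Dict, List, Optional

# Combined Log Format: ip ident user [time] "request" status size "referer" "ua"
CLF_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<ua>[^"]*)"'
)
SCANNER_UA_RE = re.compile(r"nikto|sqlmap|nuclei|dirbuster", re.IGNORECASE)
SENSITIVE_PATHS = (".env", ".git/config", "wp-config.php")


def parse_web_line(line: str) -> Optional[Dict[str, object]]:
    """Return the parsed fields as a dict, or None if the line does not match."""
    m = CLF_RE.match(line)
    if not m:
        return None
    event: Dict[str, object] = m.groupdict()
    event["status"] = int(m["status"])
    return event


def detect(event: Dict[str, object]) -> List[str]:
    """Return the (hypothetical) finding labels that apply to one request."""
    findings = []
    if SCANNER_UA_RE.search(str(event["ua"])):
        findings.append("scanner_user_agent")
    path = str(event["path"]).split("?", 1)[0]  # ignore the query string
    if any(target in path for target in SENSITIVE_PATHS):
        findings.append("sensitive_file_probe")
    return findings


line = (
    '203.0.113.9 - - [10/Jan/2024:03:14:07 +0000] "GET /.env HTTP/1.1" '
    '404 162 "-" "Mozilla/5.00 (Nikto/2.1.6)"'
)
event = parse_web_line(line)
print(detect(event))  # ['scanner_user_agent', 'sensitive_file_probe']
```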

Output is colored by severity in the terminal (CRITICAL in bright red, HIGH in red, MEDIUM in yellow) and structured cleanly enough to pipe into other tools. The --no-color flag makes it grep-friendly.
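One way severity coloring with a plain-text fallback could look is sketched below. The `format_finding` helper and the output layout are assumptions; the colorama names used (`Fore.RED`, `Fore.YELLOW`, `Style.BRIGHT`, `Style.RESET_ALL`) are part of colorama's public API.

```python
try:
    from colorama import Fore, Style

    # On older Windows consoles, colorama.init() may also be needed
    # before ANSI codes render; it is omitted here to keep the sketch small.
    COLORS = {
        "CRITICAL": Fore.RED + Style.BRIGHT,
        "HIGH": Fore.RED,
        "MEDIUM": Fore.YELLOW,
    }
    RESET = Style.RESET_ALL
except ImportError:
    # colorama is absent: degrade to uncolored output.
    COLORS = {}
    RESET = ""


def format_finding(severity: str, message: str, use_color: bool = True) -> str:
    """Render one finding line; use_color=False mirrors a --no-color run."""
    prefix = COLORS.get(severity, "") if use_color else ""
    suffix = RESET if prefix else ""
    return f"{prefix}[{severity}] {message}{suffix}"


print(format_finding("CRITICAL", "brute_force_success from 203.0.113.5", use_color=False))
# [CRITICAL] brute_force_success from 203.0.113.5
```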

What It Detects

Auth log (/var/log/auth.log, /var/log/secure):

SSH brute force — configurable threshold, defaults to 5 failed attempts — maps to T1110.001
Successful login following multiple failures from the same IP — credential stuffing, T1110.004
High-risk sudo commands (e.g. sudo /bin/bash) — T1548.003
su to root — T1548
New user account creation — persistence indicator, T1136.001
Off-hours logins (outside 08:00–18:00) — T1078 anomaly
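The off-hours rule in the list above is simple enough to sketch directly. The helper name and the boundary handling (08:00 counts as in hours, 18:00 counts as off hours) are assumptions; the write-up only gives the 08:00–18:00 range.

```python
from datetime import datetime, time

WORK_START = time(8, 0)
WORK_END = time(18, 0)


def is_off_hours(ts: datetime) -> bool:
    """True when a login falls outside the 08:00-18:00 window (end exclusive)."""
    return not (WORK_START <= ts.time() < WORK_END)


print(is_off_hours(datetime(2024, 1, 10, 3, 14)))  # True
print(is_off_hours(datetime(2024, 1, 10, 9, 0)))   # False
```

A check like this uses whatever timezone the log timestamps are in, so a box logging in UTC would need the hours adjusted to local working time.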

Web log (nginx/Apache Combined Log Format):

Known scanner User-Agents: Nikto, sqlmap, nuclei, dirbuster, and others — T1595.002
Sensitive file probing: .env, .git, wp-config.php, etc. — T1083
404 spike patterns indicating directory enumeration — T1595
High request volume from a single IP in the analysis window
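As a sketch of the 404 spike check from the list above: count 404 responses per source IP and flag anything at or over a threshold. The function name, the event shape (dicts with `ip` and `status`), and the threshold of 20 are assumptions, not LogHound's actual defaults.

```python
from collections import Counter
from typing import Dict, Iterable, Mapping


def find_404_spikes(events: Iterable[Mapping], threshold: int = 20) -> Dict[str, int]:
    """Return {ip: count} for IPs with at least `threshold` 404 responses."""
    counts = Counter(e["ip"] for e in events if e["status"] == 404)
    return {ip: n for ip, n in counts.items() if n >= threshold}


# Synthetic events: one noisy IP probing paths, one normal visitor.
events = [{"ip": "203.0.113.9", "status": 404} for _ in range(25)]
events += [{"ip": "198.51.100.7", "status": 200}, {"ip": "198.51.100.7", "status": 404}]
print(find_404_spikes(events))  # {'203.0.113.9': 25}
```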

The MITRE tagging was intentional. The goal was output you could fold into a SIEM pivot or an IR report without having to look up technique IDs manually.
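A minimal sketch of what a MITRE-tagged, JSON-exportable finding might look like. The technique IDs are the ones listed above and `brute_force_success` is the type name used earlier in this post; the other type names, the field names, and the `make_finding` helper are hypothetical.

```python
import json
from typing import Dict, Optional

# Finding type -> ATT&CK technique ID, taken from the detection lists above.
MITRE_TECHNIQUES = {
    "brute_force": "T1110.001",
    "brute_force_success": "T1110.004",
    "high_risk_sudo": "T1548.003",
    "su_to_root": "T1548",
    "new_user": "T1136.001",
    "off_hours_login": "T1078",
    "scanner_user_agent": "T1595.002",
    "sensitive_file_probe": "T1083",
    "dir_enumeration": "T1595",
}


def make_finding(kind: str, severity: str, source_ip: str, detail: str) -> Dict[str, Optional[str]]:
    """Build one finding record with its technique ID attached."""
    return {
        "type": kind,
        "severity": severity,
        "source_ip": source_ip,
        "mitre_technique": MITRE_TECHNIQUES.get(kind),  # None if unmapped
        "detail": detail,
    }


finding = make_finding(
    "brute_force_success", "CRITICAL", "203.0.113.5", "login after 8 failed attempts"
)
output = json.dumps([finding], indent=2)
print(output)
```

Keeping the mapping in one dict means the terminal and JSON renderers pull technique IDs from the same place.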

Tech & Tools

Python 3.8+
argparse for CLI
colorama for terminal color (gracefully degrades if absent)
Regex-based parsing throughout — no external log parsing libraries
JSON output via stdlib json
Sample logs in samples/ for testing without a real server

What I Learned

The most interesting part was designing the detection logic for the brute_force_success case. It's not enough to count failures and successes independently: you have to correlate them per source IP, and the order matters. A successful login before the failures doesn't mean the same thing as one after. Getting that sequencing right in a single-pass parser meant being careful about how I accumulated state across events.

The --since time-windowing added another layer. The cutoff has to apply to the correlation window, not just to individual events. Otherwise you'd miss a success that fell just inside the window when the failures leading up to it fell just outside it.

It ended up being a good reminder that detection logic is mostly about precisely defining what you're looking for before writing a line of code.
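Here is one way the ordering and windowing concerns could be handled in a single pass. It is a sketch of one reading, not the actual implementation: failures are accumulated per IP regardless of the cutoff, a success only counts if enough failures precede it within a lookback period, and the --since cutoff is applied to the success timestamp. The function name, the tuple event shape, and the 10-minute `lookback` are assumptions.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from typing import Dict, Iterable, List, Optional, Tuple

Event = Tuple[datetime, str, str]  # (timestamp, kind, source_ip)


def detect_brute_force_success(
    events: Iterable[Event],
    threshold: int = 5,
    cutoff: Optional[datetime] = None,
    lookback: timedelta = timedelta(minutes=10),
) -> List[Dict[str, object]]:
    """Single pass over chronologically ordered events."""
    failures: Dict[str, List[datetime]] = defaultdict(list)
    findings = []
    for ts, kind, ip in events:
        if kind == "failed_login":
            # Keep failures even if they predate the cutoff, so a success
            # just inside the window can still be tied back to them.
            failures[ip].append(ts)
        elif kind == "successful_login":
            prior = [t for t in failures[ip] if ts - t <= lookback]
            in_window = cutoff is None or ts >= cutoff
            if len(prior) >= threshold and in_window:
                findings.append({
                    "type": "brute_force_success",
                    "severity": "CRITICAL",
                    "source_ip": ip,
                    "failed_attempts": len(prior),
                })
            # A success resets the streak; later failures start a new count.
            failures[ip].clear()
    return findings


cutoff = datetime(2024, 1, 10, 12, 0)
ip = "203.0.113.5"
# Five failures just before the cutoff, then a success just after it.
events = [(cutoff - timedelta(minutes=m), "failed_login", ip) for m in range(5, 0, -1)]
events.append((cutoff + timedelta(minutes=1), "successful_login", ip))
print(len(detect_brute_force_success(events, cutoff=cutoff)))  # 1
```

Filtering the events by timestamp before correlating would drop all five failures here and the finding with them, which is the failure mode described above.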