Lab 03: Brute Force and Credential Stuffing Detection

Introduction

Authentication-based attacks are among the most common techniques used by adversaries to gain initial access to systems. Rather than exploiting software vulnerabilities, these attacks target the authentication layer directly — attempting to guess or validate credentials through repeated login attempts.

Two distinct techniques fall under this category. The first is brute force: an attacker selects a single target account and systematically attempts a large number of passwords against it, relying on volume to eventually find a valid credential. The second is credential stuffing: rather than concentrating attempts on one account, the attacker uses a list of usernames and tests each with a small number of passwords. This approach is designed to stay below per-account lockout thresholds, making it harder to detect using rate-based controls tied to individual accounts.

Both techniques are classified under MITRE ATT&CK as T1110 — Brute Force, with credential stuffing specifically mapped to T1110.004. In a SOC environment, detecting these patterns requires understanding not just that authentication failures are occurring, but whether their distribution across accounts and time reveals deliberate attack behaviour.

Wazuh monitors authentication events natively on both Linux and Windows. On Linux, the sshd daemon writes authentication failures to syslog, which Wazuh ingests and evaluates against its ruleset. On Windows, failed logon attempts generate Event ID 4625 in the Security event log, which Wazuh's Windows decoder processes. In both cases, Wazuh ships with built-in rules that detect individual failures as well as aggregated patterns that indicate brute force activity.
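
To make the ingestion step concrete, the following is a minimal Python sketch of what a decoder does with an sshd failure line: pull the username and source address out of free text so that rules can match on them. The regular expression is a simplified stand-in, not Wazuh's actual decoder definition, and the sample log line (hostname, PID, addresses) is made up for illustration.

```python
import re

# Simplified pattern for sshd's "Invalid user" message.
# This is an illustrative stand-in, not Wazuh's real decoder.
INVALID_USER = re.compile(
    r"sshd\[\d+\]: Invalid user (?P<user>\S+) from (?P<srcip>\S+)"
)

def parse_invalid_user(line):
    """Return (username, source_ip) for an invalid-user event, else None."""
    match = INVALID_USER.search(line)
    if match is None:
        return None
    return match.group("user"), match.group("srcip")

# Hypothetical sample line; every value in it is invented.
sample = (
    "Mar 10 14:02:11 xubuntu sshd[2211]: "
    "Invalid user fakeuser from 192.0.2.10 port 51234"
)
print(parse_invalid_user(sample))
```

The two extracted fields are the ones that matter later in this lab: the source address is what the aggregation rules group on, and the username is what they ignore.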

The objectives of this lab were to:

Deploy a dedicated attacker VM to simulate authentication attacks with a realistic network topology
Simulate brute force attacks against both the Linux and Windows agents
Simulate credential stuffing attacks against both agents
Analyse the resulting alerts in the Wazuh dashboard
Examine the default rule logic to understand precisely what conditions trigger each detection
Identify the limitations of the default ruleset in distinguishing between the two attack patterns

Lab Environment

This lab was conducted within the SOC environment deployed in Lab 01, with one addition.

A dedicated attacker VM was created on the Wazuh server host to provide a clean separation between the attack source and the monitored endpoints. Using the Windows host — which runs a Wazuh agent — as the attack origin would conflate a monitored endpoint with the attacker, making the source IP in alerts ambiguous. A separate VM ensured that alerts in the dashboard reflected a realistic attacker-to-target network path.

Attacker VM

Virtualisation platform: Oracle VirtualBox
Operating System: Ubuntu Server 24.04.4 LTS
vCPU: 1
Memory: 2048 MB
Host: Laptop 1 (12 GB RAM, HDD) — the same host running the Wazuh server VM
Tool installed: Hydra

The attacker VM was kept minimal, with no desktop environment installed. Its sole purpose was to run Hydra against the monitored endpoints across the local network.

A hardware consideration worth noting: Laptop 1 uses an aging mechanical hard drive that caused installation instability during Lab 01. Running two VMs simultaneously on this host introduced additional disk I/O load. To mitigate this, attack volumes were kept deliberately moderate — sufficient to trigger Wazuh detections without generating excessive log pressure on the Wazuh indexer.

Part 1 — Understanding the Default Authentication Rules

Before simulating any attacks, the relevant default rules were reviewed directly on the Wazuh server to understand what detection logic was already in place and what thresholds would need to be crossed to trigger alerts.

Linux SSH Rules

The SSH ruleset is located at:

/var/ossec/ruleset/rules/0095-sshd_rules.xml

The two rules most relevant to this lab are rule 5710 and rule 5712. Rule 5710 fires at level 5 when an SSH login attempt is made using a username that does not exist on the target system. It triggers on individual events and maps to T1110.001 and T1021.004 in the MITRE framework. Rule 5712 is the aggregation rule that escalates to level 10 when failures exceed a defined threshold. Examining the rule definition revealed the following:

<rule id="5712" level="10" frequency="8" timeframe="120">
    <if_matched_sid>5710</if_matched_sid>
    <same_source_ip />
    <description>sshd: brute force trying to get access to
    the system. Authentication failed.</description>
    <mitre>
        <id>T1110</id>
    </mitre>
</rule>

The <if_matched_sid> element restricts the count to events that have already matched rule 5710, and <same_source_ip /> groups those events by source address. The rule therefore counts failures from a single source IP address, regardless of which username is being targeted. Eight such failures from the same IP within 120 seconds trigger the level 10 alert.
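
The aggregation logic described above can be modelled in a few lines of Python. This is a sketch under stated assumptions, not Wazuh's implementation: it assumes a sliding window and an alert on the eighth failure, and the function name, event tuples and addresses are hypothetical.

```python
from collections import defaultdict, deque

def first_aggregate_alert(events, frequency=8, timeframe=120):
    """Return (time, source_ip) when any source first reaches `frequency`
    failures inside a sliding `timeframe`-second window, else None.

    `events` is a list of (timestamp_seconds, source_ip, username) tuples
    sorted by timestamp. The username is deliberately ignored.
    """
    recent = defaultdict(deque)  # source_ip -> timestamps still in the window
    for ts, src, _user in events:
        window = recent[src]
        window.append(ts)
        # Drop failures that have aged out of the window.
        while ts - window[0] > timeframe:
            window.popleft()
        if len(window) >= frequency:
            return ts, src
    return None

# Hypothetical traffic: one failure per second from the same address.
one_user = [(t, "192.0.2.10", "fakeuser") for t in range(20)]
many_users = [(t, "192.0.2.10", f"user{t}") for t in range(20)]

print(first_aggregate_alert(one_user))    # (7, '192.0.2.10')
print(first_aggregate_alert(many_users))  # (7, '192.0.2.10')
```

Both traffic shapes produce the same result because the username never enters the count, which is the property that matters for the rest of this lab.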

Windows Authentication Rules

The Windows security ruleset is located at:

/var/ossec/ruleset/rules/0580-win-security_rules.xml

Rule 60122 fires at level 5 on individual Event ID 4625 entries. Rule 60204 is the aggregation rule, which references a shared frequency variable defined at the top of the ruleset:

<var name="MS_FREQ">8</var>

Rule 60204 uses this variable for its frequency threshold with a timeframe of 240 seconds — functionally identical in logic to rule 5712 on Linux, but with a timeframe twice as long.

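This asymmetry is worth noting, because the longer window is the stricter one. An attacker pacing attempts at fewer than 8 failures per 240 seconds evades both rules, since that rate also stays below 8 failures per 120 seconds. The gap lies in between: a source producing 8 failures spread over, say, 200 seconds crosses the Windows threshold but never accumulates 8 failures within any 120-second window on Linux.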
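
How the two windows treat a steadily paced attacker can be checked with a little arithmetic. The sketch below counts how many evenly spaced failures fit inside one window. It assumes a sliding window with inclusive edges, which may not match Wazuh's internal counting exactly, and the function and dictionary names are hypothetical.

```python
def failures_in_window(interval, timeframe):
    """Most failures that can fall inside one window when attempts
    arrive exactly `interval` seconds apart (window edges inclusive)."""
    return timeframe // interval + 1

def evades(interval, frequency, timeframe):
    """True if steady pacing never reaches `frequency` failures per window."""
    return failures_in_window(interval, timeframe) < frequency

# (frequency, timeframe) pairs taken from the two rule definitions.
RULES = {"linux_5712": (8, 120), "windows_60204": (8, 240)}

for interval in (10, 20, 40):
    verdicts = {
        name: evades(interval, freq, tf) for name, (freq, tf) in RULES.items()
    }
    print(interval, verdicts)
# 10 {'linux_5712': False, 'windows_60204': False}
# 20 {'linux_5712': True, 'windows_60204': False}
# 40 {'linux_5712': True, 'windows_60204': True}
```

Under these assumptions, one attempt every 20 seconds stays under the Linux threshold while still crossing the Windows one, and an attacker has to slow to roughly one attempt every 40 seconds to stay under both.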

A key observation applicable to both platforms is that the aggregation logic in both rules is source IP based, not username based. This detail becomes significant when analysing the credential stuffing results in Part 4.

Part 2 — Brute Force Simulation: Linux Agent

Setup

A wordlist was created manually on the attacker VM:

nano wordlist.txt

The list contained 20 common passwords, one per line. A deliberately short list was used to keep event volume manageable given the HDD-backed host.

Connectivity to the Xubuntu agent was confirmed before launching the attack:

nc -zv <xubuntu-agent-ip> 22

Attack

Hydra was run from the attacker VM targeting the Xubuntu agent's SSH service with a single non-existent username:

hydra -l fakeuser -P wordlist.txt ssh://<xubuntu-agent-ip> -t 4 -V

The -t 4 flag limited parallel threads to keep the attack rate controlled. The -V flag printed each attempt to the terminal as it occurred.

Dashboard Observations

Alerts appeared in Modules → MITRE ATT&CK within seconds of the attack beginning.

Linux agent — brute force alerts in the Wazuh dashboard

Three distinct rules fired during the simulation. Rule 5710 fired at level 5 on each individual attempt, reflecting the fact that fakeuser does not exist on the Xubuntu system — Wazuh recognised the non-existent username explicitly rather than treating it as a generic authentication failure. Rule 2502 also fired at level 10, capturing the repeated password failure pattern at the syslog level. Rule 5712 then escalated to level 10 after the eighth failure from the attacker IP within the 120-second window, confirming the threshold identified during the rule review in Part 1.

Part 3 — Brute Force Simulation: Windows Agent

Attack Scenario

Hydra's SMB module was tested against the Windows host as an external attack vector. Although port 445 was reachable from the attacker VM, Hydra returned invalid reply errors against the Windows 11 target — a known compatibility issue between Hydra's SMB implementation and the SMBv2/v3 protocol used by modern Windows.

This was documented as a realistic operational finding: external brute force tooling does not always behave cleanly against modern Windows authentication services. To generate the equivalent authentication failure telemetry, a PowerShell script was run locally on the Windows host. This was framed as a post-compromise scenario: an attacker who has already established a foothold on the machine and is attempting to brute force local accounts to escalate privileges or move laterally. In this context the attack genuinely originates from the compromised machine itself.

Attack

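# Load the .NET account-management assembly and bind to the local machine's account store
Add-Type -AssemblyName System.DirectoryServices.AccountManagement
$context = New-Object System.DirectoryServices.AccountManagement.PrincipalContext(
    [System.DirectoryServices.AccountManagement.ContextType]::Machine)

# 20 attempts against one non-existent account, each with a different password
for ($i = 1; $i -le 20; $i++) {
    # Returns $false for a failed validation; the value is echoed to the console
    $context.ValidateCredentials("fakeuser", "fakepassword$i")
    Write-Host "Attempt $i complete"
    Start-Sleep -Milliseconds 500
}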

Each iteration calls ValidateCredentials against the local machine's account database with a non-existent username and a different password on each pass. Each failed validation generates a 4625 event in the Windows Security log, which Wazuh ingests via its Windows decoder.

Dashboard Observations

Alerts appeared in Modules → MITRE ATT&CK under the Windows agent within seconds of the script running.

Windows agent — brute force alerts in the Wazuh dashboard

Two rules fired during the simulation. Rule 60122 triggered at level 5 on each individual 4625 event, and rule 60204 escalated to level 10 once the eight-failure threshold was crossed within the 240-second window. The pattern mirrored the Linux results closely — individual failures at level 5 followed by the aggregation rule firing at level 10.

A notable observation was that despite the different log sources, decoders, and operating systems involved, both platforms produced alerts in the same normalised format in the Wazuh dashboard. The analyst view is consistent regardless of the underlying endpoint — the same cross-platform normalisation observed in Lab 02 with FIM events.

Part 4 — Credential Stuffing Simulation

Credential stuffing differs from brute force in its distribution of attempts. Rather than concentrating failures on a single account, attempts are spread across many usernames with a small number of passwords per account — deliberately designed to stay below per-account lockout thresholds.

Linux Agent

A username list was created on the attacker VM:

nano userlist.txt

The list contained 15 common usernames including admin, root, deploy, devops, and others typical of a Linux environment. Hydra was run with the username list replacing the single username flag:

hydra -L userlist.txt -P wordlist.txt ssh://<xubuntu-agent-ip> -t 4 -V

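The key difference from the brute force command is -L (uppercase) in place of -l (lowercase), which instructs Hydra to iterate through the username list rather than target a single account. Combined with -P, Hydra tries every password in the wordlist for each username, so this run could make up to 300 attempts (15 usernames × 20 passwords), or 20 per account.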

Linux agent — credential stuffing alerts in the Wazuh dashboard

Windows Agent

The same credential stuffing pattern was replicated on the Windows agent using a PowerShell script that iterated across multiple usernames with a limited number of password attempts per account. A slower sleep interval reflected the deliberate low-and-slow pacing characteristic of real credential stuffing:

# 12 usernames, each tried with only 2 passwords (24 attempts in total)
$usernames = @("admin", "administrator", "john", "sarah", "mike",
               "david", "guest", "test", "deploy", "devops",
               "backup", "sysadmin")
$passwords = @("password123", "welcome1")

# Bind to the local machine's account store
Add-Type -AssemblyName System.DirectoryServices.AccountManagement
$context = New-Object System.DirectoryServices.AccountManagement.PrincipalContext(
    [System.DirectoryServices.AccountManagement.ContextType]::Machine)

# Outer loop over accounts, inner loop over passwords, 800 ms between attempts
foreach ($user in $usernames) {
    foreach ($pass in $passwords) {
        $context.ValidateCredentials($user, $pass)
        Write-Host "Tried $user : $pass"
        Start-Sleep -Milliseconds 800
    }
}

Windows agent — credential stuffing alerts in the Wazuh dashboard
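
Before reading the alerts, it helps to tally what each simulation actually sent. The sketch below does that arithmetic in Python. It assumes Hydra pairs every username with every password and that no guess succeeds, and it counts only the scripted sleep time on Windows, so the duration is a lower bound. The `tally` helper is hypothetical.

```python
def tally(usernames, passwords, sleep_seconds=None):
    """Summarise a run that pairs every username with every password."""
    total = usernames * passwords
    result = {"total_attempts": total, "per_account": passwords}
    if sleep_seconds is not None:
        # Lower bound: ignores the time each authentication call takes.
        result["min_duration_s"] = round(total * sleep_seconds, 1)
    return result

linux = tally(15, 20)                      # userlist.txt x wordlist.txt
windows = tally(12, 2, sleep_seconds=0.8)  # script arrays, 800 ms sleep

print(linux)    # {'total_attempts': 300, 'per_account': 20}
print(windows)  # {'total_attempts': 24, 'per_account': 2, 'min_duration_s': 19.2}
```

The Windows run never exceeds two failures per account, yet its 24 failures arrive in roughly 20 seconds, far inside the 240-second window. The Linux run sends 20 failures per account, so at the per-account level it resembles a series of short brute force attacks.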

Dashboard Observations

The alert pattern produced by the credential stuffing simulation was superficially identical to that of the brute force simulation: the same rule IDs fired at the same severity levels on both platforms, with the level 10 aggregation rules triggering on both agents. Examining the alert details showed that each level 10 alert named a single username rather than summarising the username list as a whole. Nothing in the alert indicated that the failures were spread across many accounts.

This result required careful interpretation. The aggregation rules fired not because they detected the cross-username distribution pattern that defines credential stuffing, but because the total volume of failures from one source crossed the eight-failure threshold inside the time window. The single username shown in each alert should not be read as per-username counting: as established in Part 1, both rules group by source IP. The Windows run makes this clear. The script tried only two passwords per username, so no account could have reached eight failures on its own, yet rule 60204 still fired.

This is the critical limitation of the default ruleset: the detection logic is volume-based, not pattern-based. An attacker who limited attempts to one or two passwords per username, the low per-account volume that characterises credential stuffing, and who also kept the total failure rate from their source below eight attempts per 120 seconds on Linux or 240 seconds on Windows, would generate only level 5 individual failure alerts. The aggregation rules would never fire, regardless of how many different accounts were probed. The default rules provide no mechanism to detect the cross-account pattern from a single source IP, making it impossible to distinguish a targeted brute force attack from a broad credential stuffing campaign based on rule output alone.
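
The cross-account pattern that the default rules cannot see can be modelled by counting distinct usernames per source instead of total failures. The following is a Python model of that idea, not a Wazuh rule. The `min_users` threshold of 5, the function name and the sample traffic are all hypothetical.

```python
from collections import defaultdict, deque

def spread_alert(events, min_users=5, timeframe=120):
    """Return (time, source_ip, usernames) when one source has failed
    against at least `min_users` distinct usernames inside a sliding
    `timeframe`-second window, else None.

    `events` is a list of (timestamp_seconds, source_ip, username) tuples
    sorted by timestamp.
    """
    recent = defaultdict(deque)  # source_ip -> (timestamp, username) pairs
    for ts, src, user in events:
        window = recent[src]
        window.append((ts, user))
        # Drop failures that have aged out of the window.
        while ts - window[0][0] > timeframe:
            window.popleft()
        users = {name for _, name in window}
        if len(users) >= min_users:
            return ts, src, sorted(users)
    return None

# Hypothetical traffic paced at one failure every 30 seconds, which keeps
# both runs below eight failures per 120 seconds.
brute_force = [(t * 30, "192.0.2.10", "fakeuser") for t in range(10)]
stuffing = [(t * 30, "192.0.2.10", f"user{t}") for t in range(10)]

print(spread_alert(brute_force))  # None
print(spread_alert(stuffing))
```

With this pacing a volume-only threshold of eight failures per window stays silent for both runs, while the distinct-username count separates them: the single-account run never exceeds one username, and the spread run reaches five within the first window.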

MITRE ATT&CK Mapping

Both attack techniques simulated in this lab fall under T1110 — Brute Force in the MITRE ATT&CK framework, which covers adversary attempts to gain access by guessing credentials without prior knowledge of the correct values.

The sub-techniques observed across the lab map as follows. T1110.001 (Password Guessing) covers attempts against an account using multiple guessed passwords and maps directly to the brute force simulation. It was surfaced by rule 5710 on Linux, which carries this mapping on each individual login attempt against a non-existent username. T1110.004 (Credential Stuffing) is defined by MITRE as the use of credential pairs obtained from breaches of unrelated accounts; the simulation here, which tried a few common passwords across many usernames, is arguably closer to T1110.003 (Password Spraying). Neither multi-account sub-technique appeared as a distinct classification in the Wazuh dashboard: in both simulations the aggregation alerts were mapped to T1110 at the parent technique level. This reflects the same limitation identified in the dashboard observations: the default rules detect authentication failure volume but do not classify the distribution pattern that distinguishes T1110.001 from T1110.004.

T1021.004 — Remote Services: SSH was captured by rule 5710 on the Linux agent, reflecting the use of SSH as the remote access vector. T1531 — Account Access Removal was mapped by rule 60122 on Windows for individual 4625 events, reflecting the potential impact of repeated failed logons on account availability through lockout mechanisms.

The MITRE mapping reinforces the analytical finding from Part 4. T1110.001 and T1110.004 are distinct sub-techniques with different operational profiles, but the default Wazuh ruleset does not surface that distinction. A more complete detection posture would require a custom rule capable of identifying when a single source IP produces failures distributed across a large number of distinct usernames — independent of whether the per-account volume is high enough to trigger the existing aggregation rules.

Lessons Learned

Wazuh's default authentication rules provide reliable detection of volumetric attacks but aggregate failures by source IP rather than by username distribution. Both brute force and credential stuffing produce identical alert signatures, making the two techniques indistinguishable from dashboard output alone.
The aggregation timeframe differs between platforms: 120 seconds on Linux versus 240 seconds on Windows, with the same eight-failure threshold. The shorter Linux window tolerates a higher sustained rate, so an attacker pacing attempts at a steady rate may trigger detection on Windows while evading it on Linux.
T1110.004 — Credential Stuffing did not appear as a distinct classification in the Wazuh dashboard despite the credential stuffing simulation matching that technique's definition. The default rules map authentication failures to T1110 at the parent level, with sub-technique granularity requiring custom rule logic to surface.
Hydra's SMB module is unreliable against modern Windows targets due to SMBv2/v3 compatibility issues. Local simulation via PowerShell provides equivalent telemetry for detection testing purposes and maps naturally to a post-compromise attack scenario where an attacker with an existing foothold attempts to escalate privileges through local account brute force.
Examining the raw rule definitions before running simulations provided a precise understanding of what conditions would trigger each alert. This approach — reviewing detection logic before testing it — is more methodical than inferring rule behaviour from dashboard output after the fact.
The MITRE ATT&CK framework distinguishes T1110.001 and T1110.004 as separate sub-techniques with different operational profiles. The default Wazuh ruleset does not surface this distinction, and bridging that gap requires custom detection logic that considers username distribution rather than failure volume alone.

Conclusion

This lab demonstrated the simulation and detection of brute force and credential stuffing attacks across both Linux and Windows agents using Wazuh's default authentication monitoring capabilities.

The Linux attacks were launched from a dedicated attacker VM to maintain clean separation between the attack source and monitored endpoints. Brute force was simulated using Hydra against the Linux SSH service and via a local PowerShell authentication loop on Windows, with the latter framed as a post-compromise privilege escalation scenario following compatibility issues with Hydra's SMB module against a Windows 11 target.

The credential stuffing simulation produced the same alert signatures as the brute force simulation on both platforms. Investigating the underlying rule definitions revealed why: both rule 5712 on Linux and rule 60204 on Windows aggregate failures by source IP rather than by target username, meaning the cross-account distribution pattern that defines credential stuffing is not visible to either rule. The level 10 alerts fired during credential stuffing only because the total failure volume from a single source crossed the eight-failure threshold, not because the rules detected anything structurally different about the attack pattern. T1110.004 did not appear as a distinct sub-technique classification in the dashboard, further reflecting this limitation.

The next lab in this series addresses this gap directly by writing a custom detection rule in local_rules.xml that identifies cross-username failure patterns from a single source IP — extending Wazuh's detection capability beyond the volume-based logic of the default ruleset.