About This Dataset
The Splunk sample dataset contains 800,000 rows of simulated enterprise security telemetry spanning a 90-day period (February–May 2025). It mirrors the kinds of data a Splunk SIEM exports as CSV for offline analysis, war-gaming, or reporting.
Data Sources Included
- Firewall logs (25% of events) — allow, deny, drop, reject decisions
- Authentication events (20%) — success, failure, lockout, MFA, logout
- Endpoint telemetry (15%) — process creation, file operations, registry modifications
- IDS/IPS alerts (15%) — signature-based detections with threat intel enrichment
- DNS logs (10%) — queries, NXDOMAIN, refused responses
- Web proxy logs (8%) — HTTP traffic with URL, method, and user-agent
- VPN/remote access (5%) — connect, disconnect, auth failures
- Cloud audit logs (2%) — API calls, IAM changes, resource events
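The documented source mix can be compared against an actual export. The following is a minimal sketch, assuming the CSV rows have already been loaded as dictionaries (for example with `csv.DictReader`). The label strings in `EXPECTED_SHARE` are hypothetical, because the exact values stored in the `source_type` column are not listed here.

```python
from collections import Counter

# Documented share of events per source, in percent. The keys are hypothetical
# labels; replace them with the values actually found in the source_type column.
EXPECTED_SHARE = {
    "firewall": 25, "authentication": 20, "endpoint": 15, "ids_ips": 15,
    "dns": 10, "web_proxy": 8, "vpn": 5, "cloud_audit": 2,
}

def source_mix(rows):
    """Return the observed percentage of events per source_type."""
    counts = Counter(row["source_type"] for row in rows)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {src: 100.0 * n / total for src, n in counts.items()}

def mix_deviation(rows, expected=EXPECTED_SHARE):
    """Return observed minus expected share, in percentage points, per source."""
    observed = source_mix(rows)
    return {src: observed.get(src, 0.0) - pct for src, pct in expected.items()}
```

A deviation close to zero for every source suggests the export matches the documented distribution. A large negative value points to a source that is under-represented or missing.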
Key Fields Reference
| Field | Type | Description |
|---|---|---|
| bytes_in | Integer | Inbound bytes |
| bytes_out | Integer | Outbound bytes |
| duration_sec | Float | Connection duration in seconds |
| username | String | User account associated with event |
| hostname | String | Device name generating the event |
| domain | String | Domain or URL destination |
| country | String | Source country (ISO 3166-1 alpha-2 code) |
| session_id | String | Session or connection identifier |
| signature | String | IDS/IPS detection signature (IDS events only) |
| process_name | String | Process name (endpoint events only) |
| file_path | String | File path affected (endpoint events only) |
| registry_key | String | Registry key modified (endpoint events only) |
| mitre_tactic | String | MITRE ATT&CK tactic (high/critical events) |
| mitre_technique | String | MITRE ATT&CK technique ID |
| threat_intel_match | Boolean | Known malicious IP or signature match |
| response_action | String | Action taken on high/critical events |
| rule_id | String | Detection rule that fired |
| event_id | String | Unique event identifier |
| timestamp | DateTime | Event time |
| source_type | String | Log source category |
| severity | String | Risk level of the event |
| action | String | Action recorded for the event (for example allow, deny, success, failure) |
| src_ip | IP | Source IP address |
| dst_ip | IP | Destination IP address |
| src_port | Integer | Source port number |
| dst_port | Integer | Destination port number |
| protocol | String | Network protocol |
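Because a CSV export delivers every value as text, the types in the table above have to be restored when the file is loaded. The sketch below shows one way to do that with the Python standard library. It assumes ISO 8601 timestamps, `true`/`false` text for the boolean column, and empty cells for fields that do not apply to a given source. None of these formats is confirmed by this document, so adjust them to match the real export.

```python
import ipaddress
from datetime import datetime

INT_FIELDS = ("bytes_in", "bytes_out", "src_port", "dst_port")
FLOAT_FIELDS = ("duration_sec",)
IP_FIELDS = ("src_ip", "dst_ip")

def parse_row(raw):
    """Convert one CSV row (all strings) to the types in the field reference.

    Empty cells become None, since several fields only apply to some sources.
    """
    row = {k: (v if v != "" else None) for k, v in raw.items()}
    for name in INT_FIELDS:
        if row.get(name) is not None:
            row[name] = int(row[name])
    for name in FLOAT_FIELDS:
        if row.get(name) is not None:
            row[name] = float(row[name])
    for name in IP_FIELDS:
        if row.get(name) is not None:
            row[name] = ipaddress.ip_address(row[name])
    if row.get("timestamp") is not None:
        # Assumption: ISO 8601 timestamps such as 2025-03-01T12:34:56
        row["timestamp"] = datetime.fromisoformat(row["timestamp"])
    if row.get("threat_intel_match") is not None:
        # Assumption: booleans are exported as true/false in any letter case
        row["threat_intel_match"] = row["threat_intel_match"].strip().lower() == "true"
    return row
```

Each dictionary yielded by `csv.DictReader` can be passed through `parse_row` before analysis. All remaining fields stay as strings, which matches their documented type.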
Complete Dataset Summary
The Complete Dataset Summary prompt was used on this dataset to build the SIEM Analysis workbook that includes data tables and the following charts. It establishes baseline volumes and surfaces the most important metrics at a glance.
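A comparable baseline can be approximated offline from the CSV. This is a rough sketch, not the prompt's actual output: it assumes rows loaded as dictionaries of strings and that `threat_intel_match` is exported as the text `true` or `false`.

```python
from collections import Counter

def summarize(rows):
    """Return headline counts for a list of event dictionaries."""
    return {
        "total_events": len(rows),
        "by_source_type": Counter(r["source_type"] for r in rows),
        "by_severity": Counter(r["severity"] for r in rows),
        "by_action": Counter(r["action"] for r in rows),
        # Assumption: threat_intel_match is exported as the text true/false
        "threat_intel_matches": sum(
            1 for r in rows
            if str(r.get("threat_intel_match", "")).lower() == "true"
        ),
    }
```

Calling `summarize(rows)["by_severity"].most_common()` lists severity levels from most to least frequent, which is a quick way to see the overall risk profile.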



Prompt 2 — Event Timeline (Daily Volume)
The Daily Breakdown workbook in this dataset was built using the Daily Event Timeline prompt. The prompt's output provides the data tables for the following charts, which show how events are distributed over time. Spikes in daily volume often indicate attack campaigns, scanning activity, or misconfigured alerting.
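Daily volumes and spikes can also be computed directly from the export. The sketch below counts events per day and flags days that sit well above the average. The z-score rule and its default threshold of 2.0 are illustrative choices, not something the prompt is known to use, and the code assumes ISO 8601 text in the `timestamp` column.

```python
from collections import Counter
from datetime import datetime
from statistics import mean, pstdev

def daily_volume(rows):
    """Count events per calendar day, sorted chronologically."""
    counts = Counter(datetime.fromisoformat(r["timestamp"]).date() for r in rows)
    return dict(sorted(counts.items()))

def spike_days(volume, z_threshold=2.0):
    """Return days whose count exceeds the mean by more than z_threshold
    population standard deviations."""
    values = list(volume.values())
    if len(values) < 2:
        return []
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []  # perfectly flat volume, nothing stands out
    return [day for day, n in volume.items() if (n - mu) / sigma > z_threshold]
```

A fixed z-score is a blunt instrument for data with weekly seasonality. Comparing each day against the same weekday in earlier weeks may give fewer false positives.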



Prompt 3 — Source Type Health Check
The Source Type Health Check prompt was used on this dataset to build the Log Coverage workbook, which includes data tables and the following charts. It validates that all log sources are reporting consistently. Gaps or sudden drops in a source type can indicate logging failures, agent outages, or an attacker disabling sensors.
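The same gap and drop check can be sketched in a few lines of Python. The version below flags any day on which a source reports fewer events than half of its own median daily count, and treats days with no events as zero. The median baseline and the `drop_ratio` parameter are assumptions made for illustration, as is the ISO 8601 `timestamp` format.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import median

def coverage_gaps(rows, drop_ratio=0.5):
    """Flag (source_type, day, count) where a source is silent or unusually quiet.

    A day is flagged when its count is below drop_ratio times that source's
    median daily count; days with no events at all count as zero.
    """
    counts = defaultdict(lambda: defaultdict(int))
    days = set()
    for r in rows:
        day = datetime.fromisoformat(r["timestamp"]).date()
        counts[r["source_type"]][day] += 1
        days.add(day)
    if not days:
        return []
    # Every calendar day between the first and last event, inclusive
    first, last = min(days), max(days)
    span = [first + timedelta(days=d) for d in range((last - first).days + 1)]
    flagged = []
    for source, per_day in sorted(counts.items()):
        series = [per_day.get(day, 0) for day in span]
        baseline = median(series)
        for day, n in zip(span, series):
            if n < drop_ratio * baseline:
                flagged.append((source, day, n))
    return flagged
```

Low-volume sources such as the cloud audit logs may have a median of zero or one event per day on small samples, in which case this rule flags little or nothing and a longer aggregation window is more informative.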







