The data revolution has transformed every facet of modern enterprises, unleashing productivity and insights across every department, with one notable exception. In cybersecurity, a generation of analysts finds itself years behind its peers, tethered to an approach that is powerless to tap into the biggest thing in tech since the World Wide Web. Much of the blame for this predicament falls on Splunk, the pervasive platform that inadvertently shaped security operations into a silo of search-driven rather than analysis-driven methodologies. Understanding how we got here can help us rejoin the rest of the enterprise and hopefully not miss out on the promise of data and AI.
Like Google for Your Logs
In 2006, ComputerWorld published an article titled “Splunk Inc.'s Splunk Data Center Search Party” that described the hit new product’s key to success.
Splunk's sweet spot is knowledgeable IT experts who have a good idea of what they are looking for but are having difficulty finding it in the haystack of error logs and application dumps from a myriad of different servers.
Like Google, it automatically indexes everything, but its true power is unleashed when an experienced searcher is looking for something specific.
With Splunk, a network admin troubleshooting a connectivity issue could search the index for firewall error codes associated with dropped packets. Searches would often be successful because the relevant error codes were well-known. IT professionals fell in love with a product that could quickly tell them where to find the needle in the haystack and fix the outage.
The Effect on Security Operations
While Splunk started as an observability solution, security teams also began adopting it. Then-dominant SIEM products like ArcSight had been designed to run on powerful but rigid databases like Oracle. The downside of this approach was that the solutions were cumbersome and required lots of upfront work to prepare the data. Splunk’s flexibility and the fact that it already contained relevant log data at many enterprises led security teams to pull Splunk onto their toolbench.
Splunk responded to this demand by launching Enterprise Security with rule sets and dashboards designed for SOC use cases. Cybersecurity eventually came to account for over half of Splunk’s sales, while the product itself came to define the SIEM category. But the platform remained a “Google for Logs” at heart.
As a result, security operations methodologies became shaped around Splunk’s product capabilities. Many SOC practitioners today view these as equivalent:
Search == Analytics: Splunk's design as a log search engine has led analysts to prioritize search queries over deeper data analytics. This focus on looking up data has constrained the development of analytical competencies among security professionals.
Logs == Data: The emphasis on logs as the primary source of security insights has narrowed analysts' perspectives. Detection logic often fails to see data as a rich, multifaceted resource that extends beyond event logs to user context, asset properties, trends, and baselines.
Enrichment == Contextualization: Index-based search engines deal with enrichment on load much better than they do joining datasets at query time. This has fostered a preference for adding superficial and often outdated properties (enrichment) rather than integrating and understanding security events' broader and up-to-date context (contextualization).
Leaderboards == Metrics: The search platform's orientation towards displaying data in leaderboards (“Top 10 Attacker IP Addresses”) overshadowed meaningful metrics aligned with security operations' strategic goals. In other fields, metrics are often calculated periodically and recorded in dedicated tables, a kind of batch processing for which data warehouses are well suited.
UEBA == Data Science: Dedicated User and Entity Behavior Analytics (UEBA) features introduced basic behavioral analytics to many security operations. However, this has often come at the expense of adopting more comprehensive data science methodologies that could offer deeper insights and predictive capabilities tailored to the organization.
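The difference between enrichment and contextualization can be made concrete with a minimal Python sketch. Everything in it is hypothetical: the event fields, the `user_directory` table, and the function names are invented for illustration and do not correspond to any product's API. Enrichment copies a property onto the event once, at load time; contextualization looks the property up when the question is asked.

```python
from dataclasses import dataclass

# Hypothetical identity table; in practice this might come from HR or an IdP.
user_directory = {"alice": {"risk": "normal", "department": "finance"}}


@dataclass
class Event:
    user: str
    action: str
    risk_at_load: str = "unknown"  # property stamped onto the event by enrichment


def enrich_on_load(event: Event, directory: dict) -> Event:
    """Enrichment: copy the user's risk label onto the event at ingest time."""
    event.risk_at_load = directory.get(event.user, {}).get("risk", "unknown")
    return event


def contextualize_at_query(event: Event, directory: dict) -> dict:
    """Contextualization: join the event with the directory as it looks now."""
    current = directory.get(event.user, {})
    return {
        "user": event.user,
        "action": event.action,
        "risk_at_load": event.risk_at_load,
        "risk_now": current.get("risk", "unknown"),
    }


# The event is ingested while nothing unusual is known about alice.
event = enrich_on_load(Event(user="alice", action="bulk_download"), user_directory)

# A week later HR records that alice has given notice.
# The copy stored on the event is not updated.
user_directory["alice"]["risk"] = "departing"

row = contextualize_at_query(event, user_directory)
print(row)
```

In this toy case the value stamped at load time still reads `normal`, while the query-time join surfaces that the bulk download was made by someone now flagged as departing. A data platform that joins tables cheaply at query time makes the second pattern practical.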
The root of the problem is that cybersecurity is fundamentally not a search problem. Some approaches to threat detection do involve looking for known string values, such as a specific user agent string in a crypto miner alert.
This approach, however, is brittle in that an attacker can easily modify their user agent string and bypass the rule. Rules that search for event names associated with an attack, such as bucket enumeration, are prohibitively noisy in an enterprise environment. Cybersecurity is adversarial; anything the defender can easily search for is the first thing the attacker would change. It's very different from troubleshooting a network connection issue!
The ineffectiveness of applied search methodologies was described in David Bianco’s The Pyramid of Pain in 2013. He observed that while the good guys were winning some battles against cybercriminals, “seeing how these indicators were being applied…almost no one is using them effectively.” Security analysts took hashes and IP addresses from threat reports and googled them in their logs. Avoiding these detections was a piece of cake for the bad guys.
The Pyramid of Pain teaches us that adversaries will dodge the only indicators you can search for, and that the only indicators that matter are ones you can’t search for.
The problem for our field is that defenders have been trained to go into battle with search engine technology designed to look up log events containing strings. Those strings are trivial, or at most simple, for an attacker to change in order to avoid detection, but security analysts have been conditioned to work within these constraints. As a result, many of the “next-gen” SIEM solutions, such as Google Chronicle, address some of Splunk’s scale challenges while perpetuating the SOC’s dependence on search. The rule below (from the public Chronicle GitHub account) demonstrates the brittle nature of search-based detection rules still found in the latest generation of SIEM technology.
rule ryuk_ransomware_detector_sysmon_behavior {
  meta:
    author = "Lee Archinal"
    description = "This detects characteristics of the Ryuk Ransomware strain of malware License: https://github.com/Neo23x0/sigma/blob/master/LICENSE.Detection.Rules.md."
    reference = "https://tdm.socprime.com/tdm/info/vZQdVgPbH0b7"
    version = "0.01"
    created = "2019/07/15"
    product = "windows"
    service = "sysmon"
    mitre = "impact, t1486, execution, t1204"
  events:
    ($selection1.metadata.product_event_type = "11" and (re.regex($selection1.target.file.full_path, `.*\.ryk`) or $selection1.target.file.full_path = "RyukReadMe.html"))
  condition:
    $selection1
}
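To see why a rule like this is brittle, its matching logic can be restated in a few lines of Python. This is a rough paraphrase of the rule's two conditions, not an emulation of Chronicle's rule engine. The file paths, the `burst_of_file_creates` helper, and its threshold are all made up for illustration.

```python
import re
from collections import Counter

# Approximation of the rule's condition: a file-create event whose path
# contains ".ryk", or whose path equals the ransom note name.
RYK_PATTERN = re.compile(r".*\.ryk")


def matches_ryuk_rule(event_type: str, full_path: str) -> bool:
    if event_type != "11":  # Sysmon event ID 11 is FileCreate
        return False
    return bool(RYK_PATTERN.search(full_path)) or full_path == "RyukReadMe.html"


def burst_of_file_creates(events, threshold=100):
    """Hypothetical behavioral check: processes that create many files in one window.

    `events` is an iterable of (process_name, full_path) tuples from a single
    time window. The threshold is an assumption and would need tuning.
    """
    per_process = Counter(process for process, _ in events)
    return {process for process, count in per_process.items() if count >= threshold}


original = r"C:\Users\bob\Documents\budget.xlsx.ryk"
renamed = r"C:\Users\bob\Documents\budget.xlsx.abc"  # attacker picks a new extension

print(matches_ryuk_rule("11", original))
print(matches_ryuk_rule("11", renamed))

# One process writes 500 files with the new extension; another writes one file.
window = [("evil.exe", rf"C:\data\file{i}.abc") for i in range(500)]
window.append(("winword.exe", r"C:\data\notes.docx"))
print(burst_of_file_creates(window))
```

In this paraphrase the file with the `.ryk` extension matches and the renamed one does not, while the count-based check still singles out the process writing hundreds of files regardless of what they are called. A simple count like this would be noisy on its own in production (installers and backup jobs also write many files), which is where baselines and richer context come in.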
The SOC Can Catch Up
Contrast the search rule example above with how the rest of the enterprise derives insights from data. Departments that “1337 h4x0rs” might consider less technically proficient, such as sales, finance, and marketing, are years ahead and significantly more successful at achieving their objectives. Without detouring into the questionable incentives of the security operation (quis custodiet ipsos custodes?), we can just look at how an enterprise sales organization answers its data questions, for example, how it predicts which of its customers are most likely to jump ship. Such “churn analysis” can trigger a friendly phone call, special discounts, or other attempts to avoid loss.
In this Snowflake Quickstart example, the customer has ten dimensions that can be used to evaluate churn risk. Some may be associated with activity, such as calling the customer service center, while others wouldn’t appear in the log data.
These inputs are crunched in a data science model that leverages a Random Forest classifier trained on historical data within an analytics platform. The sales team would collaborate with data scientists to develop a data app that can be used in daily sales operations, even by less data-savvy salespeople.
No one would suggest that a sales team try to “search” for churn-risk customers or that finance would “search” for budget inefficiencies. But in security operations, which face the added challenge of finding someone who is actively trying to avoid detection, search is the status quo. Splunk’s success as a SIEM bears much of the blame.
Meanwhile, a new class of cloud data platforms blends data warehouse, data lake, and search workloads to address the shortcomings of past platforms like Oracle and Hadoop. Snowflake, BigQuery, and others have added native JSON support with schema-on-read, fast search, and streaming analytics. All are delivered as a service. The limitations and overhead that drove the SOC away from Oracle and Hadoop to NoSQL alternatives like Splunk and Elasticsearch have been eliminated.
To ditch its over-dependence on search, an enterprise SOC can start by leveraging the cloud data platform that other departments at the company are already using. In this way, security analysts can access production tooling (e.g., data pipeline and data warehouse), functioning processes (e.g., role management and cost monitors), and people who are experts at the data stuff. Working with the central data science team is a great strategy for security leaders navigating their organization toward analytics best practices.
Also helpful is the growing ecosystem of security products that plug into cloud data platforms with support for the full spectrum of analytics, including search, statistical baselining, and data science. Anvilogic, for example, recently demonstrated how general-purpose data science methodologies can make a big difference in effective threat detection at scale.
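As a small illustration of what statistical baselining means in contrast to string search, the sketch below scores each principal's daily count of bucket-listing calls against that principal's own history. The data, the `zscore` helper, and the cutoff of three standard deviations are illustrative assumptions, not taken from Anvilogic or any other product.

```python
from statistics import mean, stdev


def zscore(history: list[int], today: int) -> float:
    """How many standard deviations today's count sits from this principal's baseline."""
    if len(history) < 2:
        return 0.0  # not enough history to form a baseline
    sigma = stdev(history)
    if sigma == 0:
        return 0.0 if today == history[0] else float("inf")
    return (today - mean(history)) / sigma


# Hypothetical daily counts of bucket-listing API calls per principal.
history = {
    "backup-service": [950, 1010, 990, 1005, 970],  # enumerates buckets every day
    "alice": [0, 1, 0, 2, 1],
}
today = {"backup-service": 1000, "alice": 40}

THRESHOLD = 3.0  # assumed cutoff; would need tuning on real data

flagged = [
    user for user, count in today.items()
    if zscore(history[user], count) > THRESHOLD
]
print(flagged)
```

A rule that searches for the bucket-listing event name would fire a thousand times a day on the backup service. Scored against each principal's own baseline, the service's 1,000 calls are unremarkable, while 40 calls from a user who normally makes one or two stand out.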
A narrow fixation on search has held back security operations. The historical reasons this happened are no longer binding, but many SOCs haven’t yet adjusted. To move forward, security leaders can recognize the implications of Splunk's historical influence on security operations and embrace the transformative potential of security data lakes as the path to joining the rest of the enterprise in analytics success.
hmm... I mean... yes.. Splunk CAN be used just to search for simple Hashes... but a well tuned Splunk ES with talented SOC staff can perform complex data and behaviour analytics and ML.