The holiday season in 2013 was busy for Target, and for its security operations team in Minneapolis. It was so busy that alerts from the company’s FireEye malware detection system went unheeded. Within weeks, tens of millions of shoppers would have their card data stolen. The company didn’t have a CISO at the time, but the CEO and CIO both resigned following the breach. A full decade later, many SOCs are still grappling with far more alerts than they can handle. That can change with an “alert budget” detection engineering initiative.
What It Looks Like to Have Too Many Alerts
Alert overload is so common that managers often accept a level of risk that’s far beyond what their leadership has in mind. This disconnect can be frustrating for CISOs when they receive a post-incident readout and ask how the adversary slipped by when, in retrospect, the signs were all there. Save your CISO the heartburn by flagging any of these symptoms in your environment:
'Eyeballing' Thousands of Alerts: If your alert triage strategy involves scrolling, that’s all you need to know. Consistency and reliability are bound to suffer, and analysts will feel like they’re drowning. Metrics and continuous improvement will be hard to achieve.
Only Looking at Critical Alerts: Alert severity was intended to be a prioritization tool. You would review the medium-severity alerts before looking at the lows, and a high-severity alert might justify a 3 a.m. wake-up call to the SOC manager. The vendors who write the alerts are aware of this dynamic and prefer to minimize any burden that might antagonize customers. As a result, I’ve seen serious detection rules categorized by vendors as medium or low severity when they should have been high or critical. This “severity deflation” turns into risk when you disregard the resulting alerts.
Trusting the MSSP to Figure It Out: While many organizations can benefit from working with a managed services provider, security operations teams should know that there’s no magic pixie dust on the other side. An environment that generates thousands of alerts a day is bound to be a shapeless blur for a third-party vendor juggling hundreds of customers whose environments are just as noisy.
On the other hand, a security operation with a healthy number of alerts is likely to have the following happy conditions:
Each alert has a unique ID, assignee and resolution.
False positive alerts feed into a continuous improvement process based on an understanding of why they happened and how often the detection rule is misfiring.
Some alerts are generated and investigated every day, and each week the team finds at least one interesting thing that was flagged.
Creating an Alert Budget
If your alert queue looks like a Best Buy on Black Friday, make a New Year’s resolution to create an alert budget. In my experience, the best approach is to set aggressive targets from the start. So even if your daily volume is measured in the thousands today, you can aim for a budget of 50 or even 20 alerts a day.
You may be wondering whether your target alert volume should be based on the number of analysts. I recommend sticking with an absolute number, regardless of the size of the team. One reason is Parkinson’s Law: triage work will expand to fill whatever analyst capacity exists. More importantly, setting an alert budget should push your team to fully embrace detection engineering. A larger organization should compensate for more surface area with better-engineered detections, not more bodies thrown at sorting through alerts.
The target daily volume is your alert budget. The gap between that number and how many alerts you’re actually experiencing is a juicy, fun engineering problem to solve. Some teams have found the need to declare “alert bankruptcy” in order to free up the time needed for improving their detections. Treating the work towards resuming standard operations as a project (with a committed deadline and everything) is a sign of engineering maturity.
Once you have established your alert budget, track a few related metrics to keep yourself honest. Helpful KPIs for alert budget success include:
30d Budget Utilization: Alert volume (daily average over the last 30 days) divided by the budget (fixed daily amount), expressed as a percentage
QoQ Budget Performance: Change in the budget utilization rate this quarter vs last quarter
Alert Closure Rate: Percentage of alerts resolved within a specified timeframe
Watch for utilization that’s too high (over 100% means the budget is being exceeded) or too low. Overcompensating to the point where detections are disabled across the board is just as risky as leaving alerts uninvestigated. With a quantified budget and a handful of related metrics, leadership gains assurance that the SOC is succeeding, and team members gain a sense that success is possible: they’re hitting targets instead of treading water.
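As a minimal sketch of how these KPIs might be computed, assuming each alert is exported as a record with timezone-aware created and (optional) resolved timestamps; the field names and the 50-alert budget are illustrative assumptions, not a prescribed schema:

```python
from datetime import datetime, timedelta, timezone

DAILY_BUDGET = 50  # the fixed daily alert budget (assumed target)

def budget_utilization(alerts, days=30, budget=DAILY_BUDGET):
    """Average daily alert volume over the window divided by the budget."""
    cutoff = datetime.now(timezone.utc) - timedelta(days=days)
    recent = [a for a in alerts if a["created"] >= cutoff]
    return (len(recent) / days) / budget  # > 1.0 means the budget is being exceeded

def closure_rate(alerts, sla_hours=24):
    """Share of alerts resolved within the agreed timeframe."""
    if not alerts:
        return 0.0
    closed_in_time = [
        a for a in alerts
        if a.get("resolved")
        and a["resolved"] - a["created"] <= timedelta(hours=sla_hours)
    ]
    return len(closed_in_time) / len(alerts)

def qoq_budget_performance(current_utilization, previous_utilization):
    """Change in budget utilization this quarter versus last quarter."""
    return current_utilization - previous_utilization
```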
Getting and Staying on Budget
Circling a budget number on a whiteboard is the easy part. Achieving your alert budget goal takes strategy, team-building and detection engineering effort. Start by grouping the alerts generated over recent days and weeks and creating a “top offenders” list of the detections that made the most noise. If your starting point is way over budget, many of the alerts your team is receiving are probably obvious false positives. How does the team know to ignore those? It likely has to do with the asset or identity context (e.g. this system is in a lab environment, this user is an IT admin, etc.), so pull that information into the detection. This is where a security data lake can provide the context needed to cut down alert noise.
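Here’s a rough sketch of building that top-offenders list and pulling in asset context, assuming your SIEM can export recent alerts to CSV; the file names and columns (rule_name, hostname, environment) are hypothetical placeholders for whatever your tooling provides:

```python
import pandas as pd

# Hypothetical export of the last 30 days of alerts from the SIEM.
alerts = pd.read_csv("alerts_last_30_days.csv", parse_dates=["created"])

# Top offenders: which detection rules generate the most noise?
top_offenders = alerts["rule_name"].value_counts().head(10)
print(top_offenders)

# Join in asset context (hypothetical inventory export) so obvious false
# positives, such as lab systems, can be filtered inside the detection itself.
assets = pd.read_csv("asset_inventory.csv")
enriched = alerts.merge(assets[["hostname", "environment"]], on="hostname", how="left")
lab_noise = (
    enriched[enriched["environment"] == "lab"]["rule_name"]
    .value_counts()
    .head(10)
)
print(lab_noise)
```

If a single rule accounts for a large slice of the noise and most of its hits trace back to lab systems or known admin accounts, that context belongs in the rule logic, not in the analyst’s head.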
Your alert budget can also play a role in your actual budget. If the alerts that are driving your overage are coming from sources that aren’t in the SIEM, you may need better detection tooling. If you need to shrink daily alert volume while supporting additional alerts for new surface area, you may need additional headcount. Security can be less of an art and more of a science if you’re able to present the relationship between alert budget and team size.
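As a back-of-the-envelope illustration of that relationship (the triage time and productive hours below are assumptions, not benchmarks):

```python
# Rough capacity model relating the alert budget to triage headcount.
avg_triage_minutes = 20          # assumed average time to triage one alert
productive_hours_per_day = 6     # assumed hours an analyst spends on triage

alerts_per_analyst_per_day = productive_hours_per_day * 60 / avg_triage_minutes  # 18
daily_budget = 50
analysts_needed = daily_budget / alerts_per_analyst_per_day

print(f"{analysts_needed:.1f} analysts to stay within a {daily_budget}-alert budget")
# -> 2.8 analysts to stay within a 50-alert budget
```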
There’s also an opportunity to apply machine learning, for example to recommend which alerts deserve attention based on threat models and on your progress in reducing budget utilization. Create an iterative process where the noisiest detections are refined and new detections are built responsibly, with an eye to their “cost” in alert volume. Your predictive capabilities will improve over time as the team gains experience sticking to an alert budget and gets in the habit of testing new rules against historical data before pushing detections to production.
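A minimal sketch of that habit, assuming historical events can be queried from the security data lake; sqlite3 stands in for your data lake’s driver, and the table, columns, and rule logic are hypothetical:

```python
import sqlite3  # stand-in for the security data lake's DB-API driver

# Replay a proposed detection against 90 days of history to estimate its alert "cost".
PROPOSED_RULE = """
    SELECT count(*)
    FROM auth_events
    WHERE outcome = 'failure'
      AND source_country NOT IN ('US', 'CA')
      AND event_time >= date('now', '-90 day')
"""

DAILY_BUDGET = 50
LOOKBACK_DAYS = 90

conn = sqlite3.connect("security_lake.db")
(total_hits,) = conn.execute(PROPOSED_RULE).fetchone()
projected_daily = total_hits / LOOKBACK_DAYS

print(f"Projected {projected_daily:.1f} alerts/day "
      f"({projected_daily / DAILY_BUDGET:.0%} of the daily budget)")
```

A rule projected to consume a third of the budget on its own is a candidate for tighter logic before it ever reaches production.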
Conclusion
The benefits of an alert budget go beyond just the mental health of your analysts. Consistency and reliability in triage are key to successful detection and response over time. A budget that reflects realistic bandwidth makes it clear to managers outside the security organization that there’s a limit to how quickly the SOC can scale; unchecked growth in surface area or product features that need monitoring leads to risk that the security team shouldn’t be on the hook to accept. And replacing the analyst hamster-wheel with a culture of engineering helps to attract and retain motivated practitioners. Adopting an alert budget strategy with continuous improvement is sure to bring your security operations team a dose of much-needed holiday cheer.