Defrosting Snowflake SIEM Cost Factors
An unauthorized playbook for estimating security data lake spend
How does Snowflake perform against the 8 SIEM cost factors outlined in last week’s post? For a data lake solution to be worth operationalizing, it must be dramatically more cost-effective and scalable than all-in-one SIEM alternatives. Snowflake has increasingly emphasized its cybersecurity workload, but let's break down the cost factors to get a sense of the potential savings.
Cost Factor: Data Collection
Snowflake supports loading new data using its Snowpipe service. There are several ingest modes available, each with its own mix of latency, cost, and overhead tradeoffs. Snowpipe auto ingest is the most popular option for consistent intake of machine data. It typically makes new data available within one minute, though latency and cost are affected by the shape of the loaded files. Low-volume “trickle” data or lots of small files can be more expensive to load. The typical cost to load an uncompressed terabyte of log data is around 20 credits, or $60 at $3 per credit. That’s a conservative credit price; many customers have negotiated discounts to less than $3 per credit.
It is worth mentioning that Snowpipe Streaming, the newest ingest mode, provides lower latency and cost for many scenarios, especially for environments with security logs already flowing through Kafka (currently a requirement). Not having to stage files before loading also makes the ingest cost more predictable. Data lakes getting better at streaming data is an exciting trend to watch.
There’s also an opportunity to further reduce costs by batch-loading certain datasets that aren’t needed for real-time detections. Loading a high-volume, low-use data source three times a day, for example, instead of streaming it continuously can save big bucks.
Let’s ignore those optimizations for our purposes here. We can estimate costs for an organization using Snowpipe auto ingest to collect 7 TB/d uncompressed log data, with each credit costing $3, as follows:
Daily Snowflake Credit Count: 7 TB * 20 credits/TB = 140 credits/day.
Daily Cost in Dollars: 140 credits * $3/credit = $420.
Annual Cost: $420/day * 365 days = $153,300.
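The arithmetic above can be wrapped in a small helper so it is easy to rerun with your own volume and credit price. This is a minimal sketch: the function name and parameters are made up for illustration, and the 20 credits/TB and $3/credit defaults are simply the figures used in this post, not official Snowflake rates.

```python
def annual_ingest_cost(tb_per_day, credits_per_tb=20.0, usd_per_credit=3.0, days=365):
    """Return (daily credits, daily USD, annual USD) for Snowpipe auto ingest.

    credits_per_tb and usd_per_credit are the post's working assumptions;
    substitute measured or negotiated values for your own environment.
    """
    daily_credits = tb_per_day * credits_per_tb
    daily_usd = daily_credits * usd_per_credit
    return daily_credits, daily_usd, daily_usd * days


daily_credits, daily_usd, annual_usd = annual_ingest_cost(7)
print(f"{daily_credits:,.0f} credits/day, ${daily_usd:,.0f}/day, ${annual_usd:,.0f}/year")

# Hypothetical sweep over a few ingest volumes to see how the cost scales.
for volume in (1, 7, 15):
    _, _, yearly = annual_ingest_cost(volume)
    print(f"{volume:>2} TB/d -> ${yearly:,.0f}/year")
```

Because the model is linear, halving the credit price or the daily volume halves the annual figure. Real Snowpipe bills also depend on file sizes and counts, which this sketch ignores.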
Cost Factor: Hot Data Retention
In Snowflake, all data is treated as 'hot', meaning that it's always accessible for analysis. This approach simplifies the data management strategy and cost planning. The cost for hot data retention in Snowflake is usually $23 per compressed terabyte per month. That’s the same cost as plain old S3, with the difference being that the data is automatically compressed, encrypted, and ready to query.
It’s easy to trip up your calculations on this, so keep track of whether you’re budgeting for data in its original state (uncompressed) or as stored (compressed). Snowflake has improved its compression over the years, and I typically see higher than 8x compression these days, but to be conservative we can assume a 5x compression factor.
This cost factor needs to take into account the retention period dictated by your security policy. Snowflake doesn’t impose a retention limit, and I’ve seen security data lakes with over five years of hot data. This works because the storage in the data lake is fully separate from the compute clusters. Data that isn’t in scope for a given query is pruned away by the engine and does not affect performance.
There’s also the option to create tasks that drop records based on logic such as source type, date, environment, etc. This enables a retention policy that is tuned to operational requirements and not just a fixed time period. But to keep things simple here, let’s assume that all collected data will be kept for one year.
Another area where it’s easy to make a calculation error is how the data accumulates. In the first year, you start with no data stored and each month more data adds up. Since we consider retention to be one year, from the second year onwards the new data will replace old data and storage will level out. You can be conservative in your calculations by ignoring the lower storage costs of the first year and estimating based on the steady-state amount. In our example scenario, 7 TB/d retained for one year is 2,555 TB uncompressed, which at 5x compression works out to approximately 500 TB of compressed data in storage.
For our 7 TB/d scenario with 365 days retention, we can estimate annual hot data retention costs as:
Monthly Hot Storage Cost:
Cost per TB: $23
Total Storage: 500 TB
Monthly Cost = 500 TB * $23/TB = $11,500
Annual Hot Storage Cost:
Annual Cost = Monthly Cost * 12 months
Annual Cost = $11,500/month * 12 = $138,000
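The storage estimate can be sketched the same way. The helper names below are hypothetical, and the 5x compression factor and $23 per compressed TB per month are the assumptions from this post. The first-year figure is a rough illustration only: it assumes data accumulates evenly and that each month is billed on its end-of-month volume, which is a simplification of how storage is actually metered.

```python
def steady_state_storage_tb(tb_per_day, retention_days=365, compression=5.0):
    """Compressed TB held once the retention window is full."""
    return tb_per_day * retention_days / compression


def annual_storage_cost(compressed_tb, usd_per_tb_month=23.0):
    """Twelve months of storage at a constant compressed volume."""
    return compressed_tb * usd_per_tb_month * 12


def first_year_storage_cost(compressed_tb, usd_per_tb_month=23.0):
    """Rough ramp-up cost: month m holds m/12 of the steady-state volume."""
    return sum(compressed_tb * m / 12 * usd_per_tb_month for m in range(1, 13))


exact_tb = steady_state_storage_tb(7)       # unrounded steady-state volume
rounded_cost = annual_storage_cost(500)     # the rounded figure used in the text
exact_cost = annual_storage_cost(exact_tb)  # same formula, unrounded volume
ramp_cost = first_year_storage_cost(500)    # lower first-year spend

print(f"Steady state: {exact_tb:,.0f} TB compressed")
print(f"Annual cost at 500 TB: ${rounded_cost:,.0f}")
print(f"Annual cost at {exact_tb:,.0f} TB: ${exact_cost:,.0f}")
print(f"First-year ramp estimate: ${ramp_cost:,.0f}")
```

Budgeting on the steady-state number is the conservative choice; under these assumptions the ramp-up year comes in at a little over half of it.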
Cost Factor: Cold Data Retention
Ironically, Snowflake doesn’t have a cold storage tier. That’s a significant departure from traditional SIEM solutions that often require managing multiple tiers. This eliminates the added cost of cold data storage and, more importantly, prevents the complexity and unpredictability of additional charges for data retrieval or rehydration.
Annual Cold Storage Cost: $0
Cost Factor: Detection Processing
SIEM use cases require Snowflake to run automated queries for threat detection. These detection queries vary widely by the security team’s environment, maturity, and risk profile. Many SOCs only have a handful of rules running against their data lake, and some (like IOC matches) can safely be run once or twice a day. But we can be conservative (meaning surprises bring savings instead of overages) by assuming the security team has deployed several hundred rules and runs them continuously.
Snowflake provides compute power through virtual warehouses that have T-shirt sizes from XS to 6XL. Even the XS is pretty beefy, and a virtual warehouse can be resized in a few seconds as needed. Going up a size brings twice the power at twice the cost, measured by the time the warehouse is powered on.
I polled the forward-deployed engineers that support Anvilogic customers with multi-data platform threat detection. Their recommendation for an organization ingesting 7 TB daily into Snowflake was to budget for a Medium-sized warehouse running 24/7. That’s based on experience with Anvilogic customers, and anyone getting started with detection processing in a data lake should run their own tests to confirm the compute power needed for detection rules in their environment.
To estimate the annual cost for detection processing in Snowflake using an 'M' size warehouse, which costs 4 credits per hour, we need to consider the warehouse's operation over the entire year. Here's how the calculation would look:
Hourly Cost of 'M' Size Warehouse:
4 credits per hour.
Cost per Credit:
Assuming a credit cost of $3.
Daily Cost of Running the Warehouse 24/7 for Detection:
Daily Cost = Hourly Cost * 24 hours/day
Daily Cost = 4 credits/hour * $3/credit * 24 hours/day
Daily Cost = $288/day
Annual Cost of Detection Processing:
Annual Cost = Daily Cost * 365 days
Annual Cost = $288/day * 365 days
Annual Cost = $105,120.
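Here is a minimal sketch of the warehouse calculation that also makes it easy to compare sizes. The table follows the doubling rule described above, anchored on the Medium rate of 4 credits per hour; the function name is hypothetical and $3 per credit is this post's working assumption.

```python
# Credits per hour for standard warehouses, doubling with each size step.
CREDITS_PER_HOUR = {"XS": 1, "S": 2, "M": 4, "L": 8, "XL": 16}


def annual_warehouse_cost(size, hours_per_day, usd_per_credit=3.0, days=365):
    """Annual USD for a warehouse of the given size running hours_per_day."""
    return CREDITS_PER_HOUR[size] * usd_per_credit * hours_per_day * days


detection_cost = annual_warehouse_cost("M", hours_per_day=24)
print(f"Medium, 24/7: ${detection_cost:,.0f}/year")

# Compare neighbouring sizes to see the cost of over- or under-sizing.
for size in ("S", "M", "L"):
    print(f"{size:>2}: ${annual_warehouse_cost(size, 24):,.0f}/year")
```

Each size step doubles or halves the line item, so validating the warehouse size against your own rule set is worth the testing effort.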
Cost Factor: Investigation Processing
One of Snowflake’s key benefits to SIEM use cases is how compute resources can be instantly turned on and off. This eliminates wasteful spending that would otherwise be incurred during inactive periods.
The ability to quickly resize compute power, in the form of virtual warehouse T-shirt sizing, is also a source of savings. I’ve seen security operations teams that run at relatively low power levels day to day pump up their warehouse by 10x when the proverbial stuff hits the fan. That let those teams meet remediation SLAs in crunch time while avoiding high compute spend during normal operations.
Going by guidance from Anvilogic’s experienced security engineers, we can also estimate investigation processing spend with a Medium-sized warehouse. A conservative estimate might be 8 hours a day of actively searching the data lake. We’ll include weekends and holidays to build a buffer and account for busy times and breach investigations (which somehow always start on a Friday afternoon of a holiday weekend).
Daily Cost of Running the Warehouse for 8 Hours for Investigation:
Daily Cost = Hourly Cost * 8 hours/day.
Daily Cost = 4 credits/hour * $3/credit * 8 hours/day.
Daily Cost = $96/day.
Annual Cost of Investigation Processing:
Annual Cost = Daily Cost * 365 days.
Annual Cost = $96/day * 365 days.
Annual Cost = $35,040.
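The same model extends to the burst pattern described earlier in this section. In the sketch below the baseline matches the calculation above; the incident scenario (5 days at 12 hours a day on a 2XL warehouse at 32 credits per hour, 8x a Medium) is purely hypothetical and only shows how a temporary resize would show up in the budget.

```python
def investigation_cost(hours_per_day=8, credits_per_hour=4, usd_per_credit=3.0, days=365):
    """Annual USD for day-to-day investigation on a fixed-size warehouse."""
    return credits_per_hour * usd_per_credit * hours_per_day * days


def burst_cost(burst_days, hours_per_day, credits_per_hour, usd_per_credit=3.0):
    """USD for a temporary scale-up during an incident."""
    return burst_days * hours_per_day * credits_per_hour * usd_per_credit


baseline = investigation_cost()  # Medium warehouse, 8 hours a day, all year
# Hypothetical incident: 5 days, 12 hours a day, 2XL warehouse.
incident = burst_cost(burst_days=5, hours_per_day=12, credits_per_hour=32)

print(f"Baseline investigation: ${baseline:,.0f}/year")
print(f"Hypothetical incident burst: ${incident:,.0f}")
print(f"Total: ${baseline + incident:,.0f}")
```

Under these made-up incident numbers, the burst adds a modest amount to the annual line because the larger warehouse is only billed while it runs.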
Cost Factor: Archive Processing
Archived data needs to be processed in order to be useful again. As many organizations find out at the most stressful time possible, rehydration from archive to SIEM can be complex and costly. Snowflake eliminates the need for an archive tier as hot data is stored directly in cheap and limitless cloud blob storage. I covered this concept in my post on security data lakes here.
Annual Archive Processing Cost: $0
Cost Factor: Cloud Egress Costs
The big cloud service providers tax data moving out of their network. According to the recent study quoted below, moving 50 TB out of AWS costs around $4,300. That would be the weekly egress cost for our 7 TB/d example organization if they were based in AWS but using SIEM hosted in Azure or GCP.
A significant benefit to using Snowflake for security data is that it is available in all three of the major clouds. So a SOC could potentially use a Snowflake account in AWS for CloudTrail, CrowdStrike, and other sources originating in Amazon services, while using a Snowflake account in Azure for Office 365, Defender, and other Microsoft services. There would still be a need for correlating threat signals across those environments, and this could be achieved by extracting events of interest without moving the raw data between clouds. Multi-cloud support provides an opportunity for significant savings.
Annual Cloud Egress Cost: $0
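For a sense of what is being avoided, the egress figure cited in this section can be annualized. This is a rough sketch that extrapolates linearly from the roughly $4,300 per 50 TB quoted above; real cloud egress pricing is tiered, so treat the result as an order-of-magnitude estimate. The function name is hypothetical.

```python
def annual_egress_cost(tb_per_day, usd_per_50_tb=4300.0, days=365):
    """Linear extrapolation of cross-cloud egress spend from a per-50-TB price."""
    usd_per_tb = usd_per_50_tb / 50
    return tb_per_day * days * usd_per_tb


avoided = annual_egress_cost(7)
print(f"Egress avoided by keeping data in its home cloud: ${avoided:,.0f}/year")
```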
Cost Factor: SIEM Solution
Snowflake needs a security layer on top to make it useful for threat detection and response. Some SOCs have built their own rules engine for Snowflake and create threat detection rules in SQL from scratch. There are also well-established SIEMs that can operationalize Snowflake for threat detection and response, including off-the-shelf rules and analytics.
Some of these solutions are charged by data volume, in addition to the Snowflake ingest cost factor. Others have done away with traditional SIEM pricing in favor of feature or asset-based pricing. Ensure that the security layer for your data lake does not become a limiting factor for visibility and effectiveness.
Annual SIEM Solution Cost: Varies by solution.
Conclusion
We’ve now gone through each of the Snowflake cost factors associated with threat detection & response use cases. The Snowflake data platform cost estimate for our example 7 TB/d SIEM with one year of hot retention sums to $431,460 across the factors above, or roughly $450,000 as a round budget figure, before the cost of the SIEM solution layer.
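A short sketch to tally the per-factor annual figures computed in the sections above. The dictionary keys are just labels chosen for illustration, and the SIEM solution layer is left out because it varies by product.

```python
# Annual platform costs in USD, taken from each cost factor section.
annual_costs = {
    "data_collection": 153_300,
    "hot_data_retention": 138_000,
    "cold_data_retention": 0,
    "detection_processing": 105_120,
    "investigation_processing": 35_040,
    "archive_processing": 0,
    "cloud_egress": 0,
}

total = sum(annual_costs.values())

for factor, cost in annual_costs.items():
    share = cost / total * 100
    print(f"{factor:<26} ${cost:>9,}  {share:5.1f}%")
print(f"{'total':<26} ${total:>9,}")
```

Breaking the total out by share shows that ingest and storage together make up about two thirds of the platform spend in this scenario.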
Cost factor analysis helps us to understand the ways in which budget constraints may impact our security initiatives with a given platform. We can plot ingest costs at different volume levels to verify that siloed datasets can be brought in to support better detections. We can see whether the team can afford to adopt a full year of hot storage. And we can budget for new use cases like threat hunting, which may require their own cost factor.
Be sure to run your own numbers and validate them in a test environment before committing to any new approach. Hopefully, this walkthrough gave you a better sense of Snowflake’s transformational cost-effectiveness for security use cases and an approach to measuring the potential in your own environment.