Modern SIEM solutions have been dramatically impacted by the rapid move to the cloud. However, many organizations still struggle to adapt to this dynamically changing environment. Effectively, the demands of data management have shifted from a Formula 1 mindset, focused on speed and performance, to rally racing, which requires agility at maximum speed.
SIEM solutions were originally built to triage a limited data set, primarily stored on-premises, enabling teams to focus on optimizing performance against costs on siloed platforms. But with exabytes of logs, events, and constant streaming of new data from a multitude of hybrid cloud environments, the challenge has shifted. It’s not just the volume of data that has been growing rapidly, it’s the dimensionality of the data that adds a new layer of complexity.
This multi-dimensionality of modern data sets has made it very difficult to optimize for price / performance. The reality is that over 90% of compute is wasted on scanning data, which drives unnecessarily high costs, delays access to critical data needed to meet business requirements, and over-consumes resources that could be utilized elsewhere.
In this post we’ll review three different approaches that will help frame the architecture you need.
Is it a Formula 1 or a rally car?
SIEM technology supports threat detection, compliance, and security incident management by collecting and analyzing security events and a wide variety of other contextual data sources. The core capabilities are a broad scope of log event collection and management, the ability to analyze log events and other data across disparate sources, and operational capabilities such as incident management and response, dashboards and reporting.
Despite its long-standing history, going back to 2005, the traditional SIEM approach has outlived its usefulness. The simple, straightforward collection of security events is no longer sufficient in today’s security operating environment.
Due to excessively high license and infrastructure costs, teams purposefully do not collect all of the security data they need to defend against attacks. This is a considerable challenge that causes key events to be missed during an investigation or, worse, for breaches to go unnoticed.
As the volume of processed data continues to grow, it becomes more difficult to filter and detect malicious activity. Traditional SIEMs are limited in providing advanced analytics capabilities, often relying on restrictive languages to query and interact with the data.
Bottom line: deliver an effective and controllable platform for predefined data and analysis.
Though data warehouses and optimized data platforms were designed to deliver a strong price versus performance balance, security-driven data can be dynamic, dimensional, and disparate, making data warehouse solutions less effective in delivering the agility and performance users need.
Organizations use data warehouses as a central repository. The warehouse is typically connected to multiple data streams, such as relational databases, transactional systems, and other sources. The data is then kept in the warehouse for future use, but it can also be used for analysis purposes.
Several organizational roles work directly with the data warehouse. Data engineers ensure the data is ingested and processed correctly. SOC analysts and security incident responders access the data via integrations and SQL clients to extract relevant security insights and build reports and dashboards for decision-makers to act on.
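To make the analysts' workflow above concrete, here is a minimal sketch of the kind of SQL a SOC analyst might run against a warehouse table of security events. The schema, table name, and data are hypothetical, and SQLite stands in for the warehouse engine purely for illustration:

```python
import sqlite3

# Hypothetical security-events table, standing in for a warehouse schema
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE security_events (
        ts TEXT, source_ip TEXT, event_type TEXT, outcome TEXT
    )
""")
conn.executemany(
    "INSERT INTO security_events VALUES (?, ?, ?, ?)",
    [
        ("2022-01-01T10:00:00", "10.0.0.5", "login", "failure"),
        ("2022-01-01T10:00:05", "10.0.0.5", "login", "failure"),
        ("2022-01-01T10:00:10", "10.0.0.5", "login", "failure"),
        ("2022-01-01T10:02:00", "10.0.0.9", "login", "success"),
    ],
)

# Surface source IPs with repeated failed logins -- a typical
# brute-force indicator an analyst would pull into a report
rows = conn.execute("""
    SELECT source_ip, COUNT(*) AS failures
    FROM security_events
    WHERE event_type = 'login' AND outcome = 'failure'
    GROUP BY source_ip
    HAVING failures >= 3
""").fetchall()
print(rows)  # [('10.0.0.5', 3)]
```

Queries like this are fast precisely because the data was modeled and loaded into the warehouse ahead of time; the trade-off, as noted above, is the flexibility lost in that upfront modeling.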
Bottom line: compromise on price / performance to deliver better flexibility and agility.
The data lake delivers a truly agile approach and architecture to smart threat analysis and detection.
The data lake was initially designed as the single source of truth for all data, granular and in raw form. When it comes to running analytics on these vast data sets, it represents a major evolution, delivering unprecedented flexibility and agility. A well-designed data lake analytics stack eliminates the need to model data or move it to optimized platforms, while supporting any query whenever it’s needed.
When used on security data, the data lake architecture offers a strategic advantage to cyber vendors, MSSPs and even organizations that run their own data lake. It’s far more flexible and supports unstructured and semi-structured data in its native format, and can include log files, tables, feeds, system logs, text files, and more. A data lake architecture is based on the promise that any new data can be analyzed immediately with zero time-to-insights, resulting in very fast results for hunting and threat intelligence without losing the full dimensionality of the data.
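The "analyze in native format" promise above is essentially schema-on-read. A minimal sketch, using hypothetical log lines, of how a lake query layer can treat a mix of JSON events and raw syslog text as queryable the moment it lands, with no upfront modeling:

```python
import json

# Hypothetical raw lines landed in the lake as-is: a mix of
# JSON events and plain syslog-style text, no upfront schema
raw_lines = [
    '{"event": "login", "user": "alice", "status": "failure"}',
    "Jan  1 10:00:02 host sshd[812]: Failed password for bob",
    '{"event": "login", "user": "alice", "status": "success"}',
]

def parse(line):
    """Schema-on-read: treat the line as JSON if it parses, else as raw text."""
    try:
        return json.loads(line)
    except json.JSONDecodeError:
        return {"event": "raw_text", "message": line}

events = [parse(line) for line in raw_lines]

# New fields are queryable as soon as they arrive -- no remodeling step
failed_logins = [e for e in events
                 if e.get("event") == "login" and e.get("status") == "failure"]
print(len(failed_logins))  # 1
```

Because nothing is flattened or remodeled on ingest, the full dimensionality of each event is preserved for later hunting; the cost of that flexibility is paid at query time, as the next paragraph explains.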
Here comes the big “but”… data lake query engines are often based on brute-force technology, which means full scans are required to process queries. Partitioning can help in cases where filtering is limited to a small number of columns, but with highly dimensional data, partitioning will not make a significant impact. The main cost of full scans is compute: though storage is relatively inexpensive, compute clusters can quickly turn into budget busters.
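Why partitioning falls short on dimensional data can be shown with a toy model. The layout and row counts below are hypothetical: a lake partitioned by date prunes well when the filter matches the partition key, but a filter on any other column forces a scan of every file:

```python
# Toy lake layout: one file per day, partitioned by date.
# Row counts are made up for illustration.
files = {f"dt=2022-01-{d:02d}": 1_000_000 for d in range(1, 31)}

def rows_scanned(filter_column):
    """Estimate rows read for a single-day query filtered on one column."""
    if filter_column == "dt":
        # Filter matches the partition key: prune down to one file
        return files["dt=2022-01-15"]
    # Any other column (user_id, source_ip, ...) forces a full scan
    return sum(files.values())

print(rows_scanned("dt"))       # 1000000
print(rows_scanned("user_id"))  # 30000000 -- 30x the compute
```

With security data, queries pivot across many columns (user, IP, host, event type), so most filters miss the partition key and pay the full-scan price, which is exactly the compute waste described above.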
Bottom line: support agile, non-predefined data and analysis, but be prepared to compromise on price / performance and accept rising DataOps overhead.
This illustration highlights the advantage of each approach, as well as the compromises, so you can identify the right solution for each workload. Essentially it’s all about agility vs. price / performance. As agility becomes more predominant in achieving a strategic advantage, modern solutions tend to focus on delivering very fast time-to-market and optimal flexibility.
The new vendors, including LogRhythm, Exabeam, Securonix, Panther Labs and of course Snowflake, have all made significant investments in delivering agility and flexibility. You can see the cloud vendors also taking a step in this direction, especially as they seek to leverage their data lake solutions and augment their native stack of solutions to address the security data lake.
When evaluating or designing your security data lake, here are a few elements to consider:
Varada tackles the challenges of the security data lake with a unique indexing technology that dynamically and automatically accelerates queries according to actual demand. The platform is designed to decouple compute and storage and can effectively scale to make sure you put your budget to good use.
Varada’s approach essentially breaks the compromise between agility and efficiency. With 10x-100x faster queries than other data lake query engines and a very compute-light solution, you can now serve performance- and budget-sensitive threat analysis and detection workloads directly on top of your cloud data lake, in your own environment (VPC). So now your race car can take on all those unpredictable turns and still win the race!
About the author
Brad LaPorte is a former top-rated Gartner Research Analyst for cybersecurity and held senior positions in US Cyber Intelligence, Dell, and IBM, as well as at several startups.
Brad has spent most of his career on the frontlines fighting cybercriminals and advising top CEOs, CISOs, CIOs, CxOs, and other thought leaders on how to be as efficient and effective as possible. He is currently a Partner at High Tide Advisors, actively helping cybersecurity and tech companies grow their go-to-market strategies.