The move to the cloud has always been about ease of use and agility. SIEM vendors are no different: they are challenged to offer their customers the flexibility to query massive datasets at unprecedented scale and the agility to scale resources up and down.
Traditional SIEM solutions were built to analyze terabytes of data on on-premises infrastructure, not exabytes of logs, events, and constantly streaming new data from a multitude of hybrid cloud environments. Data is not only growing rapidly in volume; it is also growing exponentially in complexity and dimensionality. The overabundance of monitoring platforms and devices involved complicates matters further.
In the modern SOC, rapid access to as much data as possible is critical. Centralized Log Management (CLM) solutions that combine threat intelligence with event data can cover multiple years of retention and keep data at its most granular level, enabling effective threat detection. Given the data's high velocity and variability, analyzing such enormous volumes requires a new approach and mentality.
To preserve their strategic advantage, many vendors and MSSPs rush into an implementation approach that doesn't properly account for these variables, which often results in a multitude of impactful issues along with high direct and indirect costs.
Delays in embracing a modern architectural approach carry common pitfalls, from escalating direct and indirect costs to slower threat detection.
Cyber vendors and MSSPs are on the hunt for a truly smart data architecture that can mitigate the modern challenges of handling massive amounts of data and identify new threats quickly. Designed correctly, a data lake architecture can tick all of these boxes.
A data lake serves as the single source of truth, holding all data in granular or raw form. In terms of flexibility, it is a major evolution: the security data lake supports unstructured and semi-structured data in its native format, including log files, tables, feeds, system logs, text files, and more.
This means any new data can be analyzed immediately, with near-zero time-to-insight for hunting and threat intelligence and no loss of the data's full dimensionality. When compute and storage resources are tightly coupled, organizations tend to limit scaling and expansion to avoid the hefty price tag; moving to a security data lake strategy lets them sidestep that scaling dilemma.
With the strong shift towards the cloud, the data warehouse and the security data lake will converge.
With this modern approach and its focus on security data lakes, best-of-breed solutions share a core set of capabilities: visibility, speed, scale, and flexibility.
In previous approaches, to ensure performance, many vendors compromised on accessing all their available data and settled for isolated data silos that had been prepared and modeled to enable speedy analytics. The data lake, by contrast, is a cost-effective and simple storage layer that can replace existing data architectures while enabling cutting-edge threat detection and analytics.
The data lake architecture still has drawbacks that must be effectively addressed. With typical data lake query engines, over 90% of compute is wasted on scanning data, which drives unnecessarily high costs, delays access to the critical data needed to meet business requirements, and consumes resources that could be utilized elsewhere.
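To see why scan reduction matters, here is a minimal, hypothetical sketch in plain Python (not any real lake engine) contrasting a full scan with an index-pruned lookup over the same log records; the data and field names are invented for illustration:

```python
# Hypothetical sketch: how an index prunes the rows a query must touch.
# Real lake engines index columnar files on object storage, not Python lists.
from collections import defaultdict

# Synthetic log records: one failed login roughly every 97 events.
logs = [
    {"ip": f"10.0.0.{i % 50}", "event": "login_failed" if i % 97 == 0 else "ok"}
    for i in range(10_000)
]

def full_scan(rows, event):
    """Answer the query by touching every row."""
    scanned, hits = 0, []
    for row in rows:
        scanned += 1
        if row["event"] == event:
            hits.append(row)
    return hits, scanned

# Build an inverted index on the "event" column once, up front.
index = defaultdict(list)
for pos, row in enumerate(logs):
    index[row["event"]].append(pos)

def indexed_scan(rows, idx, event):
    """Answer the same query by touching only the matching rows."""
    positions = idx.get(event, [])
    return [rows[p] for p in positions], len(positions)

hits_full, scanned_full = full_scan(logs, "login_failed")
hits_idx, scanned_idx = indexed_scan(logs, index, "login_failed")
assert hits_full == hits_idx  # same answer, far fewer rows touched
print(f"full scan touched {scanned_full} rows, indexed scan touched {scanned_idx}")
```

Here the indexed path touches roughly two orders of magnitude fewer rows for the same answer, which is the kind of scan reduction the article describes.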
Varada enables security-driven analytics workloads to access raw behavior data on the data lake, connect disparate 'dots' to detect multiple anomalies, and compare real-time activity to historical patterns in the data lake, helping rule out false positives and quickly identify legitimate threats.
Varada’s solution can be easily integrated in vendors’ technology stack and with any data lake architecture, enabling threat detection, anomalies and incident management applications, and essentially any SQL-based analytics on any data source on the data lake.
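As an illustration of the kind of SQL-based threat-detection query such an engine would run at scale, here is a sketch using an in-memory SQLite table; the schema, data, and thresholds are invented for the example and are not Varada's API:

```python
# Illustrative only: a SQL threat-detection query of the kind a lake engine
# would run over raw auth logs, shown here on a tiny in-memory SQLite table.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE auth_logs (ts INTEGER, ip TEXT, outcome TEXT)")
rows = [
    (1, "203.0.113.7", "failure"), (2, "203.0.113.7", "failure"),
    (3, "203.0.113.7", "failure"), (4, "198.51.100.2", "success"),
    (5, "203.0.113.7", "failure"), (6, "198.51.100.2", "success"),
]
conn.executemany("INSERT INTO auth_logs VALUES (?, ?, ?)", rows)

# Flag source IPs with an unusually high count of failed logins.
suspicious = conn.execute("""
    SELECT ip, COUNT(*) AS failures
    FROM auth_logs
    WHERE outcome = 'failure'
    GROUP BY ip
    HAVING failures >= 3
    ORDER BY failures DESC
""").fetchall()
print(suspicious)  # -> [('203.0.113.7', 4)]
```

The point is that plain SQL, run directly against raw event data, is enough to express this class of detection logic without first modeling the data into a dedicated silo.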
To address the challenges of existing data lake query engines, Varada leverages autonomous indexing and caching to reduce the data scanned by analytics queries by at least an order of magnitude, so compute resources are used far more effectively. To eliminate the need for expensive, purpose-built text-analytics solutions, Varada natively embeds text analytics (Apache Lucene) to run full-text searches directly on the data lake, across the entire dataset.
The results are mind-blowing! Queries are expected to run 10x-100x faster, and the performance advantage grows as queries become more complex and selective (needle-in-a-haystack threat analysis), all at a 40%-60% cost reduction.
The end result of this investment is an easy-to-use and cost-effective security data lake that has visibility, speed, scale, and flexibility.
About the author
Brad LaPorte is a former top-rated Gartner Research Analyst for cybersecurity and held senior positions in US Cyber Intelligence, Dell, and IBM, as well as at several startups.
Brad has spent most of his career on the frontlines fighting cybercriminals and advising top CEOs, CISOs, CIOs, and other CxOs and thought leaders on how to be as efficient and effective as possible. He is currently a Partner at High Tide Advisors, actively helping cybersecurity and tech companies grow their go-to-market strategies.