It’s Time to Get Wet: Rethinking SIEM Platforms

By Roman Vainbrand
September 9, 2021

As organizations look to become more agile, there has been a mass movement toward what is called a security data lake, with many jumping in headfirst.

Gartner defines data lakes as “a concept consisting of a collection of storage instances of various data assets. These assets are stored in a near-exact, or even exact, copy of the source format and are in addition to the originating data stores.” 

Expanding this concept to security-specific data, a "security data lake" can help you centralize and store virtually unlimited amounts of data to power investigations, analytics, threat detection, and compliance initiatives. Analysts and applications can access logs from a single, centralized, easily searchable source, which speeds up data-driven investigations.

The global data lake market was valued at over $8 billion in 2020 and is expected to grow at a compound annual growth rate (CAGR) of over 21% from 2021 to 2028. According to Gartner, over half of organizations plan to implement a data lake in the next 24 months. Digital platforms generate an enormous amount of information every day, and handling it requires efficient processing and indexing architectures.

Security Data Lakes: Hunting for the Right Data

Security data lakes are designed to centralize all of your data so you can support complex security analysis use cases, including threat hunting and anomaly detection at scale. One of the top challenges is long-term retention combined with the ability to search across all collected telemetry. Most vendors cap data retention at 7 to 30 days and often pass the cost of longer retention on to the buyer, whether the buyer realizes it or not. For example, according to Gartner and multiple cloud benchmark studies over the years, it costs on average $6 per endpoint per year to keep 7 days of continuously recorded endpoint detection and response (EDR) data, which is one reason EDR solutions are so expensive.
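To see why vendors cap retention, it helps to run the numbers. The sketch below takes the $6 per endpoint per year (7 days) figure quoted above and, as a loudly labeled assumption, scales it linearly with the retention window. Real pricing is rarely that simple; the function name and fleet size are hypothetical and chosen only for illustration.

```python
# Hypothetical retention-cost estimate. The $6 per endpoint per year for
# 7 days of EDR recording is the figure quoted in the text; the linear
# scaling with the retention window is an ASSUMPTION for illustration only.
BASELINE_COST_PER_ENDPOINT = 6.0  # USD per endpoint per year
BASELINE_RETENTION_DAYS = 7


def annual_retention_cost(endpoints: int, retention_days: int) -> float:
    """Estimate yearly EDR recording cost, assuming cost grows linearly
    with the number of days of data retained."""
    if endpoints < 0 or retention_days <= 0:
        raise ValueError("endpoints must be >= 0 and retention_days > 0")
    per_endpoint = (BASELINE_COST_PER_ENDPOINT * retention_days
                    / BASELINE_RETENTION_DAYS)
    return endpoints * per_endpoint


if __name__ == "__main__":
    # A hypothetical fleet of 10,000 endpoints at three retention windows.
    for days in (7, 30, 365):
        cost = annual_retention_cost(10_000, days)
        print(f"{days:>3} days of retention: ${cost:,.0f} per year")
```

For 10,000 endpoints, 7 days comes to $60,000 a year; under the linear assumption, a full year of retention would cost roughly $3.1 million, which is the kind of jump that pushes vendors toward short retention caps.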

Access to all of your historical data is critical for having the contextual information needed to conduct an effective and efficient security investigation.

The SolarWinds supply chain attack illustrates the problem: it was months before the security community learned of the malicious artifacts, the adversary's tactics, techniques, and procedures (TTPs), and the motivation and scope behind such a complex attack. By then, many organizations could not hunt across the relevant time window, because those logs had already aged out of the platform or been moved into offline archives, which made it difficult to assess the scope of the attack.
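The retention problem can be made concrete with a little date arithmetic. The sketch below computes how many days of attacker activity have already aged out by the time indicators are published, for a given retention window. The dates are hypothetical and are not the actual SolarWinds timeline; the function names are illustrative only.

```python
from datetime import date, timedelta


def oldest_searchable(as_of: date, retention_days: int) -> date:
    """Earliest day whose logs are still online on the given date."""
    return as_of - timedelta(days=retention_days)


def blind_spot_days(first_activity: date, disclosure: date,
                    retention_days: int) -> int:
    """Days of attacker activity whose logs aged out before the
    indicators were disclosed (0 means the full window is searchable)."""
    gap = oldest_searchable(disclosure, retention_days) - first_activity
    return max(0, gap.days)


if __name__ == "__main__":
    # Hypothetical timeline: intrusion begins 1 March, indicators published 1 December.
    first_seen, disclosed = date(2020, 3, 1), date(2020, 12, 1)
    for days in (7, 30, 365):
        print(f"{days:>3}-day retention -> {blind_spot_days(first_seen, disclosed, days)} blind days")
```

With a 30-day cap, 245 days of activity in this hypothetical timeline would already be gone when hunting starts; with a year of retention, the whole window is still searchable.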


Security Data Lake Success Criteria

There are four key data-related requirements that security teams must have in place for a security data lake architecture to operate efficiently and effectively:

  1. Access to ALL key data (any type, volume, timeframe, format)
    Security applications, analysts, and responders need access to every piece of information they can get their hands on to conduct high-fidelity security investigations.
  2. Instant access (zero time to insights)
    Security investigations need to operate at the speed of now, with no delays in system responsiveness.
  3. Scalability
    The approach needs to scale out and in elastically as needed to keep pace with a dynamically expanding digital ecosystem and volatile demand.
  4. Price-performance balance
    This additional functionality needs to reduce costs rather than add to them, removing barriers to implementation and delivering long-term operational and financial benefits.

Benefits of Starting with a Security Data Lake

Organizations are taking extra care to implement a best-of-breed approach that addresses not only immediate needs but also long-term ones:

  • Efficient resource utilization
  • Consistent performance
  • Access to all historical operational data sets
  • Predictable cost structure
  • Ability to access critical business operational data at machine speed
  • Full control of the data format (original raw form vs. modified or truncated)
  • No sacrificing security controls and compliance in favor of basic functionality
  • No dependence on do-it-yourself builds, which are unsustainable and very costly in the long term

The main pitfall of data lake architectures, especially when evaluated against existing SIEM solutions and other optimized platforms, is resource efficiency. Data lake query engines often rely on brute-force techniques that essentially scan the entire data set. As a result, 80%-90% of compute resources are "wasted" on ScanFilter operations.
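The difference between a brute-force scan and an index lookup can be shown with a toy example. The sketch below is not Varada's engine or any real query engine; it simply counts how many rows each approach touches when filtering on a single column. The column name `src_ip` and the data are hypothetical.

```python
from collections import defaultdict


def full_scan(rows, column, value):
    """Brute-force filter: every row is read and compared."""
    matches, examined = [], 0
    for i, row in enumerate(rows):
        examined += 1
        if row[column] == value:
            matches.append(i)
    return matches, examined


def build_index(rows, column):
    """Inverted index: column value -> list of row positions."""
    index = defaultdict(list)
    for i, row in enumerate(rows):
        index[row[column]].append(i)
    return index


def indexed_lookup(index, value):
    """Only the matching row positions are touched."""
    matches = index.get(value, [])  # .get does not insert missing keys
    return list(matches), len(matches)


if __name__ == "__main__":
    # 100,000 hypothetical events, 10 of which come from the suspicious IP.
    rows = [{"src_ip": "10.0.0.66" if i % 10_000 == 0 else "10.0.0.1"}
            for i in range(100_000)]
    _, scanned = full_scan(rows, "src_ip", "10.0.0.66")
    idx = build_index(rows, "src_ip")
    _, looked_up = indexed_lookup(idx, "10.0.0.66")
    print(f"full scan read {scanned} rows; index touched {looked_up}")
```

The index has its own build and storage cost, so it pays off when the same column is filtered repeatedly, which is typical of investigation and hunting workloads.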

Organizations that have already tried data lake architectures often find themselves managing huge clusters to meet performance and concurrency requirements. This is extremely expensive, both in compute resources and in the large data teams needed to maintain them.

There are different ways to tackle these challenges, ranging from partitioning-based optimizations to managed platforms or even serverless solutions such as Amazon Athena.
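Partitioning-based optimization works by skipping whole partitions that cannot match a query. The sketch below assumes hypothetical daily partitions keyed by date: a time-range query reads only the relevant days, but a filter on a non-partition column (here `user`) still has to read every partition. That limit is why partitioning helps only along the few dimensions the data is partitioned by.

```python
from datetime import date, timedelta


def make_partitions(days, rows_per_partition):
    """Hypothetical daily partitions starting 1 September 2021."""
    start = date(2021, 9, 1)
    return [
        {"dt": start + timedelta(days=d),
         "rows": [{"user": "alice" if i % 100 == 0 else "bob"}
                  for i in range(rows_per_partition)]}
        for d in range(days)
    ]


def prune_partitions(partitions, start, end):
    """Keep only partitions whose date key falls within [start, end]."""
    return [p for p in partitions if start <= p["dt"] <= end]


def query(partitions, start, end, predicate):
    """Read surviving partitions and filter their rows; report rows read."""
    selected = prune_partitions(partitions, start, end)
    rows_read = sum(len(p["rows"]) for p in selected)
    hits = [r for p in selected for r in p["rows"] if predicate(r)]
    return hits, rows_read


if __name__ == "__main__":
    parts = make_partitions(30, 1_000)
    is_alice = lambda r: r["user"] == "alice"
    # Time-bounded query: only 3 of 30 partitions are read.
    print(query(parts, date(2021, 9, 5), date(2021, 9, 7), is_alice)[1])
    # Same user filter over the whole month: nothing can be pruned.
    print(query(parts, date(2021, 9, 1), date(2021, 9, 30), is_alice)[1])
```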

The Benefits of a Better Approach: The Power of Big Data Indexing

Unlike partitioning-based data lake optimizations, which are limited to a few dimensions, Varada offers a data lake analytics solution based on proprietary big data indexing technology.

Case Study: Varada’s Impact on Endpoint Security

Varada can index any column and automatically decides which data to index and which index to use on each nano-block (a small chunk of a single column's data, 64K rows). Varada's indexing suite includes a variety of indexes, such as Bitmap, Dictionary, Trees, Bloom, and Lucene (text searches). Based on the data's format, structure, and cardinality, the platform automatically assigns the most effective index, driving optimal performance.
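Varada's actual selection logic is proprietary, so the sketch below only illustrates the general idea of choosing an index per nano-block from simple statistics. The 64K-row block size comes from the text; the cardinality thresholds, the "contains a space means free text" heuristic, and all function names are hypothetical assumptions made for illustration.

```python
NANO_BLOCK_ROWS = 64 * 1024  # 64K rows of one column, per the description above

# Hypothetical cut-offs, chosen only for illustration.
BITMAP_MAX_DISTINCT = 64
DICTIONARY_MAX_RATIO = 0.1


def split_into_nano_blocks(values, block_rows=NANO_BLOCK_ROWS):
    """Chunk a single column into fixed-size nano-blocks."""
    return [values[i:i + block_rows] for i in range(0, len(values), block_rows)]


def choose_index(block):
    """Pick an index type for one nano-block from simple data statistics."""
    present = [v for v in block if v is not None]
    if not present:
        return "none"
    if all(isinstance(v, str) for v in present) and any(" " in v for v in present):
        return "lucene"      # free text: tokenize for full-text search
    distinct = len(set(present))
    if distinct <= BITMAP_MAX_DISTINCT:
        return "bitmap"      # very low cardinality: one bitmap per value
    if distinct / len(present) <= DICTIONARY_MAX_RATIO:
        return "dictionary"  # many repeats: encode values as small ids
    if all(isinstance(v, (int, float)) for v in present):
        return "tree"        # high-cardinality numbers: support range queries
    return "bloom"           # high-cardinality strings: cheap membership test


if __name__ == "__main__":
    column = [f"user{i % 200}" for i in range(100_000)]
    # Each nano-block gets its own decision.
    print([choose_index(b) for b in split_into_nano_blocks(column)])
```

Deciding per block rather than per column means a column whose distribution drifts over time can use different index types in different blocks.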

The platform seamlessly chooses which queries to accelerate and which data to index according to workload behavior and automatic detection of hot data and bottlenecks. The platform also enables data teams to define business priorities and adjust performance and budgets accordingly, eliminating the need to build separate silos for each use case.
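Workload-driven indexing starts with knowing which columns recent queries actually filter on. The sketch below is a hypothetical illustration, not Varada's mechanism: it keeps a sliding window of recent queries and reports the most frequently filtered columns as candidates for indexing. The window size, `k`, and `min_share` are made-up tuning knobs.

```python
from collections import Counter, deque


class HotColumnTracker:
    """Track which columns recent queries filter on (hypothetical sketch)."""

    def __init__(self, window=1000):
        # Only the most recent `window` queries are kept; older ones fall off.
        self.recent = deque(maxlen=window)

    def record(self, filtered_columns):
        self.recent.append(tuple(filtered_columns))

    def hot_columns(self, k=3, min_share=0.05):
        """Return up to k columns used by at least min_share of recent queries."""
        total = len(self.recent)
        if total == 0:
            return []
        counts = Counter(c for q in self.recent for c in q)
        return [c for c, n in counts.most_common(k) if n / total >= min_share]


if __name__ == "__main__":
    tracker = HotColumnTracker(window=4)
    for cols in (["src_ip"], ["src_ip", "user"], ["user"], ["src_ip"]):
        tracker.record(cols)
    print(tracker.hot_columns(k=2))
```

Because the window slides, a shift in investigation focus (for example, from IP addresses to ports) changes the candidate list without manual retuning.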

Agility applies not only to the types of queries but also to their volume, which means compute demand for query processing is expected to be highly volatile. Data teams are often measured on how quickly they can react to spikes in demand. Varada's architecture is highly elastic, enabling teams to add clusters and use cases quickly and to scale out and in dynamically, delivering the most effective TCO. Effective separation of compute and storage lets teams scale elastically and add clusters as query traffic fluctuates, avoiding overprovisioning and idle resources.
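Because storage is separate from compute, scaling reduces to deciding how many query clusters to run for the current load. The sketch below is a hypothetical rule, not Varada's autoscaler: it sizes the cluster count from the query rate and an assumed per-cluster capacity, bounded by minimum and maximum limits.

```python
import math


def desired_clusters(queries_per_minute, capacity_per_cluster,
                     min_clusters=1, max_clusters=10):
    """Clusters needed for the current load, clamped to [min, max].

    capacity_per_cluster is an ASSUMED sustained query rate per cluster.
    """
    if capacity_per_cluster <= 0:
        raise ValueError("capacity_per_cluster must be positive")
    needed = math.ceil(queries_per_minute / capacity_per_cluster)
    return max(min_clusters, min(max_clusters, needed))


if __name__ == "__main__":
    # A hypothetical demand trace: quiet, spike, peak, cool-down, quiet.
    trace = [20, 180, 450, 90, 10]
    print([desired_clusters(q, capacity_per_cluster=50) for q in trace])
```

A production rule would usually add hysteresis or cool-down periods so the cluster count does not oscillate on short bursts.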

Text Analytics is a Killer Feature

As part of Varada's indexing suite, text search with Apache Lucene is built into the platform and applied automatically.

Organizations collect massive amounts of data on events from many different applications and systems. These events need to be analyzed effectively to enable real-time threat detection, anomaly detection, and incident management. In various security use cases, text analytics provides deep insights into traffic and user behavior (segmentation, URL categorization, etc.). Text analytics has proven critical for security information and event management (SIEM) and other SOC tools, reducing the overall time and resources required to investigate a security incident while keeping investigations effective.
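Apache Lucene is a Java library built around an inverted index that maps each term to the documents containing it. The Python sketch below is not Lucene; it is a minimal illustration of the same inverted-index idea applied to log lines, with a simplistic tokenizer (lowercase, keeping dots so IP addresses stay whole) that is an assumption for illustration.

```python
import re
from collections import defaultdict

# Simplistic tokenizer: keeps dots and hyphens so IPs and hostnames stay whole.
TOKEN = re.compile(r"[a-z0-9_.-]+")


def tokenize(text):
    return TOKEN.findall(text.lower())


class InvertedIndex:
    """Term -> set of log-line ids, supporting AND queries."""

    def __init__(self):
        self.postings = defaultdict(set)
        self.docs = []

    def add(self, text):
        doc_id = len(self.docs)
        self.docs.append(text)
        for term in tokenize(text):
            self.postings[term].add(doc_id)
        return doc_id

    def search(self, query):
        """Return ids of log lines containing every query term."""
        terms = tokenize(query)
        if not terms:
            return []
        result = set.intersection(*(self.postings.get(t, set()) for t in terms))
        return sorted(result)


if __name__ == "__main__":
    idx = InvertedIndex()
    idx.add("Failed login for user admin from 10.0.0.66")
    idx.add("Successful login for user alice")
    idx.add("Failed login for user alice from 10.0.0.66")
    print(idx.search("failed login"))
```

A search only intersects the posting sets for its terms instead of scanning every stored log line, which is what makes text search over large event volumes practical.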

See the magic on AWS Marketplace or schedule a live demo!

About the author
Brad LaPorte is a former top-rated Gartner Research Analyst for cybersecurity and held senior positions in US Cyber Intelligence, Dell, and IBM, as well as at several startups.

Brad has spent most of his career on the front lines fighting cybercriminals and advising top CEOs, CISOs, CIOs, and other CxOs and thought leaders on how to be as efficient and effective as possible. He is currently a Partner at High Tide Advisors, where he actively helps cybersecurity and tech companies grow their go-to-market strategies.
