Beat Breakout Time with the Security Data Lake

By Ori Reshef
September 23, 2021

Deliver fast time to insight and blazing-fast queries on exabytes of data, directly on the cloud data lake, for threat hunting, incident response and security investigations.

Security teams are measured by their ability to collect and analyze as much data as possible in a very short period of time. But with so many types of data flowing at massive volumes, security teams are experiencing:

  • Data explosion of monitored events
  • Constant real-time stream
  • Events and logs that grow in complexity and dimensionality (each event often has dozens or even hundreds of attributes)

The result is a limited ability to triage modern threats. SOC teams are overwhelmed with alerts and data, and existing SIEM platforms only compound the problem.

Supporting Agile Security Analytics Requires a Fresh Approach

The data lake architecture is gaining momentum in many verticals because it offers a modern, agile alternative to data warehouses and solves storage, access and scale challenges. For threat hunting and anomaly detection, which rely heavily on the ability to analyze massive amounts of complex data in a very short timeframe, the benefits are dramatic:

  • Raw format:
    Data is easily stored close to its raw, granular form, making it possible to cross-reference suspicious data across sources
  • Full timeline:
    Data is retained for multiple uses over time, enabling full timeline investigations within a single centralized location
  • Single source of truth:
    A single repository serves all data-intelligence automation and feeds downstream systems for analytics, ML and AI

Indeed, many vendors in the space are making strategic investments in data lake-based solutions. But given the current inefficiencies of data lake analytics platforms (90% of compute is “wasted” on data scanning and filtering), moving to the data lake often means compromising on price/performance, which tends to limit it to experimental, non-production workloads.


Autonomous Indexing Delivers Effective Threat Analysis on the Data Lake

Varada’s security data lake platform runs in the customer’s cloud environment (VPC), enabling SOC analysts, threat detection, anomaly detection and incident management applications, and essentially any SQL consumer to easily query any data source on the data lake.

Varada leverages autonomous indexing and caching to accelerate queries by 10x-100x. The performance advantage grows as queries become more complex and selective (needle-in-a-haystack threat analysis), yielding a 40%-60% cost reduction.
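To make the needle-in-a-haystack point concrete, here is a minimal Python sketch, not Varada’s implementation, that counts how many rows a full scan examines versus a lookup in a simple value-to-row-ids index (a stand-in for the bitmap-style indexes discussed later). The function names are hypothetical.

```python
from collections import defaultdict

def build_value_index(column):
    """Map each distinct value to the sorted list of row ids that hold it."""
    index = defaultdict(list)
    for row_id, value in enumerate(column):
        index[value].append(row_id)
    return index

def full_scan(column, target):
    """Examine every row; return (matching row ids, rows examined)."""
    matches = [row_id for row_id, value in enumerate(column) if value == target]
    return matches, len(column)

def indexed_lookup(index, target):
    """Read only the posting list for the target; return (matches, rows examined)."""
    matches = list(index.get(target, []))
    return matches, len(matches)
```

For a 100,000-row column in which a suspicious IP appears three times, the scan examines all 100,000 rows while the indexed lookup touches only the three matches. The more selective the predicate, the larger that gap becomes.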

Varada’s workload-level observability component enables data teams to seamlessly monitor, optimize and accelerate workloads to meet dynamic business requirements. Data teams can easily set priorities, performance requirements, and budget caps.
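How Varada weighs priorities against budget caps is not described here. Purely as a hypothetical illustration of how the two settings can interact, the sketch below greedily accelerates workloads in priority order until a budget cap would be exceeded. The `Workload` fields and cost units are assumptions.

```python
from dataclasses import dataclass

@dataclass
class Workload:
    name: str
    priority: int   # higher = more important (hypothetical scale)
    cost: float     # hypothetical cost units needed to accelerate this workload

def plan_acceleration(workloads, budget_cap):
    """Greedy sketch: accelerate workloads by descending priority while the budget allows."""
    accelerated, spent = [], 0.0
    for w in sorted(workloads, key=lambda w: w.priority, reverse=True):
        if spent + w.cost <= budget_cap:
            accelerated.append(w.name)
            spent += w.cost
    return accelerated, spent
```

With a cap of 80 units, an incident-response workload (priority 5, cost 30) and a threat-hunting workload (priority 3, cost 40) fit, while a low-priority ad hoc reporting workload (cost 50) is left unaccelerated.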


To explain what it means to be “autonomous”, you can break it down into three critical components: adaptive, dynamic and elastic.

1. Be Adaptive

Unlike partitioning-based optimizations, which are limited to a few columns, Varada can index any column, and it automatically decides which data to index and which index to use on each nano-block (a small chunk of a single column, 64K rows).

Each nano-block is mapped to the original data set, and includes any of:

  • Metadata – unique values, statistics
  • Data – columnar encoding and compression
  • Index – bitmap, dictionary, btree, bloom, etc.
  • Lucene – SSD-optimized and compressed
  • Transformation – intermediate data (“virtual column”)
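As a rough illustration of why per-nano-block metadata pays off, the Python sketch below splits a single column into 64K-row blocks (interpreted here as 65,536 rows, an assumption) and records simple statistics for each. It assumes min/max values are among the stored statistics and uses them to skip blocks that cannot contain a target value. All names are hypothetical.

```python
NANO_BLOCK_ROWS = 64 * 1024  # "64K rows" per block; 65,536 is an assumption

def build_nano_blocks(column, block_rows=NANO_BLOCK_ROWS):
    """Split one column into fixed-size blocks and record per-block metadata."""
    blocks = []
    for start in range(0, len(column), block_rows):
        chunk = column[start:start + block_rows]
        blocks.append({
            "start": start,               # maps the block back to the original data set
            "rows": chunk,
            "min": min(chunk),            # assumed statistic used for block skipping
            "max": max(chunk),
            "distinct": len(set(chunk)),  # unique-value count
        })
    return blocks

def find_rows(blocks, target):
    """Return (matching row ids, number of blocks actually scanned)."""
    matches, scanned = [], 0
    for b in blocks:
        if not (b["min"] <= target <= b["max"]):
            continue  # metadata proves the value is absent; skip the block
        scanned += 1
        matches.extend(b["start"] + i for i, v in enumerate(b["rows"]) if v == target)
    return matches, scanned
```

On a sorted 200,000-row timestamp column, this produces four blocks, and a point lookup scans only the single block whose range covers the target.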

Varada’s indexing suite includes a variety of indexes such as Bitmap, Dictionary, Trees, Bloom, Lucene (text searches), etc. Based on the data’s format, structure and cardinality, the platform automatically assigns the most effective index, driving optimal performance.
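Varada’s actual selection rules are not published here. The hypothetical Python heuristic below only shows the general idea of choosing an index type from a nano-block’s data profile. The thresholds (`bitmap_max_distinct`, `dictionary_max_ratio`) and the `range_queries` flag are illustrative assumptions.

```python
def choose_index(values, is_text=False, range_queries=False,
                 bitmap_max_distinct=16, dictionary_max_ratio=0.5):
    """Hypothetical heuristic: pick an index type from cardinality and data type."""
    if not values:
        return None
    if is_text:
        return "lucene"        # free text: full-text (inverted) index
    distinct = len(set(values))
    if distinct <= bitmap_max_distinct:
        return "bitmap"        # very few distinct values: one bitmap per value
    if distinct / len(values) <= dictionary_max_ratio:
        return "dictionary"    # moderate cardinality: encode values via a dictionary
    # High cardinality: tree for range predicates, Bloom filter for equality probes
    return "btree" if range_queries else "bloom"
```

A firewall-action column such as `allow`/`deny` lands on a bitmap, a column with 100 distinct values per 1,000 rows gets a dictionary, and a nearly unique column gets a Bloom filter (or a tree when range predicates are expected).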

Effective Text Searches are Critical.

Organizations collect massive amounts of data on various events from many different applications and systems. These events need to be analyzed effectively to enable real-time threat detection, anomaly detection and incident management. In various security-related use cases, text analytics is leveraged to provide deep insights into traffic and user behavior (segmentation, URL categorization, etc.).

Text analytics has proven critical for security information and event management (SIEM) and other SOC tools, reducing the overall time and resources required to investigate a security incident.

Text searches with Apache Lucene are a native part of the platform and are applied automatically.
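Lucene is built around an inverted index, which maps each term to the documents containing it. The toy Python version below is neither Lucene nor Varada’s integration. It uses a deliberately simple tokenizer (lowercase alphanumeric runs, an assumption) to show why a term query over log lines avoids scanning every line.

```python
import re
from collections import defaultdict

TOKEN = re.compile(r"[a-z0-9]+")

def tokenize(text):
    """Simplistic tokenizer: lowercase alphanumeric runs (IPs split on dots)."""
    return TOKEN.findall(text.lower())

def build_inverted_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for term in tokenize(text):
            index[term].add(doc_id)
    return index

def search_all(index, query):
    """Return sorted ids of documents containing every query term (AND semantics)."""
    terms = tokenize(query)
    if not terms:
        return []
    return sorted(set.intersection(*(index.get(t, set()) for t in terms)))
```

Searching three SSH log lines for `failed 203.0.113.7` intersects a handful of small term sets instead of re-reading the text of every line.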

2. Be Dynamic. Stop worrying about peaks.

Varada automatically accelerates queries according to workload behavior and automatic detection of hot data and bottlenecks. The platform also enables data teams to define business priorities and accordingly adjust performance and budgets, eliminating the need to build separate silos for each use case.

The platform seamlessly chooses which queries to accelerate and which data to index.
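The text does not say how hot data is detected. One simple way to approximate the idea is to count which columns recent queries touch and treat the most frequent ones as indexing candidates. The sliding-window tracker below is a hypothetical sketch. The window size and `top_k` are assumptions.

```python
from collections import Counter, deque

class HotColumnTracker:
    """Hypothetical sketch: rank columns by how often recent queries use them."""

    def __init__(self, window=1000, top_k=3):
        self.recent = deque(maxlen=window)  # oldest queries fall out automatically
        self.top_k = top_k

    def record_query(self, columns):
        self.recent.append(frozenset(columns))

    def hot_columns(self):
        counts = Counter()
        for cols in self.recent:
            counts.update(cols)
        return [col for col, _ in counts.most_common(self.top_k)]
```

Because the window slides, a column that stops appearing in queries eventually drops out of the hot set, which loosely mirrors acceleration that follows workload behavior rather than a fixed configuration.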

[Figure: fast warming diagram]
Query processing is designed for optimal efficiency, drawing on cached (hot) data and indexes, warm data and indexes stored in the data lake, and the data lake itself, which remains the single source of truth.
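The tiering described above implies a lookup order for each nano-block. The hypothetical Python sketch below counts which tier would serve each block, using plain sets as stand-ins for the SSD cache and the warm index folder on the data lake.

```python
from collections import Counter

def plan_reads(block_ids, hot_cache, warm_index_store):
    """Count which tier would serve each nano-block (hypothetical sketch)."""
    tiers = Counter()
    for block_id in block_ids:
        if block_id in hot_cache:
            tiers["hot"] += 1          # cached data/index on the cluster's SSDs
        elif block_id in warm_index_store:
            tiers["warm"] += 1         # index persisted in the data lake
        else:
            tiers["data_lake"] += 1    # raw files, the single source of truth
    return tiers
```

For ten blocks where three are cached and four more have warm indexes, the plan is three hot reads, four warm reads and three reads of the raw data lake files.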

3. Be Elastic

Agility applies not only to the types of queries but also to their volume, so compute demand for query processing is expected to be highly volatile. Data teams are often measured on how quickly they can react to spikes in demand.

Varada’s architecture is highly elastic, enabling teams to add clusters and use cases quickly and to scale out and in dynamically, delivering the most effective TCO. Effective separation of compute and storage makes it possible to scale elastically and add clusters as query traffic fluctuates, avoiding overprovisioning and idle resources.
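A back-of-the-envelope way to reason about scaling out and in is to size the cluster from current query traffic and clamp the result to configured bounds. The Python sketch below is a hypothetical sizing rule, not Varada’s autoscaler. The per-node throughput and the node limits are assumptions.

```python
import math

def nodes_needed(queries_per_sec, per_node_qps, min_nodes=1, max_nodes=50):
    """Hypothetical sizing rule: enough nodes for current traffic, within bounds."""
    needed = math.ceil(queries_per_sec / per_node_qps)
    return max(min_nodes, min(max_nodes, needed))
```

Assuming each node handles 20 queries per second, traffic of 40, 450 and 120 queries per second maps to 2, 23 and 6 nodes. Scaling back in after the spike is what avoids paying for idle resources.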

An index-once approach speeds up warm-up by 10x-20x compared to indexing data from scratch: as the platform creates new indexes, it stores them in a designated folder on the customer’s data lake (“warm data”), in addition to the cluster’s SSDs (“hot data”).

When the cluster is scaled in or eliminated, and some (or all) nodes are shut down, indexes remain available as warm data. Warm indexes enable fast warming up when scaling back out, adding new clusters, and adding SSD resources to cluster(s).
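The index-once lifecycle can be sketched in a few lines of Python. In this hypothetical model, a shared dict stands in for the warm index folder on the data lake, and each cluster keeps its own hot copies. A cluster builds an index only when neither tier has it, so a cluster created after another has been scaled in reloads warm indexes instead of rebuilding them.

```python
class Cluster:
    """Hypothetical sketch of index-once: hot copies per cluster, warm copies shared."""

    def __init__(self, warm_store):
        self.ssd = {}               # hot indexes on this cluster's SSDs
        self.warm = warm_store      # shared dict standing in for the data-lake index folder
        self.builds = 0
        self.warm_loads = 0

    def get_index(self, block_id, build):
        if block_id in self.ssd:
            return self.ssd[block_id]
        if block_id in self.warm:
            self.warm_loads += 1
            idx = self.warm[block_id]
        else:
            self.builds += 1
            idx = build(block_id)
            self.warm[block_id] = idx  # index once: persist alongside the hot copy
        self.ssd[block_id] = idx
        return idx

    def scale_in(self):
        self.ssd.clear()  # nodes shut down: hot copies are lost, warm copies remain
```

In this model, the first cluster builds four indexes. After it is scaled in, a new cluster serves the same four blocks with zero builds and four warm loads.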

The End Result: Any SQL Query, Any Data, Blazing Fast. Period.

See the magic of Varada’s Security Data Lake on AWS Marketplace or schedule a live demo!
