Data Virtualization for Text Analytics on the Data Lake

Seamlessly support text analytics workloads, including latency-sensitive use cases, directly on your data lake and alongside any other ANSI SQL workload.
Challenge

Big Data Text Analytics Is Gaining Momentum, but Teams Are Challenged with Supporting Agility and Performance

Text analytics is all about delivering high-performance, text-optimized search and indexing capabilities to support a wide range of use cases. It is a very wide domain, ranging from sentiment analysis to log analysis and everything in between, and observability is one of its central applications. It enables organizations to leverage the power of data democratization and easily access and analyze text-based content (blogs, chat logs, metrics, APM data, etc.).
Log analytics provides observability at the application level, enabling teams to analyze application events in a timely manner and identify performance and availability issues.
Observability also plays an important role in infrastructure monitoring by delivering centralized analysis of logs and metrics from many infrastructure components, such as servers, VMs and network components. This deep visibility into infrastructure performance enables teams to detect bottlenecks and performance issues.
Text analytics has also proven critical for security information and event management (SIEM). Organizations collect massive amounts of data on events from many different applications and systems. These events must be analyzed effectively to enable real-time threat detection, anomaly detection and incident management.
In various marketing use cases, text analytics provides deep insights into traffic and user behavior (segmentation, URL categorization, etc.).

In the last couple of years, text analytics data has grown exponentially in volume, challenging data teams to optimize cost and performance. Large-scale text analytics requires customized optimizations for LIKE '%text%' predicates and regular expressions, which often pushes teams toward disparate data silos that specialize in text. To avoid moving data into yet another data mart, data teams are now turning to data virtualization tools that can meet the business need for massive text analytics at a reasonable cost and with a very short time-to-market.
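
To see why LIKE '%text%' needs special treatment: a leading wildcard defeats sorted or prefix-based indexes, so a naive engine falls back to scanning every row. One general technique for accelerating substring search is an n-gram index. The Python sketch below builds a toy trigram index over in-memory strings; the class and function names are illustrative assumptions, not part of Varada or any SQL engine, and a production index would also need persistence, case handling and escaping.

```python
from collections import defaultdict


def trigrams(text: str) -> set[str]:
    """Return every 3-character substring of text."""
    return {text[i:i + 3] for i in range(len(text) - 2)}


class TrigramIndex:
    """Toy n-gram index for LIKE '%needle%'-style substring search (illustrative only)."""

    def __init__(self, rows: list[str]) -> None:
        self.rows = rows
        # Posting lists: trigram -> ids of rows containing that trigram.
        self.postings: dict[str, set[int]] = defaultdict(set)
        for row_id, text in enumerate(rows):
            for gram in trigrams(text):
                self.postings[gram].add(row_id)

    def contains(self, needle: str) -> list[int]:
        grams = trigrams(needle)
        if grams:
            # A row can only contain the needle if it contains every trigram of it.
            candidates = set.intersection(
                *(self.postings.get(g, set()) for g in grams)
            )
        else:
            # Needles shorter than 3 characters yield no trigrams: scan everything.
            candidates = set(range(len(self.rows)))
        # Verify each candidate, since sharing trigrams does not guarantee a match.
        return sorted(i for i in candidates if needle in self.rows[i])


def full_scan(rows: list[str], needle: str) -> list[int]:
    """Baseline: check every row, as an unindexed LIKE '%needle%' would."""
    return [i for i, text in enumerate(rows) if needle in text]


rows = [
    "ERROR disk full on node-7",
    "INFO request served in 12ms",
    "WARN disk latency high on node-3",
]
index = TrigramIndex(rows)
print(index.contains("disk"))  # [0, 2]
print(index.contains("node-3"))  # [2]
```

The index returns the same rows as the full scan but only verifies rows whose trigrams all match the needle.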

When evaluating data virtualization against data warehouse alternatives, note that existing big data virtualization solutions can access text and log data from any source directly on top of your data lake. Their SQL engines provide a set of text analytics capabilities, but rarely at the desired efficiency.
Furthermore, serving low-latency text analytics workloads alongside other ANSI SQL queries is a substantial challenge and often requires separate, highly optimized clusters to support interactive text analytics. By avoiding a separate data silo, data teams can deliver a unified solution that meets their business needs, whether the workload is text-driven or any other ANSI SQL workload.

Solution

Varada’s Dynamic & Adaptive Big Data Indexing Offers a One-Stop Shop for Any Workload, Including Text & Log Analytics

Varada’s cloud data virtualization technology seamlessly supports any ANSI SQL analytics directly on the data lake, without the need to move data or build a separate and optimized stack.
Varada’s adaptive indexing technology automatically detects which index will best serve each query. Text analytics queries are significantly accelerated by integrated Apache Lucene indexing, enabling analysts, data scientists and data applications to use blazing-fast text filters without any SQL performance tuning or query optimization. Data teams can also easily integrate text search into Business Intelligence systems and dashboards.
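
Apache Lucene is built around an inverted index, which maps each term to the documents that contain it, so a term query reads short posting lists instead of scanning text. The following minimal Python sketch shows that idea with a boolean AND query over tokens. The tokenizer and function names are hypothetical simplifications of what a Lucene analyzer does, and the sketch does not reflect how Varada integrates Lucene.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Hypothetical analyzer: lowercase, then split on non-alphanumeric characters."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs: list[str]) -> dict[str, set[int]]:
    """Map each term to the set of document ids that contain it."""
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search_all_terms(index: dict[str, set[int]], query: str) -> list[int]:
    """Return ids of documents containing every query term (a boolean AND query)."""
    terms = tokenize(query)
    if not terms:
        return []
    # Only the posting lists of the query terms are read; no document text is scanned.
    postings = [index.get(term, set()) for term in terms]
    return sorted(set.intersection(*postings))


docs = [
    "Payment service timeout on checkout",
    "Checkout completed for order 42",
    "Timeout contacting payment gateway",
]
index = build_inverted_index(docs)
print(search_all_terms(index, "payment timeout"))  # [0, 2]
print(search_all_terms(index, "CHECKOUT"))  # [0, 1]
```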

Interactive Text Analytics Directly on Your Data Lake

Integrated Apache Lucene

Extending the power of data virtualization with a high-performance, full-featured text search engine.

Fast Text Analytics

Deliver 100x faster query response times across any data source, using the LIKE operator (text contains) or regexp_like, which filters by regular expressions for more complex searches. Data teams can also significantly accelerate existing queries that use the LIKE/regexp_like operators.
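
The two operators differ in matching semantics. LIKE must match the entire value, with % standing for any sequence of characters and _ for exactly one character. regexp_like, as defined in Trino and Presto, returns true when the pattern is found anywhere in the value. The Python sketch below models both semantics for illustration only: it ignores LIKE's ESCAPE clause, and Python's `re` syntax only approximates the Java-style regular expressions Trino uses.

```python
import re


def like_to_regex(pattern: str) -> str:
    """Translate a SQL LIKE pattern (no ESCAPE clause) into a regex string."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")  # % matches any sequence, including the empty one
        elif ch == "_":
            parts.append(".")  # _ matches exactly one character
        else:
            parts.append(re.escape(ch))  # everything else matches literally
    return "".join(parts)


def sql_like(value: str, pattern: str) -> bool:
    """LIKE semantics: the pattern must match the whole value."""
    return re.fullmatch(like_to_regex(pattern), value, flags=re.DOTALL) is not None


def regexp_like(value: str, pattern: str) -> bool:
    """regexp_like semantics: the pattern only needs to occur somewhere in the value."""
    return re.search(pattern, value) is not None


line = "2024-05-01 ERROR payment timeout after 30s"
print(sql_like(line, "%ERROR%"))  # True: the text contains "ERROR"
print(sql_like(line, "ERROR%"))  # False: the value does not start with "ERROR"
print(regexp_like(line, r"timeout after \d+s"))  # True: found mid-string
```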

Any Data, Any Scale

Varada seamlessly connects directly to a wide range of data sources, including the data lake (AWS S3, on-prem Hadoop, etc.), data catalogs (Hive Metastore, AWS Glue) and other sources (MySQL, PostgreSQL, etc.).

Set Workload Priorities for Performance and Cost

Varada automatically and dynamically chooses which queries to accelerate, based on continuous monitoring and priorities set by data teams.
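
The page does not describe how Varada decides which queries to accelerate. As a purely hypothetical illustration of one way monitoring data and team-set priorities could be combined, the sketch below ranks candidate columns by priority-weighted time saved per GB of index and greedily selects them within a storage budget. The metric names, the scoring formula and the budget are all assumptions; this is a common greedy heuristic, not Varada's algorithm.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """A column that could be indexed; every field here is a hypothetical input."""

    column: str
    scans_per_hour: float  # how often queries filter on it (from monitoring)
    seconds_saved_per_scan: float  # estimated benefit of indexing it
    index_size_gb: float  # estimated index footprint; assumed > 0
    priority: float  # weight set by the data team; 1.0 = default


def score(c: Candidate) -> float:
    """Priority-weighted seconds saved per hour, per GB of index."""
    return c.priority * c.scans_per_hour * c.seconds_saved_per_scan / c.index_size_gb


def choose_indexes(candidates: list[Candidate], budget_gb: float) -> list[str]:
    """Greedy heuristic: take the best-scoring candidates that still fit the budget."""
    chosen: list[str] = []
    used_gb = 0.0
    for c in sorted(candidates, key=score, reverse=True):
        if used_gb + c.index_size_gb <= budget_gb:
            chosen.append(c.column)
            used_gb += c.index_size_gb
    return chosen


candidates = [
    Candidate("logs.message", 120, 4.0, 50, priority=2.0),  # score 19.2
    Candidate("logs.level", 300, 0.5, 1, priority=1.0),  # score 150.0
    Candidate("events.url", 40, 2.0, 30, priority=1.0),  # score ~2.67
]
print(choose_indexes(candidates, budget_gb=60))  # ['logs.level', 'logs.message']
```

Greedy selection by benefit per unit of cost is a standard approximation for this knapsack-style problem: it is simple and fast, but not guaranteed to find the optimal set.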

Future-Ready and Optimized for Any Question

Varada adaptively and dynamically indexes any column on trillions of rows, supporting any ANSI SQL query.

Your VPC

Varada runs in your cloud environment, keeping data under your full control in your own VPC, so you can apply your existing security policies.

Varada’s Data Virtualization Platform
