Apache Drill

By Dj Das | February 1st, 2018 | Technologies

Apache Drill is an open-source SQL query engine for Big Data exploration. It is designed from the ground up to support high-performance analysis on the semi-structured and rapidly evolving data [...]

Presto

By Dj Das | February 1st, 2018 | Technologies

What is Presto? Presto is an open-source distributed SQL query engine for running interactive analytic queries against data sources of all sizes, ranging from gigabytes to petabytes. Presto was designed and written from [...]

Apache Sqoop

By Dj Das | February 1st, 2018 | Technologies

Before starting with this Apache Sqoop tutorial, let us take a step back. Can you recall the importance of data ingestion, as discussed in our earlier blog on Apache Flume? Now, as we [...]

Apache Storm

By Dj Das | February 1st, 2018 | Technologies

Apache Storm overview: a system for processing streaming data in real time. Apache™ Storm adds reliable real-time data processing capabilities to Enterprise Hadoop. Storm on YARN is powerful for scenarios requiring real-time analytics, machine [...]

Spark Streaming

By Dj Das | February 1st, 2018 | Technologies

Spark Streaming is an extension of the core Spark API that enables scalable, high-throughput, fault-tolerant stream processing of live data streams. Spark Streaming can be used to stream live data and processing can happen [...]

Apache Cassandra

By Dj Das | February 1st, 2018 | Technologies

What is Apache Cassandra™? Apache Cassandra™, a top-level Apache project born at Facebook and built on Amazon’s Dynamo and Google’s Bigtable, is a distributed database for managing large amounts of structured data across many commodity servers, while [...]

Spark SQL

By Dj Das | January 31st, 2018 | Technologies

Apache Spark is a lightning-fast cluster computing framework. It is one of the most successful projects in the Apache Software Foundation. Spark SQL is a new module in Spark which integrates relational processing with [...]

Apache Hive

By Dj Das | January 31st, 2018 | Technologies

What does Apache Hive do? The Apache Hive™ data warehouse software facilitates reading, writing, and managing large datasets residing in distributed storage and queried using SQL syntax. Built on top of Apache Hadoop™, Hive provides the following [...]

Apache Mahout

By Dj Das | January 31st, 2018 | Technologies

What is Apache Mahout? Apache™ Mahout is a library of scalable machine-learning algorithms, implemented on top of Apache Hadoop® and using the MapReduce paradigm. Machine learning is a discipline of artificial [...]

Apache Flink

By Dj Das | January 31st, 2018 | Technologies

Introduction to Apache Flink®: below is a high-level overview of Apache Flink and stream processing. Topics include Continuous Processing for Unbounded Datasets; Features: Why Flink?; Flink, the Streaming Model, and Bounded Datasets; and The “What”: Flink from [...]

About Dj Das

Bring Your Data or AI Vision. Let's Build It Together.

Who We Are

Enterprise AI Services

Foundational Data & AI Services

ThirdEye Data Exclusives

Assets & Resources

Hands-on AI Engineering Expertise

Head Office

Company Insights

Products & Platforms

Offshore Office

20+ Pre-built AI Solutions

Delivery Centers