Apache Tez

February 1st, 2018 (updated July 4th, 2022) | Apache Tez, Informative, Technologies

Introduction: The Apache Tez® project aims to build an application framework that allows a complex directed acyclic graph (DAG) of tasks to process data. It is currently built atop Apache Hadoop YARN. The two main design [...]
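To make the idea of a "directed acyclic graph of tasks" concrete, here is a minimal sketch in plain Python. It is not the Tez API. It uses the standard-library `graphlib.TopologicalSorter` (Python 3.9+) to group hypothetical vertices into stages, where every vertex in a stage has all of its inputs finished and could therefore run in parallel. The vertex names are made up for illustration.

```python
from graphlib import TopologicalSorter

# Hypothetical DAG: each vertex maps to the set of vertices whose output it consumes.
dag = {
    "scan_orders": set(),
    "scan_customers": set(),
    "join": {"scan_orders", "scan_customers"},
    "aggregate": {"join"},
    "write_output": {"aggregate"},
}


def schedule_stages(graph):
    """Group vertices into stages; all vertices in a stage have their inputs done."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the graph is not acyclic
    stages = []
    while sorter.is_active():
        # Every vertex returned here is ready, so a scheduler could run them concurrently.
        ready = sorted(sorter.get_ready())
        stages.append(ready)
        sorter.done(*ready)  # mark the whole stage as finished
    return stages


print(schedule_stages(dag))
```

The two scans land in the first stage because neither depends on anything; the join waits for both. A cyclic graph is rejected up front, which is exactly why the "acyclic" part of a DAG matters for scheduling.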

Apache Drill

February 1st, 2018 (updated July 4th, 2022) | Apache Drill, Informative, Technologies

Drill is an Apache open-source SQL query engine for Big Data exploration, designed from the ground up to support high-performance analysis on the semi-structured and rapidly evolving data [...]

Presto

February 1st, 2018 (updated July 4th, 2022) | Informative, Presto, Technologies

What is Presto? Presto is an open-source distributed SQL query engine for running interactive analytic queries against data sources of all sizes, ranging from gigabytes to petabytes. Presto was designed and written from [...]

Apache Hive

January 31st, 2018 (updated July 4th, 2022) | Apache Hive, Informative, Technologies

The Apache Hive™ data warehouse software facilitates reading, writing, and managing large datasets residing in distributed storage and queried using SQL syntax. Built on top of Apache Hadoop™, Hive provides the following features: Tools to enable easy [...]

Apache Flink

January 31st, 2018 (updated July 4th, 2022) | Apache Flink, Informative, Technologies

Introduction to Apache Flink®: Below is a high-level overview of Apache Flink and stream processing. Topics covered: Continuous Processing for Unbounded Datasets; Features: Why Flink?; Flink, the Streaming Model, and Bounded Datasets; The [...]
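Continuous processing over an unbounded dataset usually means computing results over windows of time rather than waiting for the input to end. The sketch below is plain Python, not the Flink API: it counts events per key in fixed-size (tumbling) windows and emits each window as soon as an event from a later window arrives. It assumes events arrive in timestamp order, so it sidesteps the out-of-order handling (watermarks) that a real stream processor needs.

```python
from collections import Counter
from typing import Iterable, Iterator, Tuple

Event = Tuple[int, str]  # (timestamp in seconds, key); hypothetical event shape


def tumbling_counts(events: Iterable[Event], size: int) -> Iterator[Tuple[int, Counter]]:
    """Yield (window_start, per-key counts) each time a tumbling window closes.

    Assumes in-order timestamps; empty windows are simply not emitted.
    """
    current_start = None
    counts = Counter()
    for ts, key in events:
        start = ts - ts % size  # align the timestamp to its window boundary
        if current_start is None:
            current_start = start
        elif start != current_start:
            # An event from a later window closes the current one.
            yield current_start, counts
            current_start, counts = start, Counter()
        counts[key] += 1
    # A truly unbounded stream never reaches this point; the flush only
    # exists so the sketch also works on finite test input.
    if current_start is not None:
        yield current_start, counts
```

Because it is a generator, `tumbling_counts` can consume an endless iterator and still produce results incrementally, which is the essential difference from batch processing over a bounded dataset.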

Apache Kafka

January 30th, 2018 (updated July 4th, 2022) | Apache Kafka, Informative, Technologies

We think of a streaming platform as having three key capabilities: It lets you publish and subscribe to streams of records. In this respect it is similar to a message queue or enterprise messaging [...]
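The publish/subscribe capability can be pictured as an append-only log that each subscriber reads at its own position. The toy class below is a hypothetical in-memory model, not the Kafka client API: `TopicLog`, `publish`, and `poll` are invented names. It shows the key contrast with a classic message queue: reading a record does not remove it, so independent consumer groups each see the full stream.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class TopicLog:
    """Hypothetical single-partition topic: an append-only list of records."""
    records: List[str] = field(default_factory=list)
    offsets: Dict[str, int] = field(default_factory=dict)  # consumer group -> next offset

    def publish(self, record: str) -> int:
        """Append a record and return its offset in the log."""
        self.records.append(record)
        return len(self.records) - 1

    def poll(self, group: str, max_records: int = 10) -> List[str]:
        """Return the next batch for a consumer group and advance its offset."""
        start = self.offsets.get(group, 0)
        batch = self.records[start:start + max_records]
        self.offsets[group] = start + len(batch)  # "commit" the new position
        return batch
```

For example, after publishing `"order-1"` and `"order-2"`, both a `"billing"` group and a `"shipping"` group receive both records from `poll`, and a second `poll` by `"billing"` returns an empty list until something new is published.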

Apache Pig

January 30th, 2018 (updated July 4th, 2022) | Apache Pig, Informative, Technologies

Apache Pig is a high-level language platform developed to execute queries on huge datasets stored in HDFS using Apache Hadoop. It is similar to the SQL query language but applied [...]

Apache Hadoop

January 29th, 2018 (updated July 4th, 2022) | Apache Hadoop, Informative, Technologies

The Apache™ Hadoop® project develops open-source software for reliable, scalable, distributed computing. The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers [...]
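Hadoop's classic programming model for this kind of distributed processing is MapReduce. The single-process Python sketch below only imitates its three phases (map, shuffle, reduce) for a word count; it is not Hadoop's Java API, and the lists of lines passed as `splits` are a hypothetical stand-in for input blocks that separate map tasks would process on different machines.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Emit (word, 1) for every word in a line of input."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group all emitted values by key, as the framework does between phases."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Combine all values for one key into a single result."""
    return key, sum(values)


def word_count(splits: List[List[str]]) -> Dict[str, int]:
    # Each inner list plays the role of one input split handled by its own map task.
    mapped = (pair for split in splits for line in split for pair in map_phase(line))
    return dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())
```

The point of the split is that `map_phase` and `reduce_phase` only ever see local data, so a framework can run many copies of each on different nodes and handle the shuffle over the network.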
