Apache HBase

February 1st, 2018 | Apache HBase, Informative, Technologies

What is Apache HBase? Apache HBase is a popular and highly efficient column-oriented NoSQL database built on top of the Hadoop Distributed File System that allows performing read/write operations on large datasets in real time [...]

Apache Oozie 

February 1st, 2018 | Apache Oozie, Informative, Technologies

OVERVIEW: The blueprint for Enterprise Hadoop includes Apache™ Hadoop’s original data storage and data processing layers and also adds components for services that enterprises must have in a modern data architecture: data integration and [...]

Apache Kudu

February 1st, 2018 | ApacheKudu, Informative, Technologies

Introducing Apache Kudu: Kudu is a columnar storage manager developed for the Apache Hadoop platform. Kudu shares the common technical properties of Hadoop ecosystem applications: it runs on commodity hardware, is horizontally scalable, and [...]

Apache Tez

February 1st, 2018 | Apache Tez, Informative, Technologies

Apache Tez Introduction: The Apache TEZ® project aims to build an application framework that allows for a complex directed acyclic graph of tasks for processing data. It is currently built atop Apache Hadoop YARN. The 2 [...]

Apache Drill

February 1st, 2018 | Apache Drill, Informative, Technologies

Apache Drill: Drill is an Apache open-source SQL query engine for Big Data exploration. Apache Drill is designed from the ground up to support high-performance analysis on the semi-structured and rapidly evolving data [...]

Apache Storm 

February 1st, 2018 | Apache Storm, Informative, Technologies

Apache Storm OVERVIEW: A system for processing streaming data in real time. Apache™ Storm adds reliable real-time data processing capabilities to Enterprise Hadoop. Storm on YARN is powerful for scenarios requiring real-time analytics, machine [...]

Spark SQL 

January 31st, 2018 | Informative, SparkSQL, Technologies

Apache Spark is a lightning-fast cluster computing framework designed for fast computation. It is one of the most successful projects in the Apache Software Foundation. Spark SQL is a new module in Spark which integrates relational processing with [...]

Apache Hive 

January 31st, 2018 | Apache Hive, Informative, Technologies

What does Apache Hive do? The Apache Hive™ data warehouse software facilitates reading, writing, and managing large datasets residing in distributed storage and queried using SQL syntax. Built on top of Apache Hadoop™, Hive provides the following [...]

Apache Mahout 

January 31st, 2018 | Apache Mahout, Informative, Technologies

What is Apache Mahout? Apache™ Mahout is a library of scalable machine-learning algorithms, implemented on top of Apache Hadoop® and using the MapReduce paradigm. Machine learning is a discipline of artificial [...]
