Apache Hbase
What is Apache HBase? Apache Hbase is a popular and highly efficient Column-oriented NoSQL database built on top of Hadoop Distributed File System that allows performing read/write operations on large datasets in real time [...]
What is Apache HBase? Apache Hbase is a popular and highly efficient Column-oriented NoSQL database built on top of Hadoop Distributed File System that allows performing read/write operations on large datasets in real time [...]
MapReduce Tutorial: Introduction In this MapReduce Tutorial blog, I am going to introduce you to MapReduce, which is one of the core building blocks of processing in Hadoop framework. Before moving ahead, I would [...]
OVERVIEW The blueprint for Enterprise Hadoop includes Apache™ Hadoop’s original data storage and data processing layers and also adds components for services that enterprises must have in a modern data architecture: data integration and [...]
Introducing Apache Kudu Kudu is a columnar storage manager developed for the Apache Hadoop platform. Kudu shares the common technical properties of Hadoop ecosystem applications: it runs on commodity hardware, is horizontally scalable, and [...]
Apache Tez Introduction The Apache TEZ® project is aimed at building an application framework which allows for a complex directed-acyclic-graph of tasks for processing data. It is currently built atop Apache Hadoop YARN. The 2 [...]
Apache Drill: Drill is an Apache open-source SQL query engine for Big Data exploration. Apache Drill is designed from the ground up to support high-performance analysis on the semi-structured and rapidly evolving data [...]
Apache Storm OVERVIEW A system for processing streaming data in real time Apache™ Storm adds reliable real-time data processing capabilities to Enterprise Hadoop. Storm on YARN is powerful for scenarios requiring real-time analytics, machine [...]
Apache Spark is a lightning-fast cluster computing framework designed for fast computation. It is of the most successful projects in the Apache Software Foundation. Spark SQL is a new module in Spark which integrates relational processing with [...]
What does Apache Hive do? The Apache Hive™ data warehouse software facilitates reading, writing, and managing large datasets residing in distributed storage and queried using SQL syntax. Built on top of Apache Hadoop™, Hive provides the following [...]
What is Apache Mahout? Apache™ Mahout is a library of scalable machine-learning algorithms, implemented on top of Apache Hadoop® and using the MapReduce paradigm. Machine learning is a discipline of artificial [...]