Spark SQL 

January 31st, 2018 | Informative, SparkSQL, Technologies

Apache Spark is a lightning-fast cluster computing framework. It is one of the most successful projects in the Apache Software Foundation. Spark SQL is a new module in Spark that integrates relational processing with [...]

Apache Hive 

January 31st, 2018 | Apache Hive, Informative, Technologies

The Apache Hive™ data warehouse software facilitates reading, writing, and managing large datasets residing in distributed storage and queried using SQL syntax. Built on top of Apache Hadoop™, Hive provides the following features: Tools to enable easy [...]

Apache Mahout 

January 31st, 2018 | Apache Mahout, Informative, Technologies

Apache™ Mahout is a library of scalable machine-learning algorithms, implemented on top of Apache Hadoop® and using the MapReduce paradigm. Machine learning is a discipline of artificial intelligence focused on enabling [...]

Apache Flink

January 31st, 2018 | Apache Flink, Informative, Technologies

Introduction to Apache Flink®: Below is a high-level overview of Apache Flink and stream processing. Topics covered: Continuous Processing for Unbounded Datasets; Features: Why Flink?; Flink, the streaming model, and bounded datasets. The [...]

Apache ZooKeeper

January 31st, 2018 | Apache ZooKeeper, Informative, Technologies

Apache ZooKeeper is a distributed, open-source coordination service for distributed applications. It exposes a simple set of primitives that distributed applications can build upon to implement higher-level services for synchronization, configuration [...]

Apache Kafka

January 30th, 2018 | Apache Kafka, Informative, Technologies

We think of a streaming platform as having three key capabilities: It lets you publish and subscribe to streams of records. In this respect it is similar to a message queue or enterprise messaging [...]

Apache Flume

January 30th, 2018 | Apache Flume, Informative, Technologies

Apache Flume is a distributed, reliable, and available system for efficiently collecting, aggregating and moving large amounts of log data from many different sources to a centralized data store. The use of Apache Flume [...]

Apache Pig

January 30th, 2018 | Apache Pig, Informative, Technologies

Apache Pig is a high-level language platform for running queries over huge datasets stored in HDFS using Apache Hadoop. It is similar to the SQL query language but applied [...]