Spark SQL 

January 31st, 2018 | Informative, SparkSQL, Technologies

Apache Spark is a cluster computing framework designed for fast computation. It is one of the most successful projects in the Apache Software Foundation. Spark SQL is a new module in Spark which integrates relational processing with [...]

Apache Hive 

January 31st, 2018 | Apache Hive, Informative, Technologies

What does Apache Hive do? The Apache Hive™ data warehouse software facilitates reading, writing, and managing large datasets residing in distributed storage and queried using SQL syntax. Built on top of Apache Hadoop™, Hive provides the following [...]

Apache Mahout 

January 31st, 2018 | Apache Mahout, Informative, Technologies

What is Apache Mahout? Apache™ Mahout is a library of scalable machine-learning algorithms, implemented on top of Apache Hadoop® and using the MapReduce paradigm. Machine learning is a discipline of artificial [...]

Apache Flink

January 31st, 2018 | Apache Flink, Informative, Technologies

Introduction to Apache Flink®: Below is a high-level overview of Apache Flink and stream processing. Topics: Continuous Processing for Unbounded Datasets; Features: Why Flink?; Flink, the streaming model, and bounded datasets; The “What”: Flink from [...]

Apache ZooKeeper

January 31st, 2018 | Apache ZooKeeper, Informative, Technologies

Apache ZooKeeper is a distributed, open-source coordination service for distributed applications. It exposes a simple set of primitives that distributed applications can build upon to implement higher-level services for synchronization, configuration [...]

Apache Flume

January 30th, 2018 | Apache Flume, Informative, Technologies

What is Apache Flume? Apache Flume is a distributed, reliable, and available system for efficiently collecting, aggregating and moving large amounts of log data from many different sources to a centralized data store. The [...]

Apache Spark 

January 30th, 2018 | Apache Spark, Informative, Technologies

What is Apache Spark? Apache Spark is a fast and general engine for large-scale data processing. Speed: Run programs up to 100x faster than Hadoop MapReduce in memory, or 10x faster on disk. Apache Spark [...]
