ThirdEye Data

Spark Streaming 

February 1st, 2018 | Updated July 4th, 2022 | Uncategorized

Spark Streaming is an extension of the core Spark API that enables scalable, high-throughput, fault-tolerant stream processing of live data streams. Spark Streaming can be used to stream live data and processing can happen [...]
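Spark Streaming's classic DStream model divides a live input stream into small batches, one per batch interval, and runs the same Spark job on each batch. The sketch below imitates only that slicing idea in plain Python. It is not PySpark code: `micro_batches` and `word_counts_per_batch` are hypothetical helpers, and events are assumed to arrive already sorted by timestamp.

```python
from collections import Counter
from typing import Iterable, Iterator, List, Tuple

# Each event is (arrival_time_in_seconds, line_of_text); assumed time-ordered.
Event = Tuple[float, str]


def micro_batches(events: Iterable[Event], interval: float) -> Iterator[List[str]]:
    """Group events into consecutive fixed-length batches (hypothetical helper).

    An interval with no events still produces an empty batch, the way a
    batch clock keeps ticking even when no data arrives.
    """
    current: List[str] = []
    batch_end = interval
    for ts, line in events:
        # Close every batch whose time window this event has moved past.
        while ts >= batch_end:
            yield current
            current = []
            batch_end += interval
        current.append(line)
    yield current  # flush the final, possibly partial, batch


def word_counts_per_batch(events: Iterable[Event], interval: float) -> List[Counter]:
    """Run the same word-count job independently on every batch."""
    return [
        Counter(word for line in batch for word in line.split())
        for batch in micro_batches(events, interval)
    ]
```

For example, with a 1-second interval, events at 0.2 s and 0.9 s land in the first batch, an event at 1.4 s in the second, and an event at 3.1 s in the fourth, leaving the third batch empty.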

Apache Cassandra

February 1st, 2018 | Updated July 31st, 2024 | Uncategorized

What is Apache Cassandra™? Apache Cassandra™, a top-level Apache project born at Facebook whose design draws on Amazon’s Dynamo and Google’s BigTable, is a distributed database for managing large amounts of structured data across many commodity servers, while [...]

Apache Hive 

January 31st, 2018 | Updated July 31st, 2024 | Uncategorized

What does Apache Hive do? The Apache Hive™ data warehouse software facilitates reading, writing, and managing large datasets residing in distributed storage and queried using SQL syntax. Built on top of Apache Hadoop™, Hive provides the following [...]

Apache Mahout 

January 31st, 2018 | Updated July 31st, 2024 | Uncategorized

What is Apache Mahout? Apache™ Mahout is a library of scalable machine-learning algorithms, implemented on top of Apache Hadoop® and using the MapReduce paradigm. Machine learning is a discipline of artificial [...]
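The MapReduce paradigm splits a job into a map phase that emits (key, value) pairs from each input record, a shuffle that groups those pairs by key, and a reduce phase that collapses each group. The single-process sketch below only illustrates that data flow; it is not Mahout or Hadoop code, and `map_reduce`, `label_word_mapper`, and `sum_reducer` are hypothetical names. The example job counts (label, word) pairs in labeled documents, the kind of statistic a Naive Bayes trainer aggregates.

```python
from collections import defaultdict
from itertools import chain
from typing import Callable, Dict, Hashable, Iterable, List, Tuple, TypeVar

I = TypeVar("I")  # input record type
V = TypeVar("V")  # intermediate value type
R = TypeVar("R")  # reduced result type


def map_reduce(
    records: Iterable[I],
    mapper: Callable[[I], Iterable[Tuple[Hashable, V]]],
    reducer: Callable[[Hashable, List[V]], R],
) -> Dict[Hashable, R]:
    # Map phase: each record independently emits (key, value) pairs.
    pairs = chain.from_iterable(mapper(r) for r in records)
    # Shuffle phase: group every intermediate value under its key.
    groups: Dict[Hashable, List[V]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    # Reduce phase: collapse each key's values into a single result.
    return {key: reducer(key, values) for key, values in groups.items()}


def label_word_mapper(doc: Tuple[str, str]):
    """Emit ((label, word), 1) for every word in a labeled document."""
    label, text = doc
    for word in text.lower().split():
        yield (label, word), 1


def sum_reducer(_key: Hashable, counts: List[int]) -> int:
    return sum(counts)
```

On a real cluster the map and reduce calls run in parallel on different machines and the shuffle moves data over the network; here all three phases run in one process so the flow is easy to follow.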

Apache Flink

January 31st, 2018 | Updated July 31st, 2024 | Uncategorized

Introduction to Apache Flink®. Below is a high-level overview of Apache Flink and stream processing. Sections: Continuous Processing for Unbounded Datasets; Features: Why Flink?; Flink, the Streaming Model, and Bounded Datasets; The “What”: Flink from [...]

Apache ZooKeeper

January 31st, 2018 | Updated July 4th, 2022 | Uncategorized

Apache ZooKeeper is a distributed, open-source coordination service for distributed applications. It exposes a simple set of primitives that distributed applications can build upon to implement higher-level services for synchronization, configuration [...]
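One primitive ZooKeeper offers for coordinating shared configuration is the conditional write: each node carries a version number that starts at 0, and an update succeeds only if the caller's expected version still matches (an expected version of -1 matches any version). The in-memory sketch below models just that optimistic-concurrency check. It is not the ZooKeeper client API; `ToyConfigStore` and `BadVersionError` are hypothetical names, and the node hierarchy, sessions, and watches are not modeled.

```python
from typing import Dict, Tuple


class BadVersionError(Exception):
    """Raised when a conditional write sees a stale version (hypothetical name)."""


class ToyConfigStore:
    """In-memory stand-in for versioned nodes; not the ZooKeeper client API."""

    def __init__(self) -> None:
        self.nodes: Dict[str, Tuple[bytes, int]] = {}  # path -> (data, version)

    def create(self, path: str, data: bytes) -> None:
        if path in self.nodes:
            raise KeyError(f"{path} already exists")
        self.nodes[path] = (data, 0)  # new nodes start at version 0

    def get(self, path: str) -> Tuple[bytes, int]:
        return self.nodes[path]

    def set(self, path: str, data: bytes, expected_version: int) -> int:
        _, version = self.nodes[path]
        # -1 means "any version"; otherwise the caller's view must be current.
        if expected_version not in (-1, version):
            raise BadVersionError(
                f"{path}: expected version {expected_version}, found {version}"
            )
        self.nodes[path] = (data, version + 1)
        return version + 1
```

If two clients both read `/config` at version 0 and each tries to write, the first write bumps the version to 1 and the second fails with `BadVersionError`, so the second client knows to re-read before retrying instead of silently overwriting.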

Cloudera Impala 

January 31st, 2018 | Updated July 4th, 2022 | Uncategorized

Cloudera Impala provides fast, interactive SQL queries directly on your Apache Hadoop data stored in HDFS or HBase. In addition to using the same unified storage platform, Impala also uses the same metadata, SQL [...]

Apache Kafka

January 30th, 2018 | Updated July 31st, 2024 | Uncategorized

Apache Kafka is a streaming platform with three key capabilities. The first is that it lets you publish and subscribe to streams of records; in this respect it is similar to a message queue or [...]
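In Kafka, published records are appended to a log and kept according to a retention policy rather than deleted when read; each consumer group tracks its own offset into that log. The sketch below models a single-partition topic in memory to show why several subscribers can read the same stream independently. It is a toy, not the Kafka client API: `ToyTopic`, `publish`, and `poll` are hypothetical names, and partitions, retention, and replication are left out.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


class ToyTopic:
    """In-memory, single-partition stand-in for a topic (not the Kafka API).

    Records are appended to a log and never removed on read; each consumer
    group keeps its own offset, so different groups replay the same records.
    """

    def __init__(self) -> None:
        self.log: List[Tuple[str, str]] = []  # (key, value) records in order
        self.offsets: Dict[str, int] = defaultdict(int)  # group -> next offset

    def publish(self, key: str, value: str) -> int:
        self.log.append((key, value))
        return len(self.log) - 1  # offset assigned to the new record

    def poll(self, group: str, max_records: int = 10) -> List[Tuple[str, str]]:
        start = self.offsets[group]
        batch = self.log[start:start + max_records]
        self.offsets[group] = start + len(batch)  # "commit" the new position
        return batch
```

After publishing two records, a `billing` group can poll both while an `audit` group still starts from offset 0; a classic queue, by contrast, would hand each message to only one consumer and then discard it.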

Apache Flume

January 30th, 2018 | Updated July 31st, 2024 | Uncategorized

What is Apache Flume? Apache Flume is a distributed, reliable, and available system for efficiently collecting, aggregating and moving large amounts of log data from many different sources to a centralized data store. The [...]

Apache Spark 

January 30th, 2018 | Updated July 31st, 2024 | Uncategorized

What is Apache Spark? Apache Spark is a fast and general engine for large-scale data processing. Speed: the project states that Spark can run programs up to 100x faster than Hadoop MapReduce in memory, or 10x faster on disk. Apache Spark [...]
