Apache Mahout
What is Apache Mahout? Apache™ Mahout is a library of scalable machine-learning algorithms, implemented on top of Apache Hadoop® and using the MapReduce paradigm. Machine learning is a discipline of artificial [...]
Introduction to Apache Flink® Below is a high-level overview of Apache Flink and stream processing. Continuous Processing for Unbounded Datasets; Features: Why Flink?; Flink, the streaming model, and bounded datasets; The “What”: Flink from [...]
Apache ZooKeeper Apache ZooKeeper is a distributed, open-source coordination service for distributed applications. It exposes a simple set of primitives that distributed applications can build upon to implement higher level services for synchronization, configuration [...]
Cloudera Impala provides fast, interactive SQL queries directly on your Apache Hadoop data stored in HDFS or HBase. In addition to using the same unified storage platform, Impala also uses the same metadata, SQL [...]
What is Apache Flume? Apache Flume is a distributed, reliable, and available system for efficiently collecting, aggregating and moving large amounts of log data from many different sources to a centralized data store. The [...]
What is Apache Spark? Apache Spark is a fast and general engine for large-scale data processing. Speed: Run programs up to 100x faster than Hadoop MapReduce in memory, or 10x faster on disk. Apache Spark [...]
What is Apache Pig? Apache Pig is a high-level language platform developed to execute queries on huge datasets that are stored in HDFS using Apache Hadoop. It is similar to SQL [...]
Apache Hadoop The Apache™ Hadoop® project develops open-source software for reliable, scalable, distributed computing. The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters [...]
Your organization depends on ready access to its information resources. That’s probably part of the reason you decided to adopt cloud solutions, such as Amazon Web Services: Not only are [...]