What is Apache Mahout?

OVERVIEW

An algorithm library for scalable machine learning on Hadoop

Apache™ Mahout is a library of scalable machine-learning algorithms, implemented on top of Apache Hadoop® and using the MapReduce paradigm. Machine learning is a discipline of artificial intelligence focused on enabling machines to learn without being explicitly programmed, and it is commonly used to improve future performance based on previous outcomes.

Once big data is stored on the Hadoop Distributed File System (HDFS), Mahout provides the data science tools to automatically find meaningful patterns in those big data sets. The Apache Mahout project aims to make it faster and easier to turn big data into big information.

WHAT MAHOUT DOES

Mahout supports four main data science use cases:

Collaborative filtering – mines user behavior and makes product recommendations (e.g. Amazon recommendations)
Clustering – takes items in a particular class (such as web pages or newspaper articles) and organizes them into naturally occurring groups, such that items belonging to the same group are similar to each other
Classification – learns from existing categorizations and then assigns unclassified items to the best category
Frequent itemset mining – analyzes items in a group (e.g. items in a shopping cart or terms in a query session) and then identifies which items typically appear together
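
To make the last use case concrete, here is a minimal single-machine sketch in Python that counts which pairs of items appear together across a set of baskets. It is a brute-force pair count for illustration only, not Mahout's Parallel FP Growth implementation; the function name `frequent_pairs`, the `min_support` parameter, and the sample baskets are all assumptions made up for this example.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(baskets, min_support):
    """Return item pairs that occur together in at least min_support baskets."""
    counts = Counter()
    for basket in baskets:
        # Deduplicate and sort so each pair has a single canonical ordering.
        items = sorted(set(basket))
        counts.update(combinations(items, 2))
    return {pair: n for pair, n in counts.items() if n >= min_support}


# Hypothetical shopping carts, invented for illustration.
baskets = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["butter", "jam"],
    ["bread", "butter", "milk"],
]

print(frequent_pairs(baskets, min_support=2))
```

Counting every pair directly becomes expensive as the number of distinct items grows, which is one reason a distributed approach is attractive for large data sets.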

HOW MAHOUT WORKS

Mahout provides implementations of various machine learning algorithms, some of which run in local mode and some in distributed mode (for use with Hadoop). Each algorithm in the Mahout library can be invoked using the Mahout command line.

The following is a list of algorithms for use in distributed mode (Hadoop-compatible), grouped into the four categories: collaborative filtering, clustering, classification, and frequent itemset mining. Mahout also includes some machine learning algorithms that can be used locally, but those are not listed here. For a complete list of algorithms, please visit http://mahout.apache.org/users/basics/algorithms.html.

Algorithm	Category	Description
Distributed Item-based Collaborative Filtering	Collaborative Filtering	Estimates a user’s preference for one item by looking at his/her preferences for similar items
Collaborative Filtering Using a Parallel Matrix Factorization	Collaborative Filtering	Predicts which items a user might prefer from among the items that the user has not yet seen
Canopy Clustering	Clustering	For preprocessing data before using a K-means or Hierarchical clustering algorithm
Dirichlet Process Clustering	Clustering	Performs Bayesian mixture modeling
Fuzzy K-Means	Clustering	Discovers soft clusters where a particular point can belong to more than one cluster
Hierarchical Clustering	Clustering	Builds a hierarchy of clusters using either an agglomerative “bottom up” or divisive “top down” approach
K-Means Clustering	Clustering	Aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean
Latent Dirichlet Allocation	Clustering	Automatically and jointly cluster words into “topics” and documents into mixtures of topics
Mean Shift Clustering	Clustering	For finding modes or clusters in 2-dimensional space, where the number of clusters is unknown
Minhash Clustering	Clustering	For quickly estimating similarity between two data sets
Spectral Clustering	Clustering	Cluster points using eigenvectors of matrices derived from the data
Bayesian	Classification	Used to classify objects into binary categories
Random Forests	Classification	An ensemble learning method for classification (and regression) that operates by constructing a multitude of decision trees
Parallel FP Growth Algorithm	Frequent Itemset Mining	Analyzes items in a group and then identifies which items typically appear together
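
As an illustration of one entry in the table above, the K-Means description (each observation belongs to the cluster with the nearest mean) can be sketched in a few lines of Python. This is a plain single-machine version of the standard assignment and update loop, not Mahout's distributed implementation; the function names, the fixed iteration count, the sample points, and the starting centroids are assumptions chosen for the example.

```python
import math


def nearest(point, centroids):
    """Index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


def kmeans(points, centroids, iterations=10):
    """Alternate between assigning points and recomputing cluster means."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignments = [nearest(p, centroids) for p in points]
        # Update step: move each centroid to the mean of its members.
        for i in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == i]
            if members:  # an empty cluster keeps its previous centroid
                centroids[i] = tuple(sum(dim) / len(members) for dim in zip(*members))
    # Final assignment against the updated centroids.
    return centroids, [nearest(p, centroids) for p in points]


# Hypothetical 2-D observations forming two well-separated groups.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
centroids, assignments = kmeans(points, centroids=[(1.0, 1.0), (9.0, 9.0)])
print(centroids, assignments)
```

The result depends on the starting centroids, which the caller supplies here. Choosing good starting points is the kind of preprocessing the table attributes to Canopy Clustering.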

Source: Apache Mahout – Hortonworks
