HOW STORM WORKS
A Storm cluster has three sets of nodes:
- Nimbus node (the master node, similar to the Hadoop JobTracker):
  - Uploads computations for execution
  - Distributes code across the cluster
  - Launches workers across the cluster
  - Monitors computation and reallocates workers as needed
- ZooKeeper nodes – coordinate the Storm cluster
- Supervisor nodes – communicate with Nimbus through ZooKeeper, and start and stop worker processes according to signals from Nimbus
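For a rough sense of how these roles interact: when a client submits a topology, it attaches a Config whose settings Nimbus uses to decide how many worker processes the Supervisor nodes should launch. The following is a minimal sketch, assuming the org.apache.storm Java API; the class name and values are illustrative, not prescribed:

```java
import org.apache.storm.Config;

public class ClusterConfigSketch {
    public static void main(String[] args) {
        // Settings Nimbus reads when assigning a topology to Supervisor worker slots.
        Config conf = new Config();
        conf.setNumWorkers(4);         // request 4 worker JVMs spread across the Supervisors
        conf.setMaxSpoutPending(1000); // cap on un-acked tuples in flight per spout task
        conf.setDebug(false);          // enable per-tuple logging only when debugging
        System.out.println(conf);      // Config is essentially a map of settings sent to Nimbus
    }
}
```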
Five key abstractions help explain how Storm processes data:
- Tuples – ordered lists of elements. For example, a “4-tuple” might be (7, 1, 3, 7)
- Streams – unbounded sequences of tuples
- Spouts – sources of streams in a computation (e.g. a Twitter API)
- Bolts – process input streams and produce output streams. They can: run functions; filter, aggregate, or join data; or talk to databases.
- Topologies – the overall calculation, represented as a network (directed graph) of spouts and bolts
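To make these abstractions concrete, here is a minimal sketch of a spout, a bolt, and the topology that wires them together. It assumes the org.apache.storm 2.x Java API (older releases used the backtype.storm packages); the class, component, and field names are made up for illustration:

```java
import java.util.Map;
import java.util.Random;

import org.apache.storm.spout.SpoutOutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.BasicOutputCollector;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.TopologyBuilder;
import org.apache.storm.topology.base.BaseBasicBolt;
import org.apache.storm.topology.base.BaseRichSpout;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;
import org.apache.storm.utils.Utils;

public class AbstractionsSketch {

    // Spout: a source of a stream; here it emits random words instead of, say, tweets.
    public static class WordSpout extends BaseRichSpout {
        private SpoutOutputCollector collector;
        private final String[] words = {"storm", "hadoop", "stream", "tuple"};
        private final Random rand = new Random();

        @Override
        public void open(Map<String, Object> conf, TopologyContext context,
                         SpoutOutputCollector collector) {
            this.collector = collector;
        }

        @Override
        public void nextTuple() {
            Utils.sleep(100);
            // Each emitted Values(...) becomes one tuple in the stream.
            collector.emit(new Values(words[rand.nextInt(words.length)]));
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("word"));
        }
    }

    // Bolt: consumes the spout's stream and produces a new stream.
    public static class UppercaseBolt extends BaseBasicBolt {
        @Override
        public void execute(Tuple tuple, BasicOutputCollector collector) {
            collector.emit(new Values(tuple.getStringByField("word").toUpperCase()));
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("upper-word"));
        }
    }

    // Topology: wires spouts and bolts into the overall computation.
    public static TopologyBuilder build() {
        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("words", new WordSpout(), 1);
        builder.setBolt("uppercase", new UppercaseBolt(), 2).shuffleGrouping("words");
        return builder;
    }
}
```

Here shuffleGrouping spreads the spout’s tuples evenly across the bolt’s tasks; other groupings (such as fields grouping) control how a stream is partitioned between downstream tasks.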
Storm users define topologies that describe how to process the data as it streams in from the spouts. As tuples arrive they are processed by the bolts, and the results are passed into Hadoop.
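Continuing the sketch above, a defined topology can either be run in a single JVM for testing or submitted to Nimbus for execution across the cluster. The class below is again illustrative and assumes the Storm 2.x API:

```java
import org.apache.storm.Config;
import org.apache.storm.LocalCluster;
import org.apache.storm.StormSubmitter;
import org.apache.storm.utils.Utils;

public class RunSketch {
    public static void main(String[] args) throws Exception {
        Config conf = new Config();
        conf.setNumWorkers(2);

        if (args.length > 0) {
            // Submit to a real cluster: Nimbus distributes the code to the Supervisors.
            StormSubmitter.submitTopology(args[0], conf,
                    AbstractionsSketch.build().createTopology());
        } else {
            // Or run everything in-process for local testing.
            try (LocalCluster cluster = new LocalCluster()) {
                cluster.submitTopology("local-sketch", conf,
                        AbstractionsSketch.build().createTopology());
                Utils.sleep(10_000);
            }
        }
    }
}
```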
Learn more about how the community is working to integrate Storm with Hadoop and improve its readiness for the enterprise.