WHAT OOZIE DOES
Apache Oozie is a Java Web application used to schedule Apache Hadoop jobs. Oozie combines multiple jobs sequentially into one logical unit of work. It is integrated with the Hadoop stack, with YARN as its architectural center, and supports Hadoop jobs for Apache MapReduce, Apache Pig, Apache Hive, and Apache Sqoop. Oozie can also schedule jobs specific to a system, like Java programs or shell scripts.
Apache Oozie is a tool for Hadoop operations that allows cluster administrators to build complex data transformations out of multiple component tasks. This provides greater control over jobs and also makes it easier to repeat those jobs at predetermined intervals. At its core, Oozie helps administrators derive more value from Hadoop.
There are two basic types of Oozie jobs:
- Oozie Workflow jobs are Directed Acyclic Graphs (DAGs), specifying a sequence of actions to execute; each action in the graph has to wait for its predecessors to complete before it can run.
- Oozie Coordinator jobs are recurrent Oozie Workflow jobs that are triggered by time and data availability.
Oozie Bundle provides a way to package multiple coordinator and workflow jobs and to manage the lifecycle of those jobs.
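For illustration, a minimal Bundle definition might look like the sketch below; the bundle name, coordinator name, and HDFS path are assumptions for the example:

<bundle-app name="example-bundle" xmlns="uri:oozie:bundle:0.1">
    <!-- a Bundle simply lists the coordinator applications it manages -->
    <coordinator name="daily-coord">
        <app-path>hdfs://bar.com:9000/usr/abc/coordinator</app-path>
    </coordinator>
</bundle-app>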
HOW OOZIE WORKS
An Oozie Workflow is a collection of actions arranged in a Directed Acyclic Graph (DAG). Control nodes define job chronology, setting rules for beginning and ending a workflow. In this way, Oozie controls the workflow execution path with decision, fork, and join nodes. Action nodes trigger the execution of tasks.
Oozie triggers workflow actions, but Hadoop MapReduce executes them. This allows Oozie to leverage other capabilities within the Hadoop stack to balance loads and handle failures.
Oozie detects completion of tasks through callback and polling. When Oozie starts a task, it provides the task with a unique callback HTTP URL, and the task notifies that URL when it is complete. If the task fails to invoke the callback URL, Oozie can poll the task for completion.
Often it is necessary to run Oozie workflows at regular time intervals, but in coordination with unpredictable levels of data availability or events. In these circumstances, Oozie Coordinator allows you to model workflow execution triggers in the form of data, time, or event predicates. The workflow job is started once those predicates are satisfied.
Oozie Coordinator can also manage multiple workflows that are dependent on the outcome of preceding workflows: the output of one workflow becomes the input to the next workflow. This chain is called a “data application pipeline”.
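For illustration, the sketch below is a minimal Coordinator definition that triggers a workflow once a day, after that day's input dataset has materialized; the names, dates, and HDFS paths are assumptions for the example:

<coordinator-app name="daily-wordcount" frequency="${coord:days(1)}"
                 start="2009-05-25T00:00Z" end="2009-06-01T00:00Z" timezone="UTC"
                 xmlns="uri:oozie:coordinator:0.1">
    <datasets>
        <!-- one directory of input data is expected per day -->
        <dataset name="input" frequency="${coord:days(1)}"
                 initial-instance="2009-05-25T00:00Z" timezone="UTC">
            <uri-template>hdfs://bar.com:9000/usr/abc/input-data/${YEAR}${MONTH}${DAY}</uri-template>
        </dataset>
    </datasets>
    <input-events>
        <!-- the data predicate: the current day's dataset instance must exist -->
        <data-in name="dailyInput" dataset="input">
            <instance>${coord:current(0)}</instance>
        </data-in>
    </input-events>
    <action>
        <workflow>
            <app-path>hdfs://bar.com:9000/usr/abc/wordcount</app-path>
            <configuration>
                <property>
                    <name>inputDir</name>
                    <value>${coord:dataIn('dailyInput')}</value>
                </property>
            </configuration>
        </workflow>
    </action>
</coordinator-app>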
(Figure: Oozie Workflow – Directed Acyclic Graph of Jobs)
Oozie Workflow Example:
<workflow-app name="wordcount-wf" xmlns="uri:oozie:workflow:0.1">
    <start to="wordcount"/>
    <action name="wordcount">
        <map-reduce>
            <job-tracker>foo.com:9001</job-tracker>
            <name-node>hdfs://bar.com:9000</name-node>
            <configuration>
                <property>
                    <name>mapred.mapper.class</name>
                    <value>org.myorg.WordCount.Map</value>
                </property>
                <property>
                    <name>mapred.reducer.class</name>
                    <value>org.myorg.WordCount.Reduce</value>
                </property>
                <property>
                    <name>mapred.input.dir</name>
                    <value>${inputDir}</value>
                </property>
                <property>
                    <name>mapred.output.dir</name>
                    <value>${outputDir}</value>
                </property>
            </configuration>
        </map-reduce>
        <ok to="end"/>
        <error to="kill"/>
    </action>
    <kill name="kill">
        <message>Something went wrong: ${wf:errorCode('wordcount')}</message>
    </kill>
    <end name="end"/>
</workflow-app>
Workflow Definition:
A workflow definition is a DAG with control flow nodes and action nodes, where the nodes are connected by transition arrows.
Control Flow Nodes:
The control flow provides a way to control the Workflow execution path. Flow control operations within a workflow application are expressed through the following nodes (a sketch combining them follows the list):
- Start/end/kill
- Decision
- Fork/join
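To make these nodes concrete, here is a minimal sketch (node names, HDFS paths, and the decision property are assumptions for the example) that forks two HDFS operations, joins them, and then decides whether a cleanup step should run:

<workflow-app name="control-flow-sketch" xmlns="uri:oozie:workflow:0.1">
    <start to="forking"/>
    <!-- fork: run both directory-creation actions in parallel -->
    <fork name="forking">
        <path start="makeDirA"/>
        <path start="makeDirB"/>
    </fork>
    <action name="makeDirA">
        <fs>
            <mkdir path="hdfs://bar.com:9000/usr/abc/tmp-a"/>
        </fs>
        <ok to="joining"/>
        <error to="fail"/>
    </action>
    <action name="makeDirB">
        <fs>
            <mkdir path="hdfs://bar.com:9000/usr/abc/tmp-b"/>
        </fs>
        <ok to="joining"/>
        <error to="fail"/>
    </action>
    <!-- join: wait for both forked paths before continuing -->
    <join name="joining" to="decide"/>
    <!-- decision: branch on a job property supplied at submission time -->
    <decision name="decide">
        <switch>
            <case to="removeDirs">${wf:conf('cleanup') eq 'true'}</case>
            <default to="end"/>
        </switch>
    </decision>
    <action name="removeDirs">
        <fs>
            <delete path="hdfs://bar.com:9000/usr/abc/tmp-a"/>
            <delete path="hdfs://bar.com:9000/usr/abc/tmp-b"/>
        </fs>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Control-flow sketch failed</message>
    </kill>
    <end name="end"/>
</workflow-app>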
Action Nodes:
- Map-reduce – Run a Hadoop MapReduce job
- Pig – Run a Pig script (see the sketch after this list)
- HDFS – Run file system operations such as move, delete, and mkdir
- Sub-workflow – Run a child Oozie workflow
- Java – Run custom Java code
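As an illustration of an action node, a minimal Pig action might look like the sketch below; the node names and script name are assumptions, while the endpoints and parameters reuse those from the workflow example above:

<action name="pig-node">
    <pig>
        <job-tracker>foo.com:9001</job-tracker>
        <name-node>hdfs://bar.com:9000</name-node>
        <!-- script.pig is assumed to be bundled with the workflow application -->
        <script>script.pig</script>
        <param>INPUT=${inputDir}</param>
        <param>OUTPUT=${outputDir}</param>
    </pig>
    <ok to="end"/>
    <error to="kill"/>
</action>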
Oozie Workflow Application:
A Workflow application is a ZIP file that includes the workflow definition and the files needed to run all of its actions. It contains the following files:
- Workflow definition – workflow.xml
- Configuration file – config-default.xml
- App files – lib/ directory with JAR and SO files
- Pig scripts
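For the wordcount application deployed below, the unpacked layout might look as follows; workflow.xml and config-default.xml are the standard file names, while the script and JAR names are illustrative:

wordcount-wf/
    workflow.xml
    config-default.xml
    script.pig
    lib/
        wordcount.jar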
Application Deployment:
$ hadoop fs -put wordcount-wf hdfs://bar.com:9000/usr/abc/wordcount
Workflow Job Parameters:
$ cat job.properties
oozie.wf.application.path=hdfs://bar.com:9000/usr/abc/wordcount
inputDir=/usr/abc/input-data
outputDir=/usr/abc/output-data
Job Execution:
$ oozie job -run -config job.properties
job: 1-20090525161321-oozie-xyz-W
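The same CLI can then be used to monitor the submitted job; the command below reuses the job ID returned above:

$ oozie job -info 1-20090525161321-oozie-xyz-W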