hadoop-common-dev mailing list archives

From "Steve Loughran (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-5303) Hadoop Workflow System (HWS)
Date Tue, 24 Feb 2009 13:14:02 GMT

https://issues.apache.org/jira/browse/HADOOP-5303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12676260#action_12676260

Steve Loughran commented on HADOOP-5303:

Tom - there are good reasons for not using Ant as a workflow system, even though it is a great
build tool with good support from IDEs and CI systems:

# not that declarative; you need to understand every task to determine the inputs and outputs.
Compare with MSBuild, which is harder to write but easier for IDEs to work with.
# no stable schema; very hard for other analysis tools to look at a build and decide what happens
# no HA operation. Every task manages its state in member variables, so there is no way to
handle outages other than a full restart
# file system operations are biased towards the local system only. No good if you want to
run the operations elsewhere in the cluster
# implicit bias against long-lived operations. The Eclipse team do complain if Ant leaks memory
over time, but there are some assumptions that builds finish in minutes, not days - and some
of Ant's data structures are based on those assumptions
# no failure handling. Failures = halt the build and tell the developer they have something
to fix. Workflows have different goals
# no formal foundation on a par with Hoare's CSP work, which was, once upon a time, what BPEL
was based on.
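To make the first point concrete, here is a minimal Ant target (the class name is hypothetical). The task's file inputs and outputs are buried in its command-line arguments, so a tool reading the build file cannot tell that one file is read and another written without understanding the semantics of the `<java>` task itself:

```xml
<target name="process">
  <!-- Which files does this target read or write? Only the code of
       com.example.Process (a made-up class) knows; nothing in the
       build file declares data.txt as an input or out.txt as an output. -->
  <java classname="com.example.Process" fork="true">
    <arg value="data.txt"/>
    <arg value="out.txt"/>
  </java>
</target>
```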

The Ant tasks for Hadoop aren't that complex, don't have much in the way of testing, and rely
on DFSClient, which abuses System.out in ways Ant won't like (Ant installs its own System.out
stream that buffers each thread's output up to line endings). They shouldn't be a reason to
stay with Ant.

> Hadoop Workflow System (HWS)
> ----------------------------
>                 Key: HADOOP-5303
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5303
>             Project: Hadoop Core
>          Issue Type: New Feature
>            Reporter: Alejandro Abdelnur
>            Assignee: Alejandro Abdelnur
>         Attachments: hws-preso-v1_0_2009FEB22.pdf, hws-v1_0_2009FEB22.pdf
> This is a proposal for a system specialized in running Hadoop/Pig jobs in a control dependency
DAG (Directed Acyclic Graph), a Hadoop workflow application.
> Attached there is a complete specification and a high level overview presentation.
> ----
> *Highlights* 
> A Workflow application is DAG that coordinates the following types of actions: Hadoop,
Pig, Ssh, Http, Email and sub-workflows. 
> Flow control operations within the workflow applications can be done using decision,
fork and join nodes. Cycles in workflows are not supported.
> Actions and decisions can be parameterized with job properties, actions output (e.g.
Hadoop counters, Ssh key/value pair output) and file information (file exists, file size,
etc.). Formal parameters are expressed in the workflow definition as {{${VAR}}} variables.
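As an illustration of how a fork/join pair, a decision node and {{${VAR}}} parameters could fit together, a workflow definition might look roughly like the sketch below. The element names and the {{fs:exists}} function are illustrative only; the actual syntax is defined in the attached specification.

```xml
<!-- Illustrative sketch only; not the actual HWS schema. -->
<workflow-app name="sample-wf">
  <start to="input-check"/>
  <decision name="input-check">
    <!-- a decision parameterized with file information -->
    <case to="parallel-load">${fs:exists(inputDir)}</case>
    <default to="fail"/>
  </decision>
  <fork name="parallel-load">
    <path start="hadoop-node"/>
    <path start="pig-node"/>
  </fork>
  <action name="hadoop-node">
    <hadoop><!-- Map/Reduce job details --></hadoop>
    <ok to="merge"/>
    <error to="fail"/>
  </action>
  <action name="pig-node">
    <pig><!-- Pig script details --></pig>
    <ok to="merge"/>
    <error to="fail"/>
  </action>
  <join name="merge" to="end"/>
  <kill name="fail"/>
  <end name="end"/>
</workflow-app>
```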
> A Workflow application is a ZIP file that contains the workflow definition (an XML file)
and all the necessary files to run all the actions: JAR files for Map/Reduce jobs, shell scripts
for streaming Map/Reduce jobs, native libraries, Pig scripts, and other resource files.
> Before running a workflow job, the corresponding workflow application must be deployed
in HWS.
> Deploying workflow application and running workflow jobs can be done via command line
tools, a WS API and a Java API.
> Monitoring the system and workflow jobs can be done via a web console, command line tools,
a WS API and a Java API.
> When submitting a workflow job, a set of properties resolving all the formal parameters
in the workflow definitions must be provided. This set of properties is a Hadoop configuration.
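A minimal sketch of the kind of substitution this implies (the class and method names below are mine, not from the specification): resolving {{${VAR}}} tokens in a workflow definition against the submitted job properties, failing if any formal parameter is left unresolved.

```java
import java.util.Properties;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of resolving ${VAR} formal parameters in a
// workflow definition from the properties supplied at submission time.
public class ParamResolver {
    private static final Pattern VAR = Pattern.compile("\\$\\{([^}]+)\\}");

    static String resolve(String text, Properties props) {
        Matcher m = VAR.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = props.getProperty(m.group(1));
            if (value == null) {
                // submission must resolve *all* formal parameters
                throw new IllegalArgumentException(
                        "Unresolved parameter: " + m.group(1));
            }
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("inputDir", "/user/data/in");
        System.out.println(resolve("<prepare path=\"${inputDir}\"/>", props));
    }
}
```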
> Possible states for a workflow jobs are: {{CREATED}}, {{RUNNING}}, {{SUSPENDED}}, {{SUCCEEDED}},
{{KILLED}} and {{FAILED}}.
> In the case of an action failure in a workflow job, depending on the type of failure,
HWS will attempt automatic retries, request a manual retry, or fail the workflow job.
> HWS can make HTTP callback notifications on action start/end/failure events and workflow
end/failure events.
> In the case of workflow job failure, the workflow job can be resubmitted skipping previously
completed actions. Before doing a resubmission the workflow application could be updated with
a patch to fix a problem in the workflow application code.
> ----

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
