hadoop-general mailing list archives

From Arun C Murthy <...@yahoo-inc.com>
Subject Re: Job Management - getting practical
Date Mon, 20 Jul 2009 15:01:05 GMT
Nico,

  You might want to take a look at Oozie: http://issues.apache.org/jira/browse/HADOOP-5303
Arun

On Jul 20, 2009, at 7:42 AM, Nico Coetzee wrote:

> Hi again,
>
> Our testing is going fantastically, and I have a 4-node cluster going
> into our production environment in mid-August (currently testing on a
> 3-node cluster).
>
> I am now looking more at the operational side of things, and I have a
> number of questions around a GUI app to manage jobs (preferably
> browser-based).
> Our basic requirements:
>
>   - Define a unique job with parameters:
>      - Data source (in our case a file or files on remote systems)
>      - Method of retrieving the above files (most are scp, but some are
>        ftp)
>      - Which cluster to push the job to (test or production cluster)
>      - Which "spool" directory to use on the Hadoop master (where I copy
>        the raw data before uploading to HDFS)
>      - Input and output directories to use for this job in HDFS
>      - Where the output must be dumped after processing (export from
>        HDFS)
>      - What (if any) post-processing needs to take place (upload or link
>        to relevant scripts)
>      - Of course, also upload or link to the map and reduce scripts (I
>        still use and prefer the streaming solution)
>      - Define the run schedule of the job (almost 90% of our jobs will be
>        run at regular intervals, at least once per day; think along
>        crontab lines)
>   - Additional nice-to-have requirements:
>      - Define mail addresses of people who should get the job report
>        after it has run
>      - Ability to move a job (with its scripts etc.) from a "test"
>        cluster to a "production" cluster (we test everything first
>        before we put anything into production)
>      - Run certain jobs manually (once-off jobs; manually re-run failed
>        jobs)
>
> If such a system does not exist in the open source community, I wonder
> whether there would be sufficient interest if I started a project like
> this?
>
> Thanks for your feedback and suggestions
>
> Nico
