hadoop-general mailing list archives

From Nico Coetzee <nicc...@gmail.com>
Subject Re: Job Management - getting practical
Date Mon, 20 Jul 2009 15:35:44 GMT
Thanks - this looks like it has potential. I'm now thinking of just
building a UI that "populates" or "builds" the Oozie XML...
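
As a rough idea of what the "build the XML" step behind such a UI could look like, here is a minimal sketch. It assumes a few fields from the requirements list quoted below (job name, streaming map and reduce scripts, HDFS input and output directories). The JobDefinition class and build_workflow_xml function are hypothetical names, and the element names only follow the general shape of an Oozie workflow definition; they have not been validated against any Oozie schema.

```python
from dataclasses import dataclass
import xml.etree.ElementTree as ET


@dataclass
class JobDefinition:
    # Hypothetical fields, taken from the requirements list in the quoted message.
    name: str
    mapper: str      # streaming map script
    reducer: str     # streaming reduce script
    input_dir: str   # HDFS input directory
    output_dir: str  # HDFS output directory


def build_workflow_xml(job: JobDefinition) -> str:
    """Render a job definition as a workflow-style XML string.

    Element names are assumptions modelled on the general shape of an
    Oozie workflow; they are not checked against a schema.
    """
    root = ET.Element("workflow-app", {"name": job.name})
    ET.SubElement(root, "start", {"to": "streaming-step"})

    # One action that runs the streaming map and reduce scripts.
    action = ET.SubElement(root, "action", {"name": "streaming-step"})
    map_reduce = ET.SubElement(action, "map-reduce")
    streaming = ET.SubElement(map_reduce, "streaming")
    ET.SubElement(streaming, "mapper").text = job.mapper
    ET.SubElement(streaming, "reducer").text = job.reducer

    # Input and output directories are passed as job configuration properties.
    config = ET.SubElement(map_reduce, "configuration")
    properties = (
        ("mapred.input.dir", job.input_dir),
        ("mapred.output.dir", job.output_dir),
    )
    for key, value in properties:
        prop = ET.SubElement(config, "property")
        ET.SubElement(prop, "name").text = key
        ET.SubElement(prop, "value").text = value

    # Transitions: go to "end" on success, to the kill node on failure.
    ET.SubElement(action, "ok", {"to": "end"})
    ET.SubElement(action, "error", {"to": "fail"})
    kill = ET.SubElement(root, "kill", {"name": "fail"})
    ET.SubElement(kill, "message").text = "Streaming job failed"
    ET.SubElement(root, "end", {"name": "end"})

    return ET.tostring(root, encoding="unicode")
```

The UI would then only need to collect the form fields, fill in a JobDefinition, and write the returned string to a file. Scheduling, file retrieval (scp/ftp) and mail reports are left out of this sketch.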



On Mon, Jul 20, 2009 at 5:01 PM, Arun C Murthy <acm@yahoo-inc.com> wrote:

> Nico,
>
>  You might want to take a look at Oozie:
> http://issues.apache.org/jira/browse/HADOOP-5303.
>
> Arun
>
>
> On Jul 20, 2009, at 7:42 AM, Nico Coetzee wrote:
>
>> Hi again,
>>
>> Our testing is going fantastic and I have a 4 node cluster going to our
>> production environment mid-August (currently testing on a 3 node cluster).
>>
>> I am now looking more at the operational side of things and I have a number
>> of questions around a GUI app to manage jobs (preferably browser-based).
>>
>> Our basic requirements:
>>
>>
>>  - Define a unique job with parameters:
>>     - Data source (in our case a file, or files, on remote systems)
>>     - Method of retrieving the above files (most are scp, but some are ftp)
>>     - Which cluster to push the job to (test or production cluster)
>>     - Which "spool" directory to use on the hadoop master (where I copy
>>     the raw data before uploading to the HDFS)
>>     - Input and output directories to use for this job in HDFS
>>     - Where the output must be dumped after processing (export from HDFS)
>>     - What (if any) post-processing needs to take place (upload or link to
>>     relevant scripts)
>>     - Of course also upload or link to the map and reduce scripts (I still
>>     use and prefer the streaming solution)
>>     - Define the run schedule of the job (almost 90% of our jobs will be
>>     run at regular intervals - at least once per day. Think along
>>     crontab lines)
>>  - Additional nice to have requirements:
>>     - Define mail addresses of people that should get the job report after it
>>     was run
>>     - Ability to move a job (with its scripts etc.) from a "test" cluster
>>     to a "production" cluster (we test everything first before we
>>     put stuff in production)
>>     - Run certain jobs manually (once-off jobs; manually re-run failed
>>     jobs)
>>
>>
>> If such a system does not exist in the Open Source community I wonder if
>> there will be sufficient interest if I start a project like this?
>>
>> Thanks for your feedback and suggestions
>>
>> Nico
>>
>
>
