hadoop-general mailing list archives

From Nico Coetzee <nicc...@gmail.com>
Subject Job Management - getting practical
Date Mon, 20 Jul 2009 14:42:56 GMT
Hi again,

Our testing is going fantastically, and I have a 4-node cluster going into our
production environment in mid-August (we are currently testing on a 3-node cluster).

I am now looking more at the operational side of things and I have a number
of questions about a GUI app to manage jobs (preferably browser-based).

Our basic requirements:

   - Define a unique job with parameters:
      - Data source (in our case, a file or files on remote systems)
      - Method of retrieving the above files (most are scp, but some are ftp)
      - Which cluster to push the job to (test or production cluster)
      - Which "spool" directory to use on the Hadoop master (where I copy
        the raw data before uploading it to HDFS)
      - Input and output directories to use for this job in HDFS
      - Where the output must be dumped after processing (export from HDFS)
      - What post-processing, if any, needs to take place (upload or link to
        the relevant scripts)
      - Of course, also upload or link to the map and reduce scripts (I still
        use and prefer the streaming solution)
      - Define the run schedule of the job (almost 90% of our jobs will be
        run at regular intervals, at least once per day; think along
        crontab lines)
   - Additional nice-to-have requirements:
      - Define the mail addresses of people who should get the job report
        after it has run
      - Ability to move a job (with its scripts etc.) from a "test" cluster
        to a "production" cluster (we test everything first before we
        put stuff into production)
      - Run certain jobs manually (once-off jobs, manually re-running failed
        jobs)
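The job definition described in this list maps fairly naturally onto a small record type plus a function that expands one run into shell commands. What follows is a minimal sketch in Python, assuming the stock `scp`, `wget` (for ftp sources), `hadoop fs -put`/`-get` and Hadoop Streaming command-line tools are available on the master. The class `JobDefinition`, the function `build_commands`, every field name, the streaming jar path and the example paths are hypothetical illustrations, not part of any existing Hadoop tool. The sketch only builds the commands; actually scheduling them (e.g. from cron) and mailing the report are left out.

```python
import os
from dataclasses import dataclass, field, replace
from typing import List


# Hypothetical job record mirroring the requirements list above.
@dataclass(frozen=True)
class JobDefinition:
    name: str
    sources: List[str]      # remote files, e.g. "user@host:/data/a.log" or "ftp://host/a.log"
    fetch_method: str       # "scp" or "ftp"
    cluster: str            # "test" or "production"
    spool_dir: str          # staging directory on the Hadoop master
    hdfs_input: str         # HDFS input directory (assumed to exist already)
    hdfs_output: str        # HDFS output directory (Hadoop requires it not to exist yet)
    export_dir: str         # local directory the results are copied to
    mapper: str             # path to the streaming mapper script
    reducer: str            # path to the streaming reducer script
    schedule: str           # crontab-style schedule, e.g. "0 2 * * *"
    post_process: List[str] = field(default_factory=list)  # scripts run after export
    notify: List[str] = field(default_factory=list)        # report recipients (not used here)

    def __post_init__(self):
        # Basic validation of the constrained fields.
        if self.fetch_method not in ("scp", "ftp"):
            raise ValueError("fetch_method must be 'scp' or 'ftp'")
        if self.cluster not in ("test", "production"):
            raise ValueError("cluster must be 'test' or 'production'")
        if len(self.schedule.split()) != 5:
            raise ValueError("schedule needs five crontab fields")

    def promote(self) -> "JobDefinition":
        """Return a copy of this job targeted at the production cluster."""
        return replace(self, cluster="production")


def build_commands(job: JobDefinition, streaming_jar: str) -> List[List[str]]:
    """Return the argv lists for one run: fetch, upload, run, export, post-process."""
    cmds: List[List[str]] = []
    # 1. Fetch the raw files into the spool directory on the master.
    for src in job.sources:
        if job.fetch_method == "scp":
            cmds.append(["scp", src, job.spool_dir])
        else:
            cmds.append(["wget", "-P", job.spool_dir, src])
    # 2. Upload the spooled files into the job's HDFS input directory.
    local = [os.path.join(job.spool_dir, os.path.basename(s)) for s in job.sources]
    cmds.append(["hadoop", "fs", "-put", *local, job.hdfs_input])
    # 3. Run the streaming job; -file ships each script with the job,
    #    so -mapper/-reducer refer to the script by its base name.
    cmds.append([
        "hadoop", "jar", streaming_jar,
        "-input", job.hdfs_input,
        "-output", job.hdfs_output,
        "-mapper", os.path.basename(job.mapper),
        "-reducer", os.path.basename(job.reducer),
        "-file", job.mapper,
        "-file", job.reducer,
    ])
    # 4. Export the results from HDFS to the local export directory.
    cmds.append(["hadoop", "fs", "-get", job.hdfs_output, job.export_dir])
    # 5. Run any post-processing scripts, one command each.
    cmds.extend([script] for script in job.post_process)
    return cmds


if __name__ == "__main__":
    job = JobDefinition(
        name="daily-logs",
        sources=["user@host1:/var/log/app/a.log"],
        fetch_method="scp",
        cluster="test",
        spool_dir="/data/spool/daily-logs",
        hdfs_input="/jobs/daily-logs/in",
        hdfs_output="/jobs/daily-logs/out",
        export_dir="/data/export/daily-logs",
        mapper="/opt/jobs/daily-logs/map.py",
        reducer="/opt/jobs/daily-logs/reduce.py",
        schedule="0 2 * * *",
    )
    # Print the commands for the production copy of the tested job.
    for argv in build_commands(job.promote(), "hadoop-streaming.jar"):
        print(" ".join(argv))
```

Keeping the target cluster as a field on an immutable record means moving a job from test to production is just `promote()`, which returns a copy and leaves the tested definition untouched.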

If no such system exists in the open-source community, I wonder whether
there would be sufficient interest if I started a project like this.

Thanks for your feedback and suggestions

