hadoop-common-user mailing list archives

From Per Steffensen <st...@designware.dk>
Subject Timer jobs
Date Thu, 01 Sep 2011 09:30:43 GMT

I use Hadoop for a MapReduce job in my system. I would like to have the
job run every 5 minutes. Is there any "distributed" timer facility in
Hadoop? Of course I could set up a timer in an external timer framework
(cron or something like that) that invokes the MapReduce job. But cron
only runs on one particular machine, so if that machine goes down my job
will not be triggered. I could then set up the timer on all or many
machines, but I do not want more than one instance of the job to run in
any 5-minute interval, so the timer jobs would need to coordinate who
actually starts the job "this time", and all the rest would just have to
do nothing. I guess I could come up with a solution to that, e.g. by
writing some "lock" logic using HDFS files or by using ZooKeeper. But I
would really like it if someone had already solved the problem and
provided some kind of "distributed timer framework" running in a
"cluster", so that I could just register a timer job with the cluster
and then be sure that it is invoked every 5 minutes, no matter if one or
two particular machines in the cluster are down.
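
To make the "lock" idea concrete, here is a minimal sketch of what I have in mind, not a finished solution. Every machine runs the same cron entry; each one computes the number of the current 5-minute slot and tries to atomically create a marker file for that slot in a shared directory. Only the machine whose create succeeds submits the job. The class name SlotLock, the lock file naming, and the directory name are all made up for illustration. The sketch uses java.nio.file on a local directory as a stand-in so that it runs on its own; on HDFS I assume FileSystem.createNewFile(Path) could play the same role, but I have not verified its behaviour under concurrent creates. It also assumes the machines' clocks agree well enough to compute the same slot number.

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class SlotLock {
    // Length of one timer slot: 5 minutes in milliseconds.
    static final long SLOT_MILLIS = 5 * 60 * 1000L;

    // Map a wall-clock time to the number of the 5-minute slot it falls in.
    static long slotFor(long epochMillis) {
        return epochMillis / SLOT_MILLIS;
    }

    // Try to claim a slot by creating its marker file.
    // Files.createFile fails if the file already exists, and the
    // existence check and creation happen as a single atomic step,
    // so at most one caller gets "true" for a given slot.
    static boolean tryClaim(Path lockDir, long slot) throws IOException {
        Files.createDirectories(lockDir);
        try {
            Files.createFile(lockDir.resolve("slot-" + slot + ".lock"));
            return true;
        } catch (FileAlreadyExistsException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical shared directory; a local path is used here as a stand-in.
        Path lockDir = Paths.get(args.length > 0 ? args[0] : "timer-locks");
        long slot = slotFor(System.currentTimeMillis());
        if (tryClaim(lockDir, slot)) {
            System.out.println("claimed slot " + slot + ", would submit the MapReduce job here");
        } else {
            System.out.println("slot " + slot + " already claimed, doing nothing");
        }
    }
}
```

Note that old marker files are never cleaned up here, and a machine that claims a slot and then crashes before submitting the job causes that run to be skipped, which is part of why I would prefer an existing framework.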

Any suggestions are very welcome.

Regards, Per Steffensen
