hadoop-common-user mailing list archives

From Tharindu Mathew <mcclou...@gmail.com>
Subject Re: Timer jobs
Date Thu, 01 Sep 2011 15:44:57 GMT
In Hadoop, if the client that triggers the job fails, is there a way to
recover so that another client can submit the job?
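One common way to frame this is as leader election with a lease. The sketch below is a minimal illustration under stated assumptions, not a Hadoop feature. It assumes there is a shared lease record that every client can read and update atomically. The `LeaseStore` class is hypothetical and only simulates that record inside one process; in a real cluster, a coordination service such as ZooKeeper would have to provide the atomic check-and-set. A client may submit jobs only while it holds an unexpired lease, and it renews the lease periodically. If that client crashes and stops renewing, another client acquires the lease once it expires.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Lease:
    holder: str        # id of the client currently allowed to submit jobs
    expires_at: float  # time after which another client may take over


class LeaseStore:
    """Hypothetical in-memory stand-in for a shared lease record.

    In a real cluster this check-and-set must be atomic across machines
    (assumption: backed by something like ZooKeeper); here one process
    simulates it.
    """

    def __init__(self) -> None:
        self.lease: Optional[Lease] = None

    def try_acquire(self, client_id: str, now: float, ttl: float) -> bool:
        """Acquire or renew the lease; fail while another live client holds it."""
        current = self.lease
        if current is None or current.holder == client_id or current.expires_at <= now:
            self.lease = Lease(client_id, now + ttl)
            return True
        return False


if __name__ == "__main__":
    store = LeaseStore()
    ttl = 30.0
    print(store.try_acquire("client-1", now=0.0, ttl=ttl))   # client-1 becomes the submitter
    print(store.try_acquire("client-2", now=10.0, ttl=ttl))  # refused: client-1's lease is still valid
    # client-1 crashes and stops renewing; its lease expires at t=30
    print(store.try_acquire("client-2", now=31.0, ttl=ttl))  # client-2 takes over
```

Expiry based on wall-clock time assumes that the clients' clocks are reasonably synchronized. A client that is merely slow, rather than dead, might also still act after its lease has expired, so a production design would need to guard against that case.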

On Thu, Sep 1, 2011 at 8:44 PM, Per Steffensen <steff@designware.dk> wrote:

> Well, I am not sure I understand you correctly, but basically I want a timer
> framework that triggers my jobs. And the triggering of the jobs needs to
> work even if one or two particular machines go down. So the "timer
> triggering mechanism" has to live in the cluster, so to speak. What I don't
> want is for the timer framework to be driven from one particular machine, so
> that jobs are not triggered if that machine goes down. If I have, e.g., 10
> machines in a Hadoop cluster, I can still run MapReduce jobs even if 3 of
> the 10 machines are down. I want my timer framework to also be clustered,
> distributed and coordinated, so that my timer jobs are still triggered even
> if 3 of the 10 machines are down.
>
>
> Regards, Per Steffensen
>
> Ronen Itkin wrote:
>
>> If I understand you correctly, you are asking about installing Oozie as a
>> distributed and/or HA cluster?
>> In that case I am not aware of an out-of-the-box solution in Oozie.
>> But I think you can make up a solution of your own, for example:
>> install Oozie on two servers sharing the same partition, kept
>> synchronized by DRBD.
>> You can trigger a "failover" using Linux Heartbeat and that way maintain a
>> virtual IP.
>>
>> On Thu, Sep 1, 2011 at 1:59 PM, Per Steffensen <steff@designware.dk>
>> wrote:
>>
>>> Hi
>>>
>>> Thanks a lot for pointing me to Oozie. I have looked a little into Oozie,
>>> and it seems the "component" that triggers jobs is called the
>>> "Coordinator Application". But I see nothing saying that this Coordinator
>>> Application doesn't just run on a single machine, in which case it would
>>> not trigger anything if that machine is down. Can you confirm that the
>>> "Coordinator Application" role is distributed in a distributed Oozie
>>> setup, so that jobs get triggered even if one or two machines are down?
>>>
>>> Regards, Per Steffensen
>>>
>>> Ronen Itkin wrote:
>>>
>>>> Hi
>>>>
>>>> Try to use Oozie for job coordination and workflows.
>>>>
>>>> On Thu, Sep 1, 2011 at 12:30 PM, Per Steffensen <steff@designware.dk>
>>>> wrote:
>>>>
>>>>> Hi
>>>>>
>>>>> I use Hadoop for a MapReduce job in my system. I would like to have the
>>>>> job run every 5th minute. Is there any "distributed" timer job support in
>>>>> Hadoop? Of course I could set up a timer in an external timer framework
>>>>> (CRON or something like that) that invokes the MapReduce job. But CRON
>>>>> only runs on one particular machine, so if that machine goes down my job
>>>>> will not be triggered. Then I could set up the timer on all or many
>>>>> machines, but I would not like the job to run in more than one instance
>>>>> every 5th minute, so the timer jobs would need to coordinate who actually
>>>>> starts the job "this time", and all the rest would just do nothing.
>>>>> I guess I could come up with a solution to that, e.g. writing some
>>>>> "lock" stuff using HDFS files or ZooKeeper. But I would really like it if
>>>>> someone had already solved the problem and provided some kind of
>>>>> "distributed timer framework" running in a "cluster", so that I could
>>>>> just register a timer job with the cluster and be sure that it is invoked
>>>>> every 5th minute, no matter if one or two particular machines in the
>>>>> cluster are down.
>>>>>
>>>>> Any suggestions are very welcome.
>>>>>
>>>>> Regards, Per Steffensen
>>>>>
>
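The "lock" approach Per mentions can be sketched as follows. This sketch assumes that each machine runs its own local timer (e.g. cron) and that all machines share a store where creating a named entry is atomic and fails if the entry already exists. A local temporary directory stands in for that shared store here. In a cluster, a file created in HDFS with overwrite disabled, or a ZooKeeper node (whose create call fails if the node already exists), would play a similar role. Time is divided into 5-minute slots, and the node that first creates the marker for the current slot submits the job; every other node does nothing. The names `slot_for`, `try_claim_slot` and `on_timer_tick` are hypothetical.

```python
import os
import tempfile
import time

SLOT_SECONDS = 300  # "every 5th minute"


def slot_for(timestamp: float, slot_seconds: int = SLOT_SECONDS) -> int:
    """Map a wall-clock time to the index of its 5-minute slot."""
    return int(timestamp // slot_seconds)


def try_claim_slot(lock_dir: str, slot: int, node_id: str) -> bool:
    """Atomically create a marker file for this slot.

    O_CREAT | O_EXCL makes creation fail if the file already exists, so at
    most one caller per slot gets True. The directory is a hypothetical
    stand-in for a shared HDFS path or ZooKeeper node.
    """
    path = os.path.join(lock_dir, f"slot-{slot}")
    try:
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False  # another node already claimed this slot
    with os.fdopen(fd, "w") as f:
        f.write(node_id)  # record the winner, useful for debugging
    return True


def on_timer_tick(lock_dir, node_id, submit_job, now=None):
    """Called by every node's local timer; only the slot winner submits."""
    slot = slot_for(time.time() if now is None else now)
    if try_claim_slot(lock_dir, slot, node_id):
        submit_job(slot)
        return True
    return False


if __name__ == "__main__":
    submitted = []
    with tempfile.TemporaryDirectory() as lock_dir:
        # Three nodes fire within the same 5-minute slot; only one submits.
        t = 1_000_000.0
        results = [on_timer_tick(lock_dir, n, submitted.append, now=t)
                   for n in ("node-a", "node-b", "node-c")]
        print(results, submitted)  # prints [True, False, False] [3333]
```

If the winning node crashes after claiming a slot but before submitting the job, that slot's run is lost. The next slot is then claimed by whichever node is still alive. Old marker files also need to be cleaned up periodically.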


-- 
Regards,

Tharindu
