From: Alejandro Abdelnur <tucu@cloudera.com>
Date: Thu, 1 Sep 2011 06:47:05 -0700
Subject: Re: Timer jobs
To: oozie-dev@incubator.apache.org
[moving common-user@ to BCC]

Oozie is not HA yet, but it would be relatively easy to make it so. It was designed with that in mind, and we even did a prototype.

Oozie consists of two services: a SQL database that stores the Oozie jobs' state, and a servlet container where the Oozie app proper runs.

HA for the database is left to the database itself; this means you'll have to use an HA DB. HA for the Oozie app means deploying the servlet container with the Oozie app on more than one box (2 or 3) and fronting them with an HTTP load balancer.

The missing part is that the current Oozie lock service is an in-memory implementation. It should be replaced with a ZooKeeper implementation; ZooKeeper could run externally or internally in all the Oozie servers. This is what was prototyped long ago.

Thanks.

Alejandro

On Thu, Sep 1, 2011 at 4:14 AM, Ronen Itkin wrote:
> If I understand you correctly, you are asking about installing Oozie as a
> distributed and/or HA cluster. In that case I am not familiar with an
> out-of-the-box solution from Oozie, but I think you can put together a
> solution of your own. For example: install Oozie on two servers on the
> same partition, synchronized by DRBD. You can trigger a failover using
> Linux Heartbeat and maintain a virtual IP that way.
>
> On Thu, Sep 1, 2011 at 1:59 PM, Per Steffensen wrote:
> >
> > Hi
> >
> > Thanks a lot for pointing me to Oozie. I have looked a little bit into
> > Oozie, and it seems the component that triggers jobs is called a
> > "Coordinator Application". But I see nowhere that this Coordinator
> > Application doesn't just run on a single machine, and that it will
> > therefore not trigger anything if this machine is down. Can you confirm
> > that the "Coordinator Application" role is distributed in a distributed
> > Oozie setup, so that jobs get triggered even if one or two machines are
> > down?
> >
> > Regards, Per Steffensen
> >
> > Ronen Itkin wrote:
> >
> >> Hi
> >>
> >> Try to use Oozie for job coordination and workflows.
> >>
> >> On Thu, Sep 1, 2011 at 12:30 PM, Per Steffensen wrote:
> >>
> >>> Hi
> >>>
> >>> I use Hadoop for a MapReduce job in my system. I would like to have
> >>> the job run every 5th minute. Is there any "distributed" timer job
> >>> facility in Hadoop? Of course I could set up a timer in an external
> >>> timer framework (CRON or something like that) that invokes the
> >>> MapReduce job. But CRON only runs on one particular machine, so if
> >>> that machine goes down my job will not be triggered. I could instead
> >>> set up the timer on all or many machines, but I would not like the
> >>> job to run in more than one instance every 5th minute, so the timer
> >>> jobs would need to coordinate who actually starts the job "this time"
> >>> while all the rest do nothing. I guess I could come up with a solution
> >>> to that - e.g. writing some "lock" stuff using HDFS files or by using
> >>> ZooKeeper. But I would really like it if someone had already solved
> >>> the problem and provided some kind of "distributed timer framework"
> >>> running in a cluster, so that I could just register a timer job with
> >>> the cluster and then be sure that it is invoked every 5th minute, no
> >>> matter if one or two particular machines in the cluster are down.
> >>>
> >>> Any suggestions are very welcome.
> >>>
> >>> Regards, Per Steffensen
>
> --
> *Ronen Itkin*
> Taykey | www.taykey.com
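The ZooKeeper lock service Alejandro mentions and the "only one instance runs the job each tick" coordination Per asks about come down to the same pattern: every candidate creates an ephemeral sequential znode under a lock path, and the candidate holding the node with the lowest sequence number owns the lock; when it dies, its ephemeral node vanishes and the next-lowest takes over. A minimal sketch of just that decision logic, with no live ZooKeeper required (the `lock-NNNNNNNNNN` node naming and the `holds_lock` helper are illustrative assumptions, not Oozie's actual API):

```python
def holds_lock(my_znode: str, siblings: list[str]) -> bool:
    """Decide whether my_znode owns the lock among its sibling znodes.

    ZooKeeper ephemeral sequential nodes are named like 'lock-0000000042';
    the convention in the standard lock recipe is that the node with the
    lowest sequence number holds the lock. (Illustrative sketch only - a
    real client would also set a watch on the node immediately preceding
    its own, so it is notified when the lock frees up.)
    """
    def seq(name: str) -> int:
        # Parse the numeric suffix ZooKeeper appends to sequential nodes.
        return int(name.rsplit("-", 1)[1])

    return seq(my_znode) == min(seq(s) for s in siblings)


# Each server would run this check on every 5-minute tick; only the
# holder of the lowest-numbered ephemeral node fires the job.
siblings = ["lock-0000000007", "lock-0000000003", "lock-0000000012"]
print(holds_lock("lock-0000000003", siblings))  # True  (lowest sequence)
print(holds_lock("lock-0000000012", siblings))  # False
```

In a real deployment the ephemeral property does the failover work: if the machine holding `lock-0000000003` goes down, its session expires, ZooKeeper deletes the node, and `lock-0000000007` becomes the lowest survivor, so the job still fires every 5th minute.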