Subject: Re: JIRA-704 : TajoMaster High Availability .
From: Hyunsik Choi
To: "dev@tajo.apache.org"
Date: Wed, 16 Apr 2014 23:42:04 +0900

I'm sorry for the late response, and thank you, Alvin, for your understanding.
Best Regards,
Hyunsik

On Wed, Apr 16, 2014 at 11:19 PM, Alvin Henrick wrote:

> Hi All,
> Not a problem. I wasn't aware that 704 was overlapping with 611. Yes, I was planning to use Apache Curator as well, and I did a small POC and posted it on GitHub. Apache Curator has a service discovery recipe which we can use.
> As per Hyunsik, the only work left on 704 is Catalog replication across TajoMasters, which can be easily achieved via database replication.
>
> Xuhui and Min,
> Let me know if I can help, because I have done some good research on Apache Curator and ZooKeeper (how to utilize/configure the Apache Curator APIs).
> Here is the Git repository, git@github.com:alvinhenrick/zooKeeper-poc.git, where I did some work for 704 before getting into the real implementation.
>
> I will remove the in-progress status, associate 704 with 611, and move on to tackle another interesting/priority issue :). Let me know, guys, how you want to tackle this so that we don't duplicate the effort.
>
> Have a wonderful day!!!
>
> Thanks!
> Warm Regards,
> Alvin.
>
>
> On Apr 16, 2014, at 6:56 AM, Hyunsik Choi wrote:
>
> > Hi Alvin,
> >
> > First of all, thank you, Alvin, for your contribution. Your proposal looks nice and reasonable to me.
> >
> > BTW, as others mentioned, TAJO-704 and TAJO-611 seem to overlap somewhat. We need to arrange the tasks to avoid duplicated work.
> >
> > In my opinion, the TajoMaster HA feature involves three sub-features:
> > 1) Leader election among multiple TajoMasters - exactly one of the TajoMasters is always the leader.
> > 2) Service discovery on the TajoClient side - TajoClient API calls should be resilient even when the original TajoMaster is not available.
> > 3) Cluster resource management and Catalog information that TajoMaster keeps in main memory - this information should not be lost.
> >
> > I think that (1) and (2) duplicate the service discovery work in TAJO-611.
> > So, it would be nice if TAJO-704 focused only on (3), because TAJO-611 already started a few weeks ago and TAJO-704 is at a relatively earlier stage. *Instead, you can continue the work with Xuhui and Min.* Someone can divide the service discovery issue into more subtasks.
> >
> > In addition, I'd like to discuss (3) further. Currently, a running TajoMaster keeps two kinds of information: the cluster resource information of all workers and the catalog information. To guarantee the HA of this data, TajoMaster should either persistently materialize it or consistently synchronize it across multiple TajoMasters. BTW, in the new scheduler issue we will replace TajoMaster's resource management with a decentralized one. As a result, I think that TajoMaster HA needs to focus only on the high availability of the catalog information. HA of the catalog can be easily achieved by database replication, or we can build our own module for it. I prefer the former.
> >
> > Hi Xuhui and Min,
> >
> > Could you briefly share the progress of the service discovery issue? Then we can easily figure out how to start on service discovery together.
> >
> > Warm regards,
> > Hyunsik
> >
> >
> >
> > On Wed, Apr 16, 2014 at 3:36 PM, Min Zhou wrote:
> >
> >> Actually, we are thinking not only about HA but also about service discovery, which the future Tajo scheduler would rely on. The Tajo scheduler can get all the active workers from that service.
> >>
> >>
> >> Regards,
> >> Min
> >>
> >>
> >> On Tue, Apr 15, 2014 at 10:05 PM, Xuhui Liu wrote:
> >>
> >>> Hi Alvin,
> >>>
> >>> TAJO-611 will introduce Curator as a service discovery service to Tajo, and Curator is based on ZK. Maybe we can work together.
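Min's point above, that the future Tajo scheduler could get all active workers from the service discovery layer, can be illustrated with a small in-process model. This is a sketch under stated assumptions, not Curator's service discovery API: it assumes registrations are tied to a client session, the way ZooKeeper EPHEMERAL znodes are deleted when their session expires. The class and method names (`WorkerRegistry`, `register`, `expireSession`, `activeWorkers`) and the host names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical in-memory model of a worker registry with session-scoped
// ("ephemeral") entries. With ZooKeeper, each entry would be an EPHEMERAL
// znode that the server deletes when the owning client session expires.
class WorkerRegistry {
    // session id -> "host:port" addresses registered under that session
    private final Map<Long, Set<String>> bySession = new TreeMap<>();

    // A worker announces itself under its own session.
    synchronized void register(long sessionId, String hostPort) {
        bySession.computeIfAbsent(sessionId, id -> new TreeSet<>()).add(hostPort);
    }

    // Session expiry (e.g., the worker crashed): all of its entries disappear
    // without the worker having to deregister explicitly.
    synchronized void expireSession(long sessionId) {
        bySession.remove(sessionId);
    }

    // What a scheduler would ask for: the live workers, in sorted order.
    synchronized List<String> activeWorkers() {
        Set<String> live = new TreeSet<>();
        for (Set<String> addresses : bySession.values()) {
            live.addAll(addresses);
        }
        return Collections.unmodifiableList(new ArrayList<>(live));
    }
}
```

After `register(1L, "worker-a:7001")`, `register(2L, "worker-b:7001")` and `expireSession(1L)`, `activeWorkers()` returns only `[worker-b:7001]`. That is the property a scheduler would rely on: a worker whose session ends drops out of the list without any explicit deregistration.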
> >>>
> >>> Thanks,
> >>> Xuhui
> >>>
> >>>
> >>> On Wed, Apr 16, 2014 at 12:17 PM, Min Zhou wrote:
> >>>
> >>>> Hi Alvin,
> >>>>
> >>>> I think this JIRA overlaps somewhat with TAJO-611. Could you cooperate?
> >>>>
> >>>> Thanks,
> >>>> Min
> >>>>
> >>>>
> >>>> On Tue, Apr 15, 2014 at 7:22 PM, Henry Saputra <henry.saputra@gmail.com> wrote:
> >>>>
> >>>>> Jaehwa, I think we should think about a pluggable mechanism that would allow some kind of distributed system like ZK to be used if wanted.
> >>>>>
> >>>>> - Henry
> >>>>>
> >>>>> On Tue, Apr 15, 2014 at 7:15 PM, Jaehwa Jung wrote:
> >>>>>> Hi, Alvin
> >>>>>>
> >>>>>> I'm sorry for the late response, and thank you very much for your contribution.
> >>>>>> I agree with your opinion on ZooKeeper. But ZooKeeper requires an additional dependency that some users may not want.
> >>>>>>
> >>>>>> I'd like to suggest adding an abstraction layer for handling TajoMaster HA. When I created TAJO-740, I wished that TajoMaster HA would have a generic interface and a basic implementation using HDFS. Next, your proposed ZooKeeper implementation would be added there. This will allow users to choose their desired implementation according to their environments.
> >>>>>>
> >>>>>> In addition, I'd like to propose that TajoMaster embed the HA module; it would be great if HA works well just by launching a backup TajoMaster. Deploying an additional process besides the TajoMaster and TajoWorker processes may put more burden on users.
> >>>>>>
> >>>>>> *Cheers*
> >>>>>> *Jaehwa*
> >>>>>>
> >>>>>>
> >>>>>> 2014-04-13 14:36 GMT+09:00 Jihoon Son:
> >>>>>>
> >>>>>>> Hi Alvin.
> >>>>>>> Thanks for your suggestion.
> >>>>>>>
> >>>>>>> Overall, your suggestion looks very reasonable to me!
> >>>>>>> I'll check the POC.
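The abstraction layer Jaehwa and Henry describe above (a generic TajoMaster HA interface with a basic HDFS-backed implementation, and ZooKeeper plugged in behind the same interface) could look roughly like the following. Everything here is hypothetical: `HaService`, `FileHaService`, `HaServices`, and the `"file"` configuration value are illustrative names, and a local lock file created with `java.nio.file.Files.createFile` stands in for an HDFS file, relying only on the fact that creating a file fails if it already exists.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.util.Optional;

// Hypothetical pluggable HA abstraction: TajoMaster would code against this
// interface, and a ZooKeeper/Curator-backed version could be added later.
interface HaService {
    // Attempts to become the active master; returns false if another holds the role.
    boolean tryBecomeActive(String masterAddress) throws IOException;

    // The "host:port" of the active master, if any.
    Optional<String> activeMaster() throws IOException;

    // Gives up the active role, e.g. on a clean shutdown.
    void resign(String masterAddress) throws IOException;
}

// Basic file-based implementation. A local lock file stands in for a file on a
// shared file system such as HDFS; Files.createFile fails atomically if the
// file already exists, so only one master can create it.
class FileHaService implements HaService {
    private final Path lockFile;

    FileHaService(Path lockFile) {
        this.lockFile = lockFile;
    }

    @Override
    public boolean tryBecomeActive(String masterAddress) throws IOException {
        try {
            Files.createFile(lockFile);
        } catch (FileAlreadyExistsException e) {
            return false; // someone else is already active
        }
        Files.write(lockFile, masterAddress.getBytes(StandardCharsets.UTF_8));
        return true;
    }

    @Override
    public Optional<String> activeMaster() throws IOException {
        try {
            String address = new String(Files.readAllBytes(lockFile), StandardCharsets.UTF_8);
            // The file is briefly empty between createFile and write.
            return address.isEmpty() ? Optional.empty() : Optional.of(address);
        } catch (NoSuchFileException e) {
            return Optional.empty();
        }
    }

    @Override
    public void resign(String masterAddress) throws IOException {
        // Only the current holder deletes the lock file.
        if (activeMaster().filter(masterAddress::equals).isPresent()) {
            Files.deleteIfExists(lockFile);
        }
    }
}

// Picks an implementation from a configuration value, so each deployment can
// choose what fits its environment. The "file" key is illustrative only.
final class HaServices {
    private HaServices() {}

    static HaService create(String kind, Path sharedDir) {
        switch (kind) {
            case "file":
                return new FileHaService(sharedDir.resolve("active-master.lock"));
            // case "zookeeper": a Curator-based implementation would go here
            default:
                throw new IllegalArgumentException("unknown HA implementation: " + kind);
        }
    }
}
```

One design caveat: if the active master crashes, this naive version leaves the lock file behind, so a file-based implementation would also need some lease or heartbeat mechanism. ZooKeeper's ephemeral znodes are removed automatically when the owning session expires, which is one argument for keeping the implementation pluggable.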
> >>>>>>>
> >>>>>>> Many thanks,
> >>>>>>> Jihoon
> >>>>>>> Hi All,
> >>>>>>> After doing a lot of research, in my opinion we should utilize ZooKeeper for TajoMaster HA. I have created a small POC and shared it on my GitHub repository (git@github.com:alvinhenrick/zooKeeper-poc.git).
> >>>>>>>
> >>>>>>> Just to make things a little easier and more maintainable, I am utilizing Apache Curator, the fluent ZooKeeper client API developed at Netflix, which is now an Apache open source project.
> >>>>>>>
> >>>>>>> I have attached a diagram to convey my message to the team members. I will upload it to JIRA once everyone agrees with the proposed solution.
> >>>>>>>
> >>>>>>> Here is how the flow is going to look.
> >>>>>>>
> >>>>>>> TajoMasterZkController ==>
> >>>>>>>
> >>>>>>>
> >>>>>>> 1. This component will start, connect to the ZooKeeper quorum, and fight ( :) ) to obtain the latch/lock to become the master.
> >>>>>>> 2. Once the lock is obtained, the Apache Curator API will invoke the takeLeadership() method, which will then start the TajoMaster.
> >>>>>>> 3. As long as the TajoMaster is running, the controller will keep the lock and update the metadata on the ZooKeeper server with the HOSTNAME and RPC PORT.
> >>>>>>> 4. The other participants will keep waiting for the latch/lock to be released by ZooKeeper so they can obtain the leadership.
> >>>>>>> 5. The advantage is that we can have as many TajoMasters as we want, but only one can be the leader, and a TajoMaster will consume resources only after obtaining the latch/lock.
> >>>>>>>
> >>>>>>>
> >>>>>>> TajoWorkerZkController ==>
> >>>>>>>
> >>>>>>> 1. This component will start, connect to ZooKeeper (creating an EPHEMERAL ZNODE), and wait for events from ZooKeeper.
> >>>>>>> 2. The first listener will listen for successful registration.
> >>>>>>> 3. The second listener, on the master node, will listen for any changes to the master node received from the ZooKeeper server.
> >>>>>>> 4. If a failover occurs, the data on the master ZNODE will change; the new HOSTNAME and RPC PORT can then be obtained, and the TajoWorker can establish a new RPC connection with the TajoMaster.
> >>>>>>>
> >>>>>>> To demonstrate, I have created a small Readme.txt file on GitHub explaining how to run the example. Please read the log statements on the console.
> >>>>>>>
> >>>>>>> Similar to TajoWorkerZkController, we can also implement TajoClientZkController.
> >>>>>>>
> >>>>>>> Any help or advice is appreciated.
> >>>>>>>
> >>>>>>> Thanks!
> >>>>>>> Warm Regards,
> >>>>>>> Alvin.
> >>>>>>>
> >>>>>
> >>>>
> >>>>
> >>>>
> >>>> --
> >>>> My research interests are distributed systems, parallel computing and bytecode-based virtual machines.
> >>>>
> >>>> My profile:
> >>>> http://www.linkedin.com/in/coderplay
> >>>> My blog:
> >>>> http://coderplay.javaeye.com
> >>>>
> >>>
> >>
> >
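The TajoMasterZkController and TajoWorkerZkController flows above can be modeled in a single JVM to show the failover sequence. This sketch deliberately replaces ZooKeeper and Curator with plain JDK primitives so it runs on its own: a shared `ReentrantLock` plays the leader latch (a blocked `lock()` call is a participant waiting for leadership), and a `MasterNode` object plays the master ZNODE, notifying watchers when the HOSTNAME:PORT changes. `MasterNode`, `MasterCandidate`, and `WorkerController` are hypothetical names; only `takeLeadership()` mirrors a name from the proposal. In a real deployment the lock would also be released when the leader's ZooKeeper session ends, not only on a clean shutdown.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Consumer;

// Stand-in for the master ZNODE: holds the leader's "host:port" and notifies
// watchers whenever it changes (ZooKeeper would deliver this as a watch event).
class MasterNode {
    private final List<Consumer<String>> watchers = new ArrayList<>();
    private String address; // guarded by this

    // Registers a watcher and immediately reports the current address, if any,
    // similar to reading a znode and setting a watch on it in one step.
    synchronized void watch(Consumer<String> watcher) {
        watchers.add(watcher);
        if (address != null) {
            watcher.accept(address);
        }
    }

    // Called by the leader to advertise its "host:port"; notifies all watchers.
    synchronized void publish(String newAddress) {
        address = newAddress;
        for (Consumer<String> w : watchers) {
            w.accept(newAddress);
        }
    }

    synchronized String current() {
        return address;
    }
}

// One TajoMaster candidate competing for leadership via the shared lock.
class MasterCandidate implements Runnable {
    private final String address;
    private final ReentrantLock leaderLock;
    private final MasterNode masterNode;
    private final CountDownLatch stopped = new CountDownLatch(1);

    MasterCandidate(String address, ReentrantLock leaderLock, MasterNode masterNode) {
        this.address = address;
        this.leaderLock = leaderLock;
        this.masterNode = masterNode;
    }

    @Override
    public void run() {
        leaderLock.lock(); // blocks while another candidate is the leader
        try {
            takeLeadership();
        } finally {
            leaderLock.unlock(); // leadership ends when takeLeadership() returns
        }
    }

    // Runs only while this candidate holds the lock: a real controller would
    // start the TajoMaster here, then advertise its HOSTNAME and RPC PORT.
    private void takeLeadership() {
        masterNode.publish(address);
        try {
            stopped.await(); // keep leadership until shutdown() is called
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Simulates the leader going away.
    void shutdown() {
        stopped.countDown();
    }
}

// Worker side: follows the master node and "reconnects" on every change.
class WorkerController {
    private volatile String connectedTo;

    WorkerController(MasterNode masterNode) {
        // A real worker would close its old RPC connection and open a new one here.
        masterNode.watch(newAddress -> connectedTo = newAddress);
    }

    String connectedTo() {
        return connectedTo;
    }
}
```

Starting candidate A and then B leaves the worker connected to A while B blocks on the lock; shutting A down releases the lock, B publishes its address, and the worker's watcher switches its target to B.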