tajo-dev mailing list archives

From Xuhui Liu <maf...@gmail.com>
Subject Re: JIRA-704 : TajoMaster High Availability .
Date Thu, 17 Apr 2014 08:07:08 GMT
Hi Guys,

I was distracted by other things over the last several weeks, so I haven't
started integrating ZK yet. It would be great if Alvin could do this.

As for service discovery, the code and unit tests are done. I'll send a
detailed update, along with the patch, later.

Thanks,
Xuhui


On Thu, Apr 17, 2014 at 1:22 AM, Min Zhou <coderplay@gmail.com> wrote:

> Hi Alvin,
>
> Thank you for your understanding.
>
>
> Xuhui,
> Could you please share your design and current progress here?
>
>
> Thanks,
> Min
>
>
>
>
> On Wed, Apr 16, 2014 at 7:42 AM, Hyunsik Choi <hyunsik@apache.org> wrote:
>
> > I'm sorry for the late response, and thank you Alvin for your understanding.
> >
> > Best Regards,
> > Hyunsik
> >
> >
> > On Wed, Apr 16, 2014 at 11:19 PM, Alvin Henrick <share.code@aol.com> wrote:
> >
> > > Hi All,
> > >
> > > Not a problem. I wasn't aware that 704 was overlapping with 611. Yes, I
> > > was planning to use Apache Curator as well; I did a small POC and posted
> > > it on GitHub. Apache Curator has a service discovery recipe which we can
> > > use.
> > >
> > > As per Hyunsik, the only work left on 704 is catalog replication across
> > > TajoMasters, which can be easily achieved via database replication.
> > >
> > > Xuhui and Min,
> > >
> > > Let me know if I can help, because I have done some good research on
> > > Apache Curator and ZooKeeper (how to utilize/configure the Apache Curator
> > > APIs). Here is the Git repository where I did some work for 704 before
> > > getting into the real implementation:
> > > git@github.com:alvinhenrick/zooKeeper-poc.git
> > >
> > > I will remove the in-progress status, associate 704 with 611, and move on
> > > to tackle another interesting/priority issue :). Let me know how you want
> > > to tackle this so that we don't duplicate the effort.
> > >
> > > Have a wonderful day!!!
> > >
> > > Thanks!
> > > Warm Regards,
> > > Alvin.
> > >
> > >
> > > On Apr 16, 2014, at 6:56 AM, Hyunsik Choi wrote:
> > >
> > > > Hi Alvin,
> > > >
> > > > First of all, thank you Alvin for your contribution. Your proposal looks
> > > > nice and reasonable to me.
> > > >
> > > > BTW, as others mentioned, TAJO-704 and TAJO-611 seem to overlap
> > > > somewhat. We need to arrange the tasks to avoid duplicated work.
> > > >
> > > > In my opinion, the TajoMaster HA feature involves three sub-features:
> > > >  1) Leader election among multiple TajoMasters - one of the multiple
> > > >     TajoMasters is always the leader TajoMaster.
> > > >  2) Service discovery on the TajoClient side - TajoClient API calls
> > > >     should be resilient even when the original TajoMaster is not
> > > >     available.
> > > >  3) Cluster resource management and catalog information that TajoMaster
> > > >     keeps in main memory - this information should not be lost.
> > > >
> > > > I think that (1) and (2) duplicate TAJO-611 for service discovery. So it
> > > > would be nice if TAJO-704 focused only on (3), because TAJO-611 already
> > > > started a few weeks ago and TAJO-704 may be at a relatively earlier
> > > > stage. *Instead, you can continue the work with Xuhui and Min.* Someone
> > > > can divide the service discovery issue into more subtasks.
> > > >
> > > > In addition, I'd like to discuss (3) further. Currently, a running
> > > > TajoMaster keeps two kinds of information: cluster resource information
> > > > for all workers, and catalog information. In order to guarantee the HA
> > > > of this data, TajoMaster should either persistently materialize it or
> > > > consistently synchronize it across multiple TajoMasters. BTW, we will
> > > > replace the resource management feature of TajoMaster with a
> > > > decentralized design in the new scheduler issue. As a result, I think
> > > > that TajoMaster HA needs to focus only on the high availability of
> > > > catalog information. The HA of the catalog can be easily achieved by
> > > > database replication, or we can make our own module for it. I prefer
> > > > the former.
> > > >
> > > > Hi Xuhui and Min,
> > > >
> > > > Could you share brief progress on the service discovery issue? Then we
> > > > can easily figure out how to start on service discovery together.
> > > >
> > > > Warm regards,
> > > > Hyunsik
> > > >
> > > >
> > > >
> > > > On Wed, Apr 16, 2014 at 3:36 PM, Min Zhou <coderplay@gmail.com> wrote:
> > > >
> > > >> Actually, we are thinking not only about HA, but also about service
> > > >> discovery, which the future Tajo scheduler would rely on. The Tajo
> > > >> scheduler can get all the active workers from that service.
> > > >>
> > > >>
> > > >> Regards,
> > > >> Min
> > > >>
> > > >>
> > > >> On Tue, Apr 15, 2014 at 10:05 PM, Xuhui Liu <mafish@gmail.com> wrote:
> > > >>
> > > >>> Hi Alvin,
> > > >>>
> > > >>> TAJO-611 will introduce Curator as a service discovery service for
> > > >>> Tajo, and Curator is based on ZK. Maybe we can work together.
> > > >>>
> > > >>> Thanks,
> > > >>> Xuhui
> > > >>>
> > > >>>
> > > >>> On Wed, Apr 16, 2014 at 12:17 PM, Min Zhou <coderplay@gmail.com> wrote:
> > > >>>
> > > >>>> Hi Alvin,
> > > >>>>
> > > >>>> I think this JIRA somewhat overlaps with TAJO-611. Could you
> > > >>>> cooperate on it?
> > > >>>>
> > > >>>> Thanks,
> > > >>>> Min
> > > >>>>
> > > >>>>
> > > >>>> On Tue, Apr 15, 2014 at 7:22 PM, Henry Saputra <henry.saputra@gmail.com> wrote:
> > > >>>>
> > > >>>>> Jaehwa, I think we should think about a pluggable mechanism that
> > > >>>>> would allow some kind of distributed system like ZK to be used if
> > > >>>>> wanted.
> > > >>>>>
> > > >>>>> - Henry
> > > >>>>>
> > > >>>>> On Tue, Apr 15, 2014 at 7:15 PM, Jaehwa Jung <blrunner@apache.org> wrote:
> > > >>>>>> Hi, Alvin
> > > >>>>>>
> > > >>>>>> I'm sorry for the late response, and thank you very much for your
> > > >>>>>> contribution. I agree with your opinion on ZooKeeper. But ZooKeeper
> > > >>>>>> adds an additional dependency that some users may not want.
> > > >>>>>>
> > > >>>>>> I'd like to suggest adding an abstraction layer for handling
> > > >>>>>> TajoMaster HA. When I created TAJO-740, I wished that TajoMaster HA
> > > >>>>>> would have a generic interface and a basic implementation using
> > > >>>>>> HDFS. Your proposed ZooKeeper implementation could then be added
> > > >>>>>> there. It will allow users to choose their desired implementation
> > > >>>>>> according to their environments.
> > > >>>>>>
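The abstraction layer suggested above might look roughly like the following Java sketch. Everything in it is an assumption made for illustration: the names MasterHaBackend, InMemoryHaBackend and HaBackends, the method signatures, and the "memory" backend key do not come from Tajo. The in-memory backend only stands in for the HDFS-based and ZooKeeper-based implementations discussed in the thread.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.Supplier;

/** Hypothetical generic HA interface; the name and methods are illustrative. */
interface MasterHaBackend {
    /** Registers a master; returns true if it became the active one. */
    boolean register(String masterAddress);

    /** Address of the active master, if any. */
    Optional<String> activeMaster();
}

/** Trivial in-memory backend standing in for an HDFS- or ZooKeeper-based one. */
final class InMemoryHaBackend implements MasterHaBackend {
    private String active;

    @Override
    public synchronized boolean register(String masterAddress) {
        if (active == null) {
            active = masterAddress;   // first registrant becomes active
            return true;
        }
        return false;                 // later registrants act as backups
    }

    @Override
    public synchronized Optional<String> activeMaster() {
        return Optional.ofNullable(active);
    }
}

/** Picks a backend by a configured name, so each deployment can choose. */
final class HaBackends {
    private static final Map<String, Supplier<MasterHaBackend>> REGISTRY = new HashMap<>();

    static {
        REGISTRY.put("memory", InMemoryHaBackend::new);
        // An "hdfs" or "zookeeper" entry would be registered here once implemented.
    }

    private HaBackends() { }

    static MasterHaBackend create(String name) {
        Supplier<MasterHaBackend> factory = REGISTRY.get(name);
        if (factory == null) {
            throw new IllegalArgumentException("Unknown HA backend: " + name);
        }
        return factory.get();
    }
}
```

With this shape, TajoMaster code would depend only on the interface, and users who do not want the ZooKeeper dependency could simply select another backend by name.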
> > > >>>>>> In addition, I'd like to propose that TajoMaster embed the HA
> > > >>>>>> module; it would be great if HA worked simply by launching a backup
> > > >>>>>> TajoMaster. Deploying an additional process besides the TajoMaster
> > > >>>>>> and TajoWorker processes may put more burden on users.
> > > >>>>>>
> > > >>>>>> *Cheers*
> > > >>>>>> *Jaehwa*
> > > >>>>>>
> > > >>>>>>
> > > >>>>>> 2014-04-13 14:36 GMT+09:00 Jihoon Son <jihoonson@apache.org>:
> > > >>>>>>
> > > >>>>>>> Hi Alvin.
> > > >>>>>>> Thanks for your suggestion.
> > > >>>>>>>
> > > >>>>>>> Overall, your suggestion looks very reasonable to me!
> > > >>>>>>> I'll check the POC.
> > > >>>>>>>
> > > >>>>>>> Many thanks,
> > > >>>>>>> Jihoon
> > > >>>>>>>
> > > >>>>>>> Hi All,
> > > >>>>>>>
> > > >>>>>>> After doing a lot of research, in my opinion we should utilize
> > > >>>>>>> ZooKeeper for TajoMaster HA. I have created a small POC and shared
> > > >>>>>>> it on my GitHub repository
> > > >>>>>>> (git@github.com:alvinhenrick/zooKeeper-poc.git).
> > > >>>>>>>
> > > >>>>>>> Just to make things a little bit easier and more maintainable, I
> > > >>>>>>> am utilizing Apache Curator, the fluent ZooKeeper client API that
> > > >>>>>>> was developed at Netflix and is now an Apache open source project.
> > > >>>>>>>
> > > >>>>>>> I have attached a diagram to convey my message to the team
> > > >>>>>>> members. I will upload it to JIRA once everyone agrees with the
> > > >>>>>>> proposed solution.
> > > >>>>>>>
> > > >>>>>>> Here is what the flow is going to look like.
> > > >>>>>>>
> > > >>>>>>> TajoMasterZkController ==>
> > > >>>>>>>
> > > >>>>>>>   1. This component will start, connect to the ZooKeeper quorum,
> > > >>>>>>>      and fight ( :) ) to obtain the latch/lock to become the master.
> > > >>>>>>>   2. Once the lock is obtained, the Apache Curator API will invoke
> > > >>>>>>>      the takeLeadership() method, which will start the TajoMaster.
> > > >>>>>>>   3. As long as the TajoMaster is running, the controller will keep
> > > >>>>>>>      the lock and update the metadata on the ZooKeeper server with
> > > >>>>>>>      the HOSTNAME and RPC PORT.
> > > >>>>>>>   4. The other participants will keep waiting for the latch/lock
> > > >>>>>>>      to be released by ZooKeeper to obtain the leadership.
> > > >>>>>>>   5. The advantage is that we can have as many TajoMasters as we
> > > >>>>>>>      want, but only one can be the leader, and a TajoMaster will
> > > >>>>>>>      consume resources only after obtaining the latch/lock.
> > > >>>>>>>
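The TajoMasterZkController steps above can be sketched without a ZooKeeper cluster. The following is a minimal in-memory simulation, assuming a simple first-come-first-served latch; it does not call Curator or ZooKeeper at all. MasterCandidate and InMemoryLatch are hypothetical names, the host names and ports are arbitrary, and only the takeLeadership() name is borrowed from the callback mentioned in step 2.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Optional;

/** One TajoMaster candidate (hypothetical class, for illustration only). */
final class MasterCandidate {
    private final String host;
    private final int rpcPort;
    private boolean masterRunning;

    MasterCandidate(String host, int rpcPort) {
        this.host = host;
        this.rpcPort = rpcPort;
    }

    /** Step 2: called once the latch is won; "starts" the master. */
    String takeLeadership() {
        masterRunning = true;
        return host + ":" + rpcPort;   // step 3: metadata to publish
    }

    void stop() { masterRunning = false; }

    boolean isMasterRunning() { return masterRunning; }
}

/** In-memory stand-in for the ZooKeeper latch; this is NOT the Curator API. */
final class InMemoryLatch {
    private final Deque<MasterCandidate> waiting = new ArrayDeque<>();
    private MasterCandidate leader;   // current latch holder, or null
    private String leaderAddress;     // HOSTNAME:RPC_PORT of the leader

    /** Steps 1 and 4: join the election; wait in line unless the latch is free. */
    synchronized void join(MasterCandidate candidate) {
        waiting.addLast(candidate);
        electIfVacant();
    }

    /** Simulates a candidate leaving (crash or shutdown). */
    synchronized void leave(MasterCandidate candidate) {
        if (candidate == leader) {
            leader.stop();
            leader = null;
            leaderAddress = null;
            electIfVacant();   // failover: next waiting candidate takes over
        } else {
            waiting.remove(candidate);
        }
    }

    synchronized Optional<String> leaderAddress() {
        return Optional.ofNullable(leaderAddress);
    }

    private void electIfVacant() {
        if (leader == null && !waiting.isEmpty()) {
            leader = waiting.pollFirst();
            leaderAddress = leader.takeLeadership();
        }
    }
}
```

The point of the sketch is step 5: any number of candidates can join, but only the current latch holder ever has its master started; a backup starts only after the leader leaves.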
> > > >>>>>>>
> > > >>>>>>> TajoWorkerZkController ==>
> > > >>>>>>>
> > > >>>>>>>   1. This component will start, connect to ZooKeeper (creating an
> > > >>>>>>>      EPHEMERAL ZNODE), and wait for events from ZooKeeper.
> > > >>>>>>>   2. The first listener will listen for successful registration.
> > > >>>>>>>   3. The second listener, on the master node, will listen for any
> > > >>>>>>>      changes to the master node received from the ZooKeeper server.
> > > >>>>>>>   4. If a failover occurs, the data on the master ZNODE will be
> > > >>>>>>>      changed, the new HOSTNAME and RPC PORT can be obtained, and the
> > > >>>>>>>      TajoWorker can establish a new RPC connection with the
> > > >>>>>>>      TajoMaster.
> > > >>>>>>>
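The worker-side failover in step 4 can be illustrated the same way. This is a rough sketch that assumes a plain in-memory listener list in place of ZooKeeper watches; MasterZnode and WorkerController are hypothetical names, and no real RPC connection is opened.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** In-memory stand-in for the master ZNODE; NOT the ZooKeeper or Curator API. */
final class MasterZnode {
    private final List<Consumer<String>> listeners = new ArrayList<>();
    private String data;   // "HOSTNAME:RPC_PORT" of the current leader, or null

    /** Registers a listener and replays the current value, if there is one. */
    synchronized void addListener(Consumer<String> listener) {
        listeners.add(listener);
        if (data != null) {
            listener.accept(data);
        }
    }

    /** Called by the new leader after a failover; notifies every listener. */
    synchronized void setData(String newData) {
        data = newData;
        for (Consumer<String> listener : listeners) {
            listener.accept(newData);
        }
    }
}

/** Hypothetical worker-side controller that follows the master ZNODE. */
final class WorkerController {
    private String connectedMaster;
    private int connectionCount;

    void start(MasterZnode znode) {
        znode.addListener(this::onMasterChanged);
    }

    /** Step 4: reconnect only when the published address really changed. */
    private void onMasterChanged(String address) {
        if (!address.equals(connectedMaster)) {
            connectedMaster = address;   // a real worker would open an RPC connection here
            connectionCount++;
        }
    }

    String connectedMaster() { return connectedMaster; }

    int connectionCount() { return connectionCount; }
}
```

A TajoClientZkController could presumably follow the same listener pattern, since the client also only needs the current master address.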
> > > >>>>>>> To demonstrate, I have created a small Readme.txt file on GitHub
> > > >>>>>>> on how to run the example. Please read the log statements on the
> > > >>>>>>> console.
> > > >>>>>>>
> > > >>>>>>> Similar to TajoWorkerZkController, we can also implement
> > > >>>>>>> TajoClientZkController.
> > > >>>>>>>
> > > >>>>>>> Any help or advice is appreciated.
> > > >>>>>>>
> > > >>>>>>> Thanks!
> > > >>>>>>> Warm Regards,
> > > >>>>>>> Alvin.
> > > >>>>>>>
> > > >>>>>
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>> --
> > > >>>> My research interests are distributed systems, parallel computing
> > and
> > > >>>> bytecode based virtual machine.
> > > >>>>
> > > >>>> My profile:
> > > >>>> http://www.linkedin.com/in/coderplay
> > > >>>> My blog:
> > > >>>> http://coderplay.javaeye.com
> > > >>>>
