tajo-dev mailing list archives

From Xuhui Liu <maf...@gmail.com>
Subject Re: JIRA-704 : TajoMaster High Availability .
Date Thu, 17 Apr 2014 08:19:14 GMT
It seems ZK is based on a Paxos-like consensus protocol (Zab). Then it will
be much simpler: we can focus on how to use ZK well.

Cheers,
Xuhui


On Thu, Apr 17, 2014 at 4:14 PM, Xuhui Liu <mafish@gmail.com> wrote:

> Regarding the HA of TajoMaster, keeping consistency between the primary
> master and the backup masters will be a big challenge. Have we ever
> thought about the Paxos protocol? It's designed to keep state consistent in
> a distributed environment.
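For context on how consensus protocols such as Paxos (and ZooKeeper's Paxos-like Zab) keep replicas consistent: a change is committed only after a majority quorum of the n servers has accepted it. The relations below are the general majority-quorum arithmetic, not figures taken from this thread:

```latex
\begin{aligned}
q &= \left\lfloor n/2 \right\rfloor + 1, \qquad 2q > n \;\Rightarrow\; \text{any two quorums share at least one server} \\
f_{\max} &= n - q = \left\lceil n/2 \right\rceil - 1 \\
n = 3 &: \; q = 2,\; f_{\max} = 1 \qquad n = 5 : \; q = 3,\; f_{\max} = 2
\end{aligned}
```

This is why ZooKeeper ensembles are usually sized at an odd number such as three or five: a fourth server still tolerates only one failure.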
>
> Thanks,
> Daniel
>
>
> On Wed, Apr 16, 2014 at 7:56 PM, Hyunsik Choi <hyunsik@apache.org> wrote:
>
>> Hi Alvin,
>>
>> First of all, thank you, Alvin, for your contribution. Your proposal looks
>> nice and reasonable to me.
>>
>> BTW, as others mentioned, TAJO-704 and TAJO-611 seem to overlap somewhat.
>> We need to arrange the tasks to avoid duplicated work.
>>
>> In my opinion, the TajoMaster HA feature involves three sub-features:
>>   1) Leader election among multiple TajoMasters - at any time, one of the
>> TajoMasters is the leader TajoMaster.
>>   2) Service discovery on the TajoClient side - TajoClient API calls should
>> remain resilient even when the original TajoMaster is not available.
>>   3) Cluster resource management and catalog information that TajoMaster
>> keeps in main memory - this information should not be lost.
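Sub-feature (2) can be pictured as a client-side wrapper that looks up the current leader before every attempt and retries when a call fails. This is only a minimal, stdlib-only sketch: `LeaderResolver`, `MasterCall` and `ResilientClient` are hypothetical names rather than TajoClient APIs, and a real resolver would read the address that the leader publishes (for example in ZooKeeper).

```java
import java.io.IOException;

/** Hypothetical: returns "host:port" of the current leader TajoMaster. */
interface LeaderResolver {
    String currentLeader();
}

/** Hypothetical: a single RPC issued against one master address. */
interface MasterCall<T> {
    T invoke(String masterAddress) throws IOException;
}

/** Sketch of a client-side wrapper that survives a master failover. */
final class ResilientClient {
    private final LeaderResolver resolver;
    private final int maxAttempts;

    ResilientClient(LeaderResolver resolver, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        this.resolver = resolver;
        this.maxAttempts = maxAttempts;
    }

    <T> T call(MasterCall<T> rpc) throws IOException {
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            // Resolve the leader again on every attempt: after a failover the
            // address that just failed is no longer the active master.
            String address = resolver.currentLeader();
            try {
                return rpc.invoke(address);
            } catch (IOException e) {
                last = e; // remember the failure, then retry
            }
        }
        throw last; // every attempt failed
    }
}
```

A production client would also add a back-off between attempts; it is left out here to keep the sketch short.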
>>
>> I think that (1) and (2) duplicate TAJO-611 (service discovery). So it
>> would be nice if TAJO-704 focused only on (3), because TAJO-611 already
>> started a few weeks ago while TAJO-704 is at a relatively early stage.
>> *Instead, you can continue the work with Xuhui and Min.* Someone can
>> divide the service discovery issue into more subtasks.
>>
>> In addition, I'd like to discuss (3) further. Currently, a running
>> TajoMaster keeps two kinds of information: the cluster resource
>> information of all workers and the catalog information. To guarantee the
>> HA of this data, TajoMaster should either persistently materialize it or
>> consistently synchronize it across multiple TajoMasters. BTW, in the new
>> scheduler issue we will replace the resource management feature of
>> TajoMaster with a decentralized one. As a result, I think that TajoMaster
>> HA needs to focus only on the high availability of the catalog
>> information. The HA of the catalog can easily be achieved by database
>> replication, or we can build our own module for it. I prefer the former.
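Persistently materializing the catalog can be sketched as a write-through cache: each catalog change is written to a durable store before it is applied in memory, so a newly elected TajoMaster can rebuild its in-memory view from that store. `CatalogStore`, `InMemoryStore` and `CachedCatalog` are hypothetical names, and `InMemoryStore` merely stands in for a replicated database.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical durable store, e.g. a replicated RDBMS in practice. */
interface CatalogStore {
    void put(String tableName, String tableDesc);
    void remove(String tableName);
    Map<String, String> loadAll();
}

/** Stand-in for the replicated database, shared by all masters here. */
final class InMemoryStore implements CatalogStore {
    private final Map<String, String> rows = new HashMap<>();
    public synchronized void put(String t, String d) { rows.put(t, d); }
    public synchronized void remove(String t) { rows.remove(t); }
    public synchronized Map<String, String> loadAll() { return new HashMap<>(rows); }
}

/** Write-through catalog held in one TajoMaster's main memory. */
final class CachedCatalog {
    private final CatalogStore store;
    private final Map<String, String> cache;

    /** A (new) leader rebuilds its in-memory view from the durable store. */
    CachedCatalog(CatalogStore store) {
        this.store = store;
        this.cache = new HashMap<>(store.loadAll());
    }

    synchronized void createTable(String name, String desc) {
        store.put(name, desc); // durable first ...
        cache.put(name, desc); // ... then visible in memory
    }

    synchronized void dropTable(String name) {
        store.remove(name);
        cache.remove(name);
    }

    synchronized Map<String, String> tables() {
        return Collections.unmodifiableMap(new HashMap<>(cache));
    }
}
```

Whether the store is replicated by the database itself (the option preferred above) or by a custom module stays hidden behind the `CatalogStore` interface in this sketch.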
>>
>> Hi Xuhui and Min,
>>
>> Could you briefly share the progress of the service discovery issue? Then
>> we can easily figure out how to start on service discovery together.
>>
>> Warm regards,
>> Hyunsik
>>
>>
>>
>> On Wed, Apr 16, 2014 at 3:36 PM, Min Zhou <coderplay@gmail.com> wrote:
>>
>> > Actually, we are thinking not only about HA but also about service
>> > discovery, which the future Tajo scheduler would rely on. The Tajo
>> > scheduler can get all the active workers from that service.
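Getting all active workers from a discovery service usually relies on ZooKeeper's ephemeral znodes, which disappear when the owning session expires. The sketch below imitates that behaviour with a session timeout and an explicit clock so it stays deterministic; `WorkerRegistry`, `heartbeat` and `activeWorkers` are hypothetical names, not Curator or ZooKeeper APIs.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical registry mimicking ephemeral registrations tied to a session. */
final class WorkerRegistry {
    private final long sessionTimeoutMs;
    // worker address -> time of its last heartbeat
    private final Map<String, Long> lastHeartbeat = new HashMap<>();

    WorkerRegistry(long sessionTimeoutMs) {
        this.sessionTimeoutMs = sessionTimeoutMs;
    }

    /** Registers or refreshes a worker; heartbeats keep its "session" alive. */
    synchronized void heartbeat(String workerAddress, long nowMs) {
        lastHeartbeat.put(workerAddress, nowMs);
    }

    /** What the scheduler would see: workers whose sessions have not expired. */
    synchronized List<String> activeWorkers(long nowMs) {
        // Drop expired sessions, the way ZooKeeper deletes ephemeral znodes.
        lastHeartbeat.values().removeIf(t -> nowMs - t > sessionTimeoutMs);
        List<String> active = new ArrayList<>(lastHeartbeat.keySet());
        Collections.sort(active);
        return active;
    }
}
```

With a real ZooKeeper ensemble the expiry is done by the server, so the scheduler only has to list the children of the registration path.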
>> >
>> >
>> > Regards,
>> > Min
>> >
>> >
>> > On Tue, Apr 15, 2014 at 10:05 PM, Xuhui Liu <mafish@gmail.com> wrote:
>> >
>> > > Hi Alvin,
>> > >
>> > > TAJO-611 will introduce Curator to Tajo for service discovery, and
>> > > Curator is based on ZK. Maybe we can work together.
>> > >
>> > > Thanks,
>> > > Xuhui
>> > >
>> > >
>> > > On Wed, Apr 16, 2014 at 12:17 PM, Min Zhou <coderplay@gmail.com> wrote:
>> > >
>> > > > Hi Alvin,
>> > > >
>> > > > I think this JIRA overlaps somewhat with TAJO-611. Could you
>> > > > cooperate on it?
>> > > >
>> > > > Thanks,
>> > > > Min
>> > > >
>> > > >
>> > > > On Tue, Apr 15, 2014 at 7:22 PM, Henry Saputra <henry.saputra@gmail.com> wrote:
>> > > >
>> > > > > Jaehwa, I think we should consider a pluggable mechanism that would
>> > > > > allow some kind of distributed system like ZK to be used if wanted.
>> > > > >
>> > > > > - Henry
>> > > > >
>> > > > > On Tue, Apr 15, 2014 at 7:15 PM, Jaehwa Jung <blrunner@apache.org> wrote:
>> > > > > > Hi Alvin,
>> > > > > >
>> > > > > > I'm sorry for the late response, and thank you very much for your
>> > > > > > contribution. I agree with your opinion on ZooKeeper. However,
>> > > > > > ZooKeeper adds a dependency that some users may not want.
>> > > > > >
>> > > > > > I'd like to suggest adding an abstraction layer for handling
>> > > > > > TajoMaster HA. When I created TAJO-740, I intended TajoMaster HA
>> > > > > > to have a generic interface and a basic implementation using
>> > > > > > HDFS. Your proposed ZooKeeper implementation could then be added
>> > > > > > as another implementation. This will allow users to choose the
>> > > > > > implementation that suits their environment.
>> > > > > >
>> > > > > > In addition, I'd like to propose that TajoMaster embed the HA
>> > > > > > module; it would be great if HA worked simply by launching a
>> > > > > > backup TajoMaster. Deploying additional processes besides the
>> > > > > > TajoMaster and TajoWorker processes may place more burden on users.
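The abstraction layer proposed above could look roughly like the following sketch: TajoMaster embeds an `HAService` chosen by a configuration key, so an HDFS-based default and a ZooKeeper-based implementation become interchangeable. All names here (`HAService`, `HAServiceFactory`, `LocalHAService`, and keys such as "hdfs" or "zookeeper") are hypothetical, not Tajo's actual API; `LocalHAService` is only an in-process stand-in so the sketch runs.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical pluggable HA contract that TajoMaster would embed. */
interface HAService {
    /** Tries to become the active master; true if this master is now active. */
    boolean tryBecomeActive(String masterAddress);
    /** Address of the currently active master, or null if there is none. */
    String activeMaster();
    /** Gives up the active role, e.g. on shutdown. */
    void release(String masterAddress);
}

/** In-process stand-in; real implementations would use HDFS or ZooKeeper. */
final class LocalHAService implements HAService {
    private String active;

    public synchronized boolean tryBecomeActive(String addr) {
        if (active == null) active = addr;
        return addr.equals(active);
    }

    public synchronized String activeMaster() { return active; }

    public synchronized void release(String addr) {
        if (addr.equals(active)) active = null;
    }
}

/** Picks the HA implementation named in the configuration. */
final class HAServiceFactory {
    private final Map<String, Supplier<HAService>> impls = new HashMap<>();

    void register(String name, Supplier<HAService> impl) { impls.put(name, impl); }

    HAService create(String configuredName) {
        Supplier<HAService> impl = impls.get(configuredName);
        if (impl == null) {
            throw new IllegalArgumentException("unknown HA implementation: " + configuredName);
        }
        return impl.get();
    }
}
```

In this design a backup TajoMaster needs no extra process: it embeds the same service and becomes active once the current master releases the role or its lease is lost.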
>> > > > > >
>> > > > > > *Cheers*
>> > > > > > *Jaehwa*
>> > > > > >
>> > > > > >
>> > > > > > 2014-04-13 14:36 GMT+09:00 Jihoon Son <jihoonson@apache.org>:
>> > > > > >
>> > > > > >> Hi Alvin,
>> > > > > >> Thanks for your suggestion.
>> > > > > >>
>> > > > > >> Overall, your suggestion looks very reasonable to me!
>> > > > > >> I'll check the POC.
>> > > > > >>
>> > > > > >> Many thanks,
>> > > > > >> Jihoon
>> > > > > >>
>> > > > > >> Hi All,
>> > > > > >> After a lot of research, in my opinion we should use ZooKeeper for
>> > > > > >> TajoMaster HA. I have created a small POC and shared it in my GitHub
>> > > > > >> repository (git@github.com:alvinhenrick/zooKeeper-poc.git).
>> > > > > >>
>> > > > > >> To make things a little easier and more maintainable, I am using
>> > > > > >> Apache Curator, the fluent ZooKeeper client API that was developed
>> > > > > >> at Netflix and is now an Apache open source project.
>> > > > > >>
>> > > > > >> I have attached a diagram to convey my idea to the team members.
>> > > > > >> I will upload it to JIRA once everyone agrees with the proposed
>> > > > > >> solution.
>> > > > > >>
>> > > > > >> Here is what the flow is going to look like.
>> > > > > >>
>> > > > > >> TajoMasterZkController ==>
>> > > > > >>
>> > > > > >>
>> > > > > >>    1. This component will start, connect to the ZooKeeper quorum and
>> > > > > >>       fight ( :) ) to obtain the latch/lock to become the master.
>> > > > > >>    2. Once the lock is obtained, the Apache Curator API will invoke
>> > > > > >>       the takeLeadership() method, which will start the TajoMaster.
>> > > > > >>    3. As long as the TajoMaster is running, the controller will keep
>> > > > > >>       the lock and update the metadata on the ZooKeeper server with
>> > > > > >>       the HOSTNAME and RPC PORT.
>> > > > > >>    4. The other participants will keep waiting for the latch/lock to
>> > > > > >>       be released by ZooKeeper so that they can obtain the leadership.
>> > > > > >>    5. The advantage is that we can have as many TajoMasters as we
>> > > > > >>       want, but only one can be the leader, and it consumes resources
>> > > > > >>       only after obtaining the latch/lock.
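The leader-election flow above can be simulated without a ZooKeeper server. In ZooKeeper's standard leader-election recipe, which Curator's leader recipes also follow, each candidate creates an ephemeral sequential znode; the candidate with the lowest sequence number leads, and when its session ends the next one takes over. The sketch below models that in memory; `ElectionSimulator`, `Candidate` and the `masterNode` field are hypothetical, and its private `takeLeadership` is only a stand-in for the Curator callback of the same name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/** Hypothetical in-memory model of the "lowest sequential znode leads" recipe. */
final class ElectionSimulator {
    /** A TajoMaster candidate identified by its HOSTNAME:RPC PORT. */
    static final class Candidate {
        final String hostAndPort;
        Candidate(String hostAndPort) { this.hostAndPort = hostAndPort; }
    }

    private final TreeMap<Long, Candidate> sequentialNodes = new TreeMap<>();
    private final List<String> leadershipLog = new ArrayList<>();
    private long nextSeq = 0;
    private Candidate leader;
    private String masterNode; // data of the master znode: HOSTNAME:RPC PORT

    /** Step 1: a candidate joins by creating its ephemeral sequential znode. */
    long join(Candidate c) {
        long seq = nextSeq++;
        sequentialNodes.put(seq, c);
        electIfNeeded();
        return seq;
    }

    /** Session loss: the znode disappears and leadership may move (step 4). */
    void sessionExpired(long seq) {
        Candidate gone = sequentialNodes.remove(seq);
        if (gone != null && gone == leader) {
            leader = null;
            masterNode = null;
        }
        electIfNeeded();
    }

    private void electIfNeeded() {
        if (leader != null || sequentialNodes.isEmpty()) return;
        leader = sequentialNodes.firstEntry().getValue(); // lowest sequence wins
        takeLeadership(leader);
    }

    /** Steps 2 and 3: start TajoMaster and publish its address. */
    private void takeLeadership(Candidate c) {
        masterNode = c.hostAndPort;
        leadershipLog.add(c.hostAndPort);
    }

    String masterNode() { return masterNode; }
    List<String> leadershipLog() { return leadershipLog; }
}
```

In a real deployment ZooKeeper deletes the znode when the session times out, and each candidate watches only its predecessor, so a failover wakes up just the next candidate instead of all of them.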
>> > > > > >>
>> > > > > >>
>> > > > > >> TajoWorkerZkController ==>
>> > > > > >>
>> > > > > >>    1. This component will start, connect to ZooKeeper (creating an
>> > > > > >>       EPHEMERAL ZNODE) and wait for events from ZooKeeper.
>> > > > > >>    2. The first listener will listen for successful registration.
>> > > > > >>    3. The second listener, on the master node, will listen for any
>> > > > > >>       changes to the master node received from the ZooKeeper server.
>> > > > > >>    4. If a failover occurs, the data on the master ZNODE will change,
>> > > > > >>       so the new HOSTNAME and RPC PORT can be obtained and the
>> > > > > >>       TajoWorker can establish a new RPC connection with the
>> > > > > >>       TajoMaster.
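Steps 3 and 4 of the worker flow amount to watching the master znode and reconnecting when its data changes. The sketch below stands in for that with a plain listener list; `MasterNode`, `WorkerConnection` and `reconnect` are hypothetical names. Unlike this listener, a plain ZooKeeper watch is a one-time trigger that must be set again after it fires, which Curator's cache recipes handle for you.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical stand-in for the master znode holding "HOSTNAME:RPC PORT". */
final class MasterNode {
    private final List<Consumer<String>> listeners = new ArrayList<>();
    private String data;

    synchronized void addListener(Consumer<String> listener) { listeners.add(listener); }

    /** Called by the new leader after a failover (master flow, step 3). */
    void setData(String hostAndPort) {
        List<Consumer<String>> toNotify;
        synchronized (this) {
            if (hostAndPort.equals(data)) return; // unchanged data, no event
            data = hostAndPort;
            toNotify = new ArrayList<>(listeners);
        }
        toNotify.forEach(l -> l.accept(hostAndPort)); // notify outside the lock
    }
}

/** Hypothetical TajoWorker-side handle to the current master. */
final class WorkerConnection {
    private String connectedTo;
    private int reconnects;

    WorkerConnection(MasterNode masterNode) {
        // Step 3: listen for changes to the master node.
        masterNode.addListener(this::reconnect);
    }

    /** Step 4: switch the RPC connection to the new master. */
    synchronized void reconnect(String hostAndPort) {
        // A real worker would close its old RPC client and open a new one here.
        connectedTo = hostAndPort;
        reconnects++;
    }

    synchronized String connectedTo() { return connectedTo; }
    synchronized int reconnects() { return reconnects; }
}
```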
>> > > > > >>
>> > > > > >> To demonstrate this, I have created a small Readme.txt file on
>> > > > > >> GitHub that explains how to run the example. Please read the log
>> > > > > >> statements on the console.
>> > > > > >>
>> > > > > >> Similar to TajoWorkerZkController, we can also implement a
>> > > > > >> TajoClientZkController.
>> > > > > >>
>> > > > > >> Any help or advice is appreciated.
>> > > > > >>
>> > > > > >> Thanks!
>> > > > > >> Warm Regards,
>> > > > > >> Alvin.
>> > > > > >>
>> > > > >
>> > > >
>> > > >
>> > > >
>> > > > --
>> > > > My research interests are distributed systems, parallel computing
>> and
>> > > > bytecode based virtual machine.
>> > > >
>> > > > My profile:
>> > > > http://www.linkedin.com/in/coderplay
>> > > > My blog:
>> > > > http://coderplay.javaeye.com
>> > > >
>> > >
>> >
>> >
>> >
>> >
>>
>
>
