tajo-dev mailing list archives

From Azuryy Yu <azury...@gmail.com>
Subject Re: JIRA-704 : TajoMaster High Availability .
Date Thu, 17 Apr 2014 08:32:55 GMT
Xuhui,

ZK is not based on PAXOS. Instead, it uses Zab (ZooKeeper Atomic Broadcast),
which is a different protocol from PAXOS.



On Thu, Apr 17, 2014 at 4:19 PM, Xuhui Liu <mafish@gmail.com> wrote:

> It seems ZK is based on PAXOS. Then it will be much simpler. We can focus on
> how to use ZK well.
>
> Cheers,
> Xuhui
>
>
> On Thu, Apr 17, 2014 at 4:14 PM, Xuhui Liu <mafish@gmail.com> wrote:
>
> > Talking about the HA of TajoMaster: keeping consistency between the primary
> > master and the slave masters will be a big challenge. Have we ever thought
> > about the PAXOS protocol? It's designed to keep consistency in a
> > distributed environment.
> >
> > Thanks,
> > Daniel
> >
> >
> > On Wed, Apr 16, 2014 at 7:56 PM, Hyunsik Choi <hyunsik@apache.org> wrote:
> >
> >> Hi Alvin,
> >>
> >> First of all, thank you Alvin for your contribution. Your proposal looks
> >> nice and reasonable to me.
> >>
> >> BTW, as others mentioned, TAJO-704 and TAJO-611 seem to overlap
> >> somewhat. We need to arrange the tasks to avoid duplicated work.
> >>
> >> In my opinion, the TajoMaster HA feature involves three sub-features:
> >>   1) Leader election among multiple TajoMasters - one of the TajoMasters
> >>      is always the leader.
> >>   2) Service discovery on the TajoClient side - TajoClient API calls should
> >>      be resilient even when the original TajoMaster is not available.
> >>   3) Cluster resource management and catalog information that TajoMaster
> >>      keeps in main memory - this information should not be lost.
> >>
> >> I think that (1) and (2) duplicate the service discovery work in TAJO-611.
> >> So it would be nice if TAJO-704 focused only on (3), because TAJO-611
> >> already started a few weeks ago and TAJO-704 may be at a relatively
> >> earlier stage. *Instead, you can continue the work with Xuhui and Min.*
> >> Someone can divide the service discovery issue into more subtasks.
> >>
> >> In addition, I'd like to discuss (3) further. Currently, a running
> >> TajoMaster keeps two kinds of information: cluster resource information
> >> for all workers and catalog information. To guarantee the HA of this data,
> >> TajoMaster should either persistently materialize it or consistently
> >> synchronize it across multiple TajoMasters. BTW, we will replace the
> >> resource management feature of TajoMaster with a decentralized design in
> >> the new scheduler issue. As a result, I think that TajoMaster HA needs to
> >> focus only on the high availability of catalog information. The HA of the
> >> catalog can easily be achieved by database replication, or we can build
> >> our own module for it. I prefer the former.
> >>
> >> Hi Xuhui and Min,
> >>
> >> Could you briefly share the progress of the service discovery issue? With
> >> that, we can figure out how to start the service discovery work together.
> >>
> >> Warm regards,
> >> Hyunsik
> >>
> >>
> >>
> >> On Wed, Apr 16, 2014 at 3:36 PM, Min Zhou <coderplay@gmail.com> wrote:
> >>
> >> > Actually, we are thinking not only about HA but also about the service
> >> > discovery that the future Tajo scheduler would rely on. The Tajo
> >> > scheduler can get all the active workers from that service.
> >> >
> >> >
> >> > Regards,
> >> > Min
> >> >
> >> >
> >> > On Tue, Apr 15, 2014 at 10:05 PM, Xuhui Liu <mafish@gmail.com> wrote:
> >> >
> >> > > Hi Alvin,
> >> > >
> >> > > TAJO-611 will introduce Curator as a service discovery service to Tajo,
> >> > > and Curator is based on ZK. Maybe we can work together.
> >> > >
> >> > > Thanks,
> >> > > Xuhui
> >> > >
> >> > >
> >> > > On Wed, Apr 16, 2014 at 12:17 PM, Min Zhou <coderplay@gmail.com> wrote:
> >> > >
> >> > > > Hi Alvin,
> >> > > >
> >> > > > I think this JIRA overlaps somewhat with TAJO-611. Could you
> >> > > > cooperate on it?
> >> > > >
> >> > > > Thanks,
> >> > > > Min
> >> > > >
> >> > > >
> >> > > > On Tue, Apr 15, 2014 at 7:22 PM, Henry Saputra <henry.saputra@gmail.com> wrote:
> >> > > >
> >> > > > > Jaehwa, I think we should consider a pluggable mechanism that would
> >> > > > > allow some kind of distributed system like ZK to be used if wanted.
> >> > > > >
> >> > > > > - Henry
> >> > > > >
> >> > > > > On Tue, Apr 15, 2014 at 7:15 PM, Jaehwa Jung <blrunner@apache.org> wrote:
> >> > > > > > Hi, Alvin
> >> > > > > >
> >> > > > > > I'm sorry for the late response, and thank you very much for your
> >> > > > > > contribution. I agree with your opinion on ZooKeeper. However,
> >> > > > > > ZooKeeper adds a dependency that some users do not want.
> >> > > > > >
> >> > > > > > I'd like to suggest adding an abstraction layer for handling
> >> > > > > > TajoMaster HA. When I created TAJO-740, I wished that TajoMaster HA
> >> > > > > > would have a generic interface and a basic implementation using
> >> > > > > > HDFS. Your proposed ZooKeeper implementation can then be added
> >> > > > > > there. This will allow users to choose their desired implementation
> >> > > > > > according to their environments.
> >> > > > > >
> >> > > > > > In addition, I'd like to propose that TajoMaster embed the HA
> >> > > > > > module; it would be great if HA worked simply by launching a backup
> >> > > > > > TajoMaster. Deploying an additional process besides the TajoMaster
> >> > > > > > and TajoWorker processes may put more burden on users.
> >> > > > > >
> >> > > > > > *Cheers*
> >> > > > > > *Jaehwa*
> >> > > > > >
> >> > > > > >
> >> > > > > > 2014-04-13 14:36 GMT+09:00 Jihoon Son <jihoonson@apache.org>:
> >> > > > > >
> >> > > > > >> Hi Alvin.
> >> > > > > >> Thanks for your suggestion.
> >> > > > > >>
> >> > > > > >> Overall, your suggestion looks very reasonable to me!
> >> > > > > >> I'll check the POC.
> >> > > > > >>
> >> > > > > >> Many thanks,
> >> > > > > >> Jihoon
> >> > > > > >>
> >> > > > > >> Hi All,
> >> > > > > >> After doing a lot of research, in my opinion we should use
> >> > > > > >> ZooKeeper for TajoMaster HA. I have created a small POC and shared
> >> > > > > >> it on my GitHub repository
> >> > > > > >> (git@github.com:alvinhenrick/zooKeeper-poc.git).
> >> > > > > >>
> >> > > > > >> To make things a little easier and more maintainable, I am using
> >> > > > > >> Apache Curator, the fluent ZooKeeper client API that was developed
> >> > > > > >> at Netflix and is now an Apache open source project.
> >> > > > > >>
> >> > > > > >> I have attached a diagram to convey my message to the team
> >> > > > > >> members. I will upload it to JIRA once everyone agrees with the
> >> > > > > >> proposed solution.
> >> > > > > >>
> >> > > > > >> Here is what the flow is going to look like.
> >> > > > > >>
> >> > > > > >> TajoMasterZkController ==>
> >> > > > > >>
> >> > > > > >>    1. This component will start, connect to the ZooKeeper quorum, and
> >> > > > > >>       fight ( :) ) to obtain the latch/lock to become the master.
> >> > > > > >>    2. Once the lock is obtained, the Apache Curator API invokes the
> >> > > > > >>       takeLeadership() method, which starts the TajoMaster.
> >> > > > > >>    3. As long as the TajoMaster is running, the Controller keeps the
> >> > > > > >>       lock and updates the metadata on the ZooKeeper server with the
> >> > > > > >>       HOSTNAME and RPC PORT.
> >> > > > > >>    4. The other participants keep waiting for the latch/lock to be
> >> > > > > >>       released by ZooKeeper so that they can obtain the leadership.
> >> > > > > >>    5. The advantage is that we can have as many TajoMasters as we
> >> > > > > >>       want, but only one can be the leader, and a TajoMaster consumes
> >> > > > > >>       resources only after obtaining the latch/lock.
> >> > > > > >>
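A minimal sketch of the TajoMasterZkController flow quoted above, written as an in-memory Python simulation rather than against ZooKeeper or Curator. The names `LatchRegistry` and `Candidate`, the hostnames, and port 26001 are hypothetical and exist only for illustration. The registry stands in for both the latch and the master znode, and `leave()` models a leader whose session has ended.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class LatchRegistry:
    """In-memory stand-in for the latch and the master znode (hypothetical)."""
    queue: List["Candidate"] = field(default_factory=list)  # arrival order
    master_data: Optional[str] = None  # "HOSTNAME:RPC_PORT" of the leader

    def join(self, candidate: "Candidate") -> None:
        # Steps 1 and 4: a candidate queues up and waits for the latch.
        self.queue.append(candidate)
        self._elect()

    def leave(self, candidate: "Candidate") -> None:
        # Models the candidate's session ending: its latch entry disappears.
        self.queue.remove(candidate)
        candidate.is_leader = False
        self._elect()

    def _elect(self) -> None:
        if not self.queue:
            self.master_data = None
            return
        head = self.queue[0]
        if not head.is_leader:
            head.take_leadership(self)


@dataclass(eq=False)
class Candidate:
    host: str
    rpc_port: int
    is_leader: bool = False

    def take_leadership(self, registry: LatchRegistry) -> None:
        # Steps 2 and 3: become the master, then publish host and RPC port.
        self.is_leader = True
        registry.master_data = f"{self.host}:{self.rpc_port}"


registry = LatchRegistry()
m1 = Candidate("master-1", 26001)
m2 = Candidate("master-2", 26001)
registry.join(m1)
registry.join(m2)            # m2 waits; m1 already holds the latch
print(registry.master_data)  # master-1:26001
registry.leave(m1)           # failover: m2 is next in line
print(registry.master_data)  # master-2:26001
```

The sketch only shows the ordering guarantee (step 5: many candidates, one leader at a time). Session timeouts, retries, and the real Curator recipe are left out.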
> >> > > > > >>
> >> > > > > >> TajoWorkerZkController ==>
> >> > > > > >>
> >> > > > > >>    1. This component will start, connect to ZooKeeper (creating an
> >> > > > > >>       EPHEMERAL ZNODE), and wait for events from ZooKeeper.
> >> > > > > >>    2. The first listener will listen for successful registration.
> >> > > > > >>    3. The second listener, on the master node, will listen for any
> >> > > > > >>       changes to the master node reported by the ZooKeeper server.
> >> > > > > >>    4. If a failover occurs, the data on the master ZNODE changes, the
> >> > > > > >>       new HOSTNAME and RPC PORT can be obtained, and the TajoWorker
> >> > > > > >>       can establish a new RPC connection with the TajoMaster.
> >> > > > > >>
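The worker side of the quoted flow can be sketched the same way. This is an in-memory Python illustration, not the ZooKeeper or Curator API: `MasterZnode`, `Worker`, the hostnames, and the port are hypothetical. One simplification to keep in mind is that plain ZooKeeper watches fire once and must be registered again, while the callbacks here stay registered.

```python
from typing import Callable, List, Optional


class MasterZnode:
    """In-memory stand-in for the master znode (hypothetical)."""

    def __init__(self) -> None:
        self._data: Optional[str] = None
        self._watchers: List[Callable[[str], None]] = []

    def watch(self, callback: Callable[[str], None]) -> None:
        self._watchers.append(callback)
        if self._data is not None:
            callback(self._data)  # deliver the current leader right away

    def set_data(self, data: str) -> None:
        # Called by whichever master currently holds the leadership.
        self._data = data
        for callback in list(self._watchers):
            callback(data)


class Worker:
    def __init__(self, name: str) -> None:
        self.name = name
        self.master_address: Optional[str] = None
        self.reconnects = 0

    def on_master_changed(self, data: str) -> None:
        # Step 4: on a change, pick up the new HOSTNAME and RPC PORT.
        if data != self.master_address:
            self.master_address = data  # a real worker would reopen its RPC connection here
            self.reconnects += 1


znode = MasterZnode()
worker = Worker("worker-1")
znode.set_data("master-1:26001")       # first leader publishes its address
znode.watch(worker.on_master_changed)  # worker registers and learns the leader
znode.set_data("master-2:26001")       # failover: new leader rewrites the znode
print(worker.master_address, worker.reconnects)  # master-2:26001 2
```

A TajoClientZkController could reuse the same listener shape, since it only needs the current master address.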
> >> > > > > >> To demonstrate, I have created a small Readme.txt file on GitHub
> >> > > > > >> describing how to run the example. Please read the log statements
> >> > > > > >> on the console.
> >> > > > > >>
> >> > > > > >> Similar to TajoWorkerZkController, we can also implement
> >> > > > > >> TajoClientZkController.
> >> > > > > >>
> >> > > > > >>           Any help or advice is appreciated.
> >> > > > > >>
> >> > > > > >> Thanks!
> >> > > > > >> Warm Regards,
> >> > > > > >> Alvin.
> >> > > > > >>
> >> > > > >
> >> > > >
> >> > > >
> >> > > >
> >> > > > --
> >> > > > My research interests are distributed systems, parallel computing
> >> and
> >> > > > bytecode based virtual machine.
> >> > > >
> >> > > > My profile:
> >> > > > http://www.linkedin.com/in/coderplay
> >> > > > My blog:
> >> > > > http://coderplay.javaeye.com
> >> > > >
> >> > >
> >> >
> >> >
> >> >
> >> >
> >>
> >
> >
>
