tajo-dev mailing list archives

From Min Zhou <coderp...@gmail.com>
Subject Re: JIRA-704 : TajoMaster High Availability .
Date Wed, 16 Apr 2014 06:36:03 GMT
Actually, we are thinking not only about HA but also about service discovery,
which the future Tajo scheduler would rely on. The Tajo scheduler can get all
the active workers from that service.
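
For illustration, here is a minimal in-memory sketch of the contract such a
discovery service could offer the scheduler. It is not the TAJO-611 design and
does not use Curator or ZooKeeper. The class WorkerRegistry, its methods, the
session ids, and the worker addresses are all hypothetical; a real
implementation would presumably back the map with ephemeral znodes.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-memory stand-in for a ZooKeeper-backed discovery service.
// None of these names come from Tajo, Curator, or TAJO-611.
class WorkerRegistry {
    // Session id -> "host:port". An entry lives only as long as its session,
    // which mimics the behavior of an ephemeral znode.
    private final Map<String, String> workers = new ConcurrentHashMap<>();

    // Called when a worker starts and announces its RPC address.
    public void register(String sessionId, String hostAndPort) {
        workers.put(sessionId, hostAndPort);
    }

    // Called when a worker's session ends (crash, shutdown, or timeout).
    public void sessionExpired(String sessionId) {
        workers.remove(sessionId);
    }

    // What a scheduler would query: the currently active workers, sorted so
    // that the result is deterministic.
    public List<String> activeWorkers() {
        List<String> result = new ArrayList<>(workers.values());
        Collections.sort(result);
        return result;
    }

    public static void main(String[] args) {
        WorkerRegistry registry = new WorkerRegistry();
        // The addresses below are made up for the example.
        registry.register("session-1", "worker1:9001");
        registry.register("session-2", "worker2:9001");
        System.out.println(registry.activeWorkers());

        // Losing a session removes that worker from the active set.
        registry.sessionExpired("session-1");
        System.out.println(registry.activeWorkers());
    }
}
```

The point of the sketch is only that the scheduler never tracks liveness
itself: it asks the service for the current set, and a lost session removes
the worker without any explicit deregistration call from the worker.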


Regards,
Min


On Tue, Apr 15, 2014 at 10:05 PM, Xuhui Liu <mafish@gmail.com> wrote:

> Hi Alvin,
>
> TAJO-611 will introduce Curator as a service discovery service to Tajo and
> Curator is based on ZK. Maybe we can work together.
>
> Thanks,
> Xuhui
>
>
> On Wed, Apr 16, 2014 at 12:17 PM, Min Zhou <coderplay@gmail.com> wrote:
>
> > Hi Alvin,
> >
> > I think this JIRA overlaps somewhat with TAJO-611. Could you work
> > together on it?
> >
> > Thanks,
> > Min
> >
> >
> > On Tue, Apr 15, 2014 at 7:22 PM, Henry Saputra <henry.saputra@gmail.com>
> > wrote:
> >
> > > Jaehwa, I think we should consider a pluggable mechanism that would
> > > allow some kind of distributed system like ZK to be used if wanted.
> > >
> > > - Henry
> > >
> > > On Tue, Apr 15, 2014 at 7:15 PM, Jaehwa Jung <blrunner@apache.org>
> > > wrote:
> > > > Hi, Alvin
> > > >
> > > > I'm sorry for the late response, and thank you very much for your
> > > > contribution. I agree with your opinion on ZooKeeper. However,
> > > > ZooKeeper is an additional dependency that some users may not want.
> > > >
> > > > I'd like to suggest adding an abstraction layer for handling
> > > > TajoMaster HA. When I created TAJO-740, I wished that TajoMaster HA
> > > > would have a generic interface and a basic implementation using HDFS.
> > > > Your proposed ZooKeeper implementation could then be added there. That
> > > > would allow users to choose their desired implementation according to
> > > > their environments.
> > > >
> > > > In addition, I'd like to propose that TajoMaster embed the HA module;
> > > > it would be great if HA worked well just by launching a backup
> > > > TajoMaster. Deploying an additional process besides the TajoMaster and
> > > > TajoWorker processes may put more burden on users.
> > > >
> > > > *Cheers*
> > > > *Jaehwa*
> > > >
> > > >
> > > > 2014-04-13 14:36 GMT+09:00 Jihoon Son <jihoonson@apache.org>:
> > > >
> > > >> Hi Alvin.
> > > >> Thanks for your suggestion.
> > > >>
> > > >> Overall, your suggestion looks very reasonable to me!
> > > >> I'll check the POC.
> > > >>
> > > >> Many thanks,
> > > >> Jihoon
> > > >> Hi All,
> > > >> After doing a lot of research, in my opinion we should utilize
> > > >> ZooKeeper for TajoMaster HA. I have created a small POC and shared it
> > > >> on my GitHub repository (git@github.com:alvinhenrick/zooKeeper-poc.git).
> > > >>
> > > >> Just to make things a little bit easier and more maintainable, I am
> > > >> utilizing Apache Curator, the fluent ZooKeeper client API that was
> > > >> developed at Netflix and is now an Apache open source project.
> > > >>
> > > >> I have attached a diagram to convey my message to the team members.
> > > >> I will upload it to JIRA once everyone agrees with the proposed
> > > >> solution.
> > > >>
> > > >> Here is what the flow is going to look like.
> > > >>
> > > >>             TajoMasterZkController   ==>
> > > >>
> > > >>
> > > >>    1. This component will start, connect to the ZooKeeper quorum, and
> > > >>       fight ( :) ) to obtain the latch/lock to become the master.
> > > >>    2. Once the lock is obtained, the Apache Curator API will invoke
> > > >>       the takeLeadership() method, which at this time will start the
> > > >>       TajoMaster.
> > > >>    3. As long as the TajoMaster is running, the controller will keep
> > > >>       the lock and update the metadata on the ZooKeeper server with
> > > >>       the HOSTNAME and RPC PORT.
> > > >>    4. The other participants will keep waiting for the latch/lock to
> > > >>       be released by ZooKeeper to obtain the leadership.
> > > >>    5. The advantage is that we can have as many TajoMasters as we
> > > >>       want, but only one can be the leader, and it will consume
> > > >>       resources only after obtaining the latch/lock.
> > > >>
> > > >>
> > > >>            TajoWorkerZkController ==>
> > > >>
> > > >>    1. This component will start and connect to ZooKeeper (it will
> > > >>       create an EPHEMERAL ZNODE) and wait for events from ZooKeeper.
> > > >>    2. The first listener will listen for successful registration.
> > > >>    3. The second listener, on the master node, will listen for any
> > > >>       changes to the master node received from the ZooKeeper server.
> > > >>    4. If a failover occurs, the data on the master ZNODE will be
> > > >>       changed; the new HOSTNAME and RPC PORT can then be obtained, and
> > > >>       the TajoWorker can establish a new RPC connection with the
> > > >>       TajoMaster.
> > > >>
> > > >> To demonstrate, I have created a small Readme.txt file on GitHub
> > > >> explaining how to run the example. Please read the log statements on
> > > >> the console.
> > > >>
> > > >> Similar to TajoWorkerZkController, we can also implement
> > > >> TajoClientZkController.
> > > >>
> > > >> Any help or advice is appreciated.
> > > >>
> > > >> Thanks!
> > > >> Warm Regards,
> > > >> Alvin.
> > > >>
> > >
> >
> >
> >
> > --
> > My research interests are distributed systems, parallel computing and
> > bytecode based virtual machine.
> >
> > My profile:
> > http://www.linkedin.com/in/coderplay
> > My blog:
> > http://coderplay.javaeye.com
> >
>


