hbase-dev mailing list archives

From Ryan Rawson <ryano...@gmail.com>
Subject Re: ZK rethink?
Date Tue, 07 Apr 2009 20:53:37 GMT
Thanks for the input Joey, and may I be the first to say "holy shit".

The reason their approach works is that the C API spins off OS threads
that exist outside the domain of the Java VM, which means those threads
never get paused for GC processing.
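The starvation effect is straightforward to reproduce in plain Java: a thread that should "heartbeat" every few milliseconds shows gaps whenever the collector pauses the VM, because it is an ordinary JVM thread, unlike the OS threads the C client uses. A minimal sketch (illustrative only, not HBase code):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

public class GcPauseDemo {
    /** Allocates garbage while a "heartbeat" thread measures how long it
     *  gets descheduled; returns the worst gap observed, in milliseconds. */
    static long measureMaxGapMs(int allocations) {
        final AtomicLong maxGapMs = new AtomicLong(0);
        Thread heartbeat = new Thread(() -> {
            long last = System.nanoTime();
            while (!Thread.currentThread().isInterrupted()) {
                try { Thread.sleep(5); } catch (InterruptedException e) { return; }
                long now = System.nanoTime();
                maxGapMs.accumulateAndGet((now - last) / 1_000_000, Math::max);
                last = now;
            }
        });
        heartbeat.setDaemon(true);
        heartbeat.start();
        // Worst case for the collector: millions of tiny, short-lived objects.
        List<byte[]> churn = new ArrayList<>();
        for (int i = 0; i < allocations; i++) {
            churn.add(new byte[64]);
            if (churn.size() > 10_000) churn.clear();
        }
        heartbeat.interrupt();
        try { heartbeat.join(); } catch (InterruptedException ignored) { }
        return maxGapMs.get();
    }

    public static void main(String[] args) {
        System.out.println("max heartbeat gap: " + measureMaxGapMs(5_000_000) + " ms");
    }
}
```

The ZooKeeper client's heartbeat thread is in exactly this position, which is why a long enough collection looks like a dead session from the server's side.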

With that kind of input, we might want to consider doing what he did.  Maybe
you can donate a bit of code?

Thanks!
-ryan

On Tue, Apr 7, 2009 at 1:49 PM, Nitay <nitayj@gmail.com> wrote:

> Very interesting Joey. Thanks for replying with this information. Also,
> welcome! :).
>
> I don't quite understand why the C API with JNI fixes the problem. Did that
> substantially reduce your tiny, short-lived objects to the point where the
> GC wasn't starving the ZooKeeper IO threads anymore?
>
> Perhaps my initial 10-second value was not enough. Andrew, can you try 30 or
> 60 seconds as a test on your cluster to see if that calms things down?
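For reference, the session timeout is a client-side setting; in hbase-site.xml it would look something like the following (assuming the `zookeeper.session.timeout` property, which takes milliseconds):

```xml
<property>
  <name>zookeeper.session.timeout</name>
  <!-- 60 seconds, in milliseconds -->
  <value>60000</value>
</property>
```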
>
> -n
>
> On Tue, Apr 7, 2009 at 1:43 PM, Joey Echeverria <joey42@gmail.com> wrote:
>
> > Long time lurker, first time poster.
> >
> > We've used zookeeper in a write-heavy project we've been working on
> > and experienced issues similar to what you described. After several
> > days of debugging, we discovered that our issue was garbage
> > collection. There was no way to guarantee we wouldn't have long pauses,
> > especially since our environment was the worst case for garbage
> > collection: millions of tiny, short-lived objects. I suspect HBase
> > sees similar workloads frequently, if not constantly. With
> > anything shorter than a 30-second session timeout, we got session
> > expiration events extremely frequently. We needed to use 60 seconds
> > for any real confidence that an ephemeral node disappearing meant
> > something was unavailable.
> >
> > We really wanted quick recovery, so we ended up writing a lightweight
> > wrapper around the C API and used SWIG to auto-generate a JNI
> > interface. It's not perfect, but since we switched to this method
> > we've never seen a session expiration event and ephemeral nodes only
> > disappear when there are network issues or a machine/process goes
> > down.
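For anyone curious, the SWIG route is roughly: point SWIG at the C client header and let it generate the JNI glue. A minimal, untested interface file might look like this (module name hypothetical; details vary by platform and ZooKeeper version):

```
%module zkclient
%{
#include "zookeeper.h"
%}
%include "zookeeper.h"
```

Running `swig -java zkclient.i` emits the JNI C file plus Java proxy classes, which then get compiled and linked against libzookeeper into a shared library loaded via `System.loadLibrary`.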
> >
> > I don't know if it's worth doing the same kind of thing for HBase as
> > it adds some "unnecessary" native code, but it's a solution that I
> > found works.
> >
> > On Tue, Apr 7, 2009 at 9:28 PM, Jim Kellerman (POWERSET)
> > <Jim.Kellerman@microsoft.com> wrote:
> > > There are a number of reasons why a ZooKeeper client could receive a
> > > SessionExpired event:
> > > - The process died
> > > - The machine died
> > > - There is/was a network partition
> > > - The network is flapping
> > >
> > > This is why the lease timeout is set to 2 minutes by default. If things
> > > haven't recovered in two minutes, we assume that the region server is
> > > dead, hung, or otherwise unresponsive. Maybe we should add an API
> > > to the region server such that the Master (or Zookeeper) could call it
> > > and ask if it is still alive, before starting region server recovery
> > > (ProcessServerShutdown).
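The "ask if it is still alive" idea could be as simple as a probe the Master tries before kicking off recovery. A sketch of the shape (interface and names hypothetical, not an existing HBase API):

```java
/** Hypothetical liveness probe the Master could use before recovery. */
interface RegionServerProbe {
    /** Returns true if the region server answers a direct ping. */
    boolean isAlive(String serverName);
}

class Master {
    private final RegionServerProbe probe;

    Master(RegionServerProbe probe) { this.probe = probe; }

    /**
     * Only start ProcessServerShutdown if the server's session expired
     * AND a direct probe also fails -- this guards against a server that
     * is merely slow (e.g. paused by GC) rather than actually dead.
     */
    boolean shouldStartRecovery(String serverName, boolean sessionExpired) {
        return sessionExpired && !probe.isAlive(serverName);
    }
}
```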
> > >
> > > ---
> > > Jim Kellerman, Powerset (Live Search, Microsoft Corporation)
> > >
> > >
> > >> -----Original Message-----
> > >> From: Nitay [mailto:nitayj@gmail.com]
> > >> Sent: Tuesday, April 07, 2009 1:13 PM
> > >> To: hbase-dev@hadoop.apache.org; apurtell@apache.org
> > >> Subject: Re: ZK rethink?
> > >>
> > >> Hi Andrew,
> > >>
> > >> I agree with you that getting a SessionExpired is a problem for us,
> > >> and we
> > >> didn't really consider it when we initially put in the ZooKeeper
> > >> code.
> > >> However, I don't necessarily think a complete rethink is necessary.
> > >>
> > >> The main issue here is how often a SessionExpired is going to
> > >> happen, and
> > >> why it is happening that often. Most people using ZooKeeper use a
> > >> session
> > >> timeout of 2 or 3 seconds. A SessionExpired occurs when you lose
> > >> connection
> > >> to the ZooKeeper instance you were talking to and are unable to
> > >> connect to
> > >> another one within this time frame. In HBase, we use 10 seconds for
> > >> this
> > >> interval. Given that, I think we should do some recon work first to
> > >> determine what's going on. When does it happen? Why? Is the
> > >> ZooKeeper IO
> > >> thread getting starved for long periods of time? Can we prevent it?
> > >> The
> > >> ZooKeeper folks describe SessionExpired as a very, very rare event,
> > >> yet that
> > >> does not seem to be the case for us.
> > >>
> > >> Issues like HBASE-1314 are certainly a bug. If we think a node is dead
> > >> because its ephemeral ZNode has vanished, we should not try talking to
> > >> it anymore. We cannot have a case where we both think it's dead and are
> > >> still talking to it.
> > >>
> > >> If, after some investigation, we come to the conclusion that these
> > >> SessionExpired events are unavoidable things that will happen quite
> > >> frequently, then yes I think something like what you suggest is a
> > >> good idea.
> > >> But if these events only really do happen once in a blue moon as it
> > >> seems
> > >> they're supposed to, then perhaps simply internally restarting the
> > >> node in
> > >> question is not so bad?
> > >>
> > >> Within the solutions you propose I would opt for the timer option. I
> > >> don't
> > >> think that not using ephemeral nodes with watches is a good
> > >> solution. It
> > >> shifts us away from using the power that ZooKeeper provides.
> > >> Assuming at
> > >> some point ZooKeeper gets more reliable with its sessions, we will
> > >> have a
> > >> lot of code to change if we want to undo the decision.
> > >>
> > >> Regardless of what we end up going with, we need to do _something_
> > >> on the
> > >> RS/master when they get a SessionExpired, because we currently will
> > >> get
> > >> wedged. That's what I'm working on right now (HBASE-1311, HBASE-1312).
> > >>
> > >> Thanks for bringing this up Andrew. I'm glad we have a cluster like
> > >> yours to
> > >> bring out these sorts of problems. I look forward to further
> > >> discussion on
> > >> this topic and hearing other people's thoughts.
> > >>
> > >> Cheers,
> > >> -n
> > >>
> > >> On Tue, Apr 7, 2009 at 11:10 AM, Andrew Purtell
> > >> <apurtell@apache.org> wrote:
> > >>
> > >> >
> > >> > Hi Chad,
> > >> >
> > >> > In my testing the session expiration happens due to missed IO,
> > >> > as with ZOOKEEPER-344, which is currently open.
> > >> >
> > >> >  https://issues.apache.org/jira/browse/ZOOKEEPER-344
> > >> >
> > >> > Also a Google search for "zookeeper session expired" turns up
> > >> > some conversation already on the topic.
> > >> >
> > >> >  - Andy
> > >> >
> > >> >
> > >> > > From: Chad Walters
> > >> > > Subject: RE: ZK rethink?
> > >> > > To: "hbase-dev@hadoop.apache.org" <hbase-dev@hadoop.apache.org>
> > >> > > Date: Tuesday, April 7, 2009, 10:57 AM
> > >> > >
> > >> > > Has this been discussed at all with the ZooKeeper
> > >> > > developers?
> > >> > >
> > >> > > Chad
> > >> > >
> > >> > > -----Original Message-----
> > >> > > From: Andrew Purtell [mailto:apurtell@apache.org]
> > >> > > Sent: Tuesday, April 07, 2009 10:53 AM
> > >> > > To: hbase-dev@hadoop.apache.org
> > >> > > Subject: ZK rethink?
> > >> > >
> > >> > >
> > >> > > I think an assumption about ZK has been made that is wrong:
> > >> > > The assumption is that ZK sessions are reliable, so taking
> > >> > > immediate action from a watcher when an ephemeral node goes
> > >> > > away is safe, but ZK sessions can expire for a number of
> > >> > > reasons not related to the process holding the handle going
> > >> > > away. So serious issues like HBASE-1314 result.
> > >> > >
> > >> > > Some problems related to session expiration can be easily
> > >> > > handled by having the ZK wrapper reinitialize the ZK handle
> > >> > > and recreate ephemeral nodes when it is informed that its
> > >> > > session has expired. However the problem with watchers
> > >> > > seeing deletions and taking (inappropriate) action remains.
> > >> > > In my opinion, every place in the code where watchers on
> > >> > > znodes are used to determine the state of something needs
> > >> > > to be reworked.
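The "reinitialize and recreate" part of this is mostly bookkeeping; a sketch of the shape (the connect/create calls here stand in for `org.apache.zookeeper.ZooKeeper`; all names are hypothetical):

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of a wrapper that survives session expiration by remembering
 *  which ephemeral znodes it owns and recreating them on a new handle. */
class ResilientZk {
    interface Handle {
        void createEphemeral(String path, byte[] data);
    }
    interface HandleFactory {
        Handle connect(); // stands in for `new ZooKeeper(quorum, timeout, watcher)`
    }

    private final HandleFactory factory;
    private final List<String> ownedEphemerals = new ArrayList<>();
    private Handle handle;

    ResilientZk(HandleFactory factory) {
        this.factory = factory;
        this.handle = factory.connect();
    }

    void registerEphemeral(String path) {
        handle.createEphemeral(path, new byte[0]);
        ownedEphemerals.add(path); // remember it for replay after expiry
    }

    /** Called when the watcher sees KeeperState.Expired. */
    void onSessionExpired() {
        handle = factory.connect();            // brand-new session and handle
        for (String path : ownedEphemerals) {  // replay our ephemeral nodes
            handle.createEphemeral(path, new byte[0]);
        }
    }
}
```

As the message notes, this alone does not fix the other half of the problem: watchers elsewhere will still have seen the deletions in the meantime.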
> > >> > >
> > >> > > One option is to start a timer when a znode disappears and
> > >> > > watch for its reappearance while the timer is running. If
> > >> > > the timer expires without reappearance, then take action.
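The timer option in sketch form, with time passed in explicitly so the logic is easy to follow (names hypothetical):

```java
/** Sketch of the "grace period" option: a znode deletion only counts as a
 *  real failure if the node does not reappear within graceMs. */
class DisappearanceTimer {
    private final long graceMs;
    private long deletedAtMs = -1; // -1 means the znode is present

    DisappearanceTimer(long graceMs) { this.graceMs = graceMs; }

    void onZnodeDeleted(long nowMs) { deletedAtMs = nowMs; }

    void onZnodeCreated(long nowMs) { deletedAtMs = -1; } // cancel the timer

    /** True only once the node has stayed gone for the full grace period. */
    boolean shouldTakeAction(long nowMs) {
        return deletedAtMs >= 0 && nowMs - deletedAtMs >= graceMs;
    }
}
```

A reappearance (the session-expired client reconnecting and recreating its node) cancels the pending action, so transient expirations stop triggering recovery.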
> > >> > >
> > >> > > Another option is to not use ephemeral nodes. Have the
> > >> > > readers discover their znodes of interest and then poll
> > >> > > them. Include timestamps in the stored data to determine
> > >> > > freshness. Declare a node expired beyond some delta between
> > >> > > last update and current time, and then take action. (The
> > >> > > poller can delete the znode also to clean up.)
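The polling alternative reduces to a freshness check on the stored timestamp; a sketch (the delta is whatever the deployment chooses):

```java
/** Sketch of the non-ephemeral option: each writer periodically stores its
 *  own clock in its znode, and readers declare it dead past some delta. */
class FreshnessCheck {
    static boolean isExpired(long lastUpdateMs, long nowMs, long maxDeltaMs) {
        return nowMs - lastUpdateMs > maxDeltaMs;
    }
}
```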
> > >> > >
> > >> > >    - Andy
> > >> >
> > >> >
> > >> >
> > >> >
> > >
> >
>
