zookeeper-user mailing list archives

From Ted Dunning <ted.dunn...@gmail.com>
Subject Re: puzzling BadVersionException
Date Tue, 11 Oct 2011 01:21:38 GMT
Why do you get the version in the first place without getting the contents?

If you don't have the contents, what is the point of enforcing a version?
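To make the point above concrete: a version check only protects data that was read together with that version. Below is a minimal sketch, assuming the behavior described for ZooKeeper's setData() (each successful write bumps the version by one, a mismatched expected version is rejected, and -1 skips the check). The classes VersionCheckDemo, VersionedNode, Snapshot, BadVersion and the update() helper are hypothetical in-memory stand-ins, not the ZooKeeper client API. The first half reproduces the symptom from the thread (a version noted up front is stale by one after an intervening write); the second half reads data and version together and retries on conflict.

```java
import java.nio.charset.StandardCharsets;
import java.util.function.UnaryOperator;

public class VersionCheckDemo {

    /** Hypothetical stand-in for KeeperException.BadVersionException. */
    static final class BadVersion extends Exception {
        BadVersion(int expected, int actual) {
            super("expected version " + expected + " but node is at " + actual);
        }
    }

    /** Data and version captured by ONE read, like getData(path, false, stat). */
    static final class Snapshot {
        final byte[] data;
        final int version;
        Snapshot(byte[] data, int version) { this.data = data; this.version = version; }
    }

    /** Hypothetical in-memory model of a single znode; NOT the ZooKeeper client. */
    static final class VersionedNode {
        private byte[] data;
        private int version = 0; // a freshly created node starts at version 0

        VersionedNode(byte[] initial) { this.data = initial.clone(); }

        synchronized Snapshot read() { return new Snapshot(data.clone(), version); }

        /** Conditional write: -1 matches any version, otherwise versions must be equal. */
        synchronized int setData(byte[] newData, int expectedVersion) throws BadVersion {
            if (expectedVersion != -1 && expectedVersion != version) {
                throw new BadVersion(expectedVersion, version);
            }
            data = newData.clone();
            return ++version; // every successful write bumps the version by one
        }
    }

    /** Read data and version together, derive the new value from that read, retry on conflict. */
    static int update(VersionedNode node, UnaryOperator<byte[]> change, int maxAttempts)
            throws BadVersion {
        for (int attempt = 1; ; attempt++) {
            Snapshot s = node.read();
            try {
                return node.setData(change.apply(s.data), s.version);
            } catch (BadVersion e) {
                if (attempt >= maxAttempts) throw e; // give up after maxAttempts conflicts
            }
        }
    }

    static byte[] utf8(String s) { return s.getBytes(StandardCharsets.UTF_8); }

    public static void main(String[] args) throws Exception {
        VersionedNode node = new VersionedNode(utf8("v0"));

        int seen = node.read().version;  // version noted up front, like the exists() call
        node.setData(utf8("other"), -1); // an unexpected write lands in between
        try {
            node.setData(utf8("mine"), seen);
        } catch (BadVersion e) {
            System.out.println(e.getMessage()); // expected version 0 but node is at 1
        }

        System.out.println("wrote version " + update(node, cur -> utf8("mine"), 3)); // wrote version 2
    }
}
```

With the real client, the read step would be zooKeeper.getData(path, false, stat) followed by zooKeeper.setData(path, newValue, stat.getVersion()), catching KeeperException.BadVersionException to re-read and retry. A BadVersionException then means another write really did happen in between, which is the lead worth chasing here.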

On Mon, Oct 10, 2011 at 8:26 AM, Ishaaq Chandy <ishaaq@gmail.com> wrote:

> Thanks Mahadev,
> Yup, I am aware that 2 is a particularly bad cluster size, and hopefully we
> will fix that soon. I was just hoping that it was the cause of the problem:
> my conjecture was that if the two ZK servers disagree about the version,
> there is no way to decide which one is correct without a third, tie-breaking
> server.
>
> But, if you say that is not the case, then I need to keep looking (sigh).
>
> I am pretty sure that only one thread is touching that znode. We put in
> some trace logging to try to pinpoint the problem and noticed that every
> time we get the BadVersionException, the actual version on the znode is one
> more than what we expected based on the previous exists() call.
>
> As I said, this code gets called once every 2 seconds (or thereabouts). It
> seems to fail with a BadVersionException about 3 times an hour on average.
>
> By the way, not sure if it is relevant, but the reason we are using 2 nodes
> in the cluster, and the reason their version is 3.2.2, is that they are the
> ZKs that come embedded inside HBase (we're running 2 HBase regionservers).
> I've been meaning to pull them out and run them standalone but just haven't
> got around to it (yet).
>
> Ishaaq
>
> On 10 October 2011 17:35, Mahadev Konar <mahadev@hortonworks.com> wrote:
>
> > Ishaaq,
> >  2 ZK servers is definitely not the right number for running a ZK
> > service, but it's no reason to get a BadVersionException. For more
> > information on the size of the ZK ensemble, take a look at:
> >
> > http://zookeeper.apache.org/doc/r3.3.3/zookeeperAdmin.html
> >
> > As for the version on the znode, can you try reading the version when
> > you get a BadVersionException from setData()?
> >
> > Also, is there any chance that the znode is deleted and then another
> > create happens for the same path?
> >
> > I don't think we have seen this version issue in the releases, so I'd
> > be inclined to say that there could be something in the code that's
> > making some changes to the znode before you set the data.
> >
> > Hope that helps
> > thanks
> > mahadev
> >
> > On Fri, Oct 7, 2011 at 6:47 PM, Ishaaq Chandy <ishaaq@gmail.com> wrote:
> > > Hi all,
> > >
> > > We're seeing a puzzling error. Here's the scenario:
> > >
> > > 1. We have a single thread that wakes up every two seconds (give or
> > > take) and does some work.
> > > 2. As part of that work it updates a node on ZK. When it does this, it
> > > first gets the Stat of the existing node and uses the version retrieved
> > > from it to update the value.
> > > 3. There are no other processes updating the node.
> > >
> > > The code goes something like this:
> > >  final Stat stat = zooKeeper.exists(path, false);
> > >  // do some other work here to create the path if it does not exist -
> > >  // this code only ever gets called once
> > >  zooKeeper.setData(path, value, stat.getVersion());
> > >
> > > What we're seeing is that every so often (once every 5 minutes or so?)
> > > that setData() call fails with a BadVersionException. This is very
> > > unexpected because, as I mentioned previously, this thread is the sole
> > > updater of that node.
> > >
> > > One possibility I am considering is that we are using the wrong number
> > > of ZKs in our cluster, i.e. 2 nodes. I am wondering if 2 is the worst
> > > number of nodes possible for ZK, as there is no way to resolve a
> > > disagreement.
> > >
> > > Another possibility is that we are using an old version of ZK (3.2.2).
> > > Perhaps there is a known bug in it? Though I see nothing related to
> > > this in the release logs for subsequent versions.
> > >
> > > Thoughts/suggestions?
> > >
> > > Thanks,
> > > Ishaaq
> > >
> >
>
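On the ensemble-size question raised in the thread: a ZooKeeper ensemble needs a strict majority of its servers (a quorum) available to make progress. A minimal sketch of that arithmetic follows; the class QuorumSizes and its methods are hypothetical helpers, not part of ZooKeeper. It shows why 2 servers is a poor choice: the quorum is both servers, so the ensemble tolerates no failures, no better than a single server.

```java
public class QuorumSizes {
    // Majority quorum: strictly more than half of the ensemble.
    static int quorum(int ensembleSize) { return ensembleSize / 2 + 1; }

    // How many servers can be down while a majority is still reachable.
    static int toleratedFailures(int ensembleSize) { return ensembleSize - quorum(ensembleSize); }

    public static void main(String[] args) {
        for (int n = 1; n <= 5; n++) {
            System.out.printf("%d servers: quorum %d, tolerates %d failure(s)%n",
                    n, quorum(n), toleratedFailures(n));
        }
    }
}
```

For 1 to 5 servers this prints tolerated failures of 0, 0, 1, 1, 2, which is why odd ensemble sizes are the usual recommendation. As Mahadev notes, this affects availability rather than the setData() version check.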
