zookeeper-user mailing list archives

From Ted Dunning <ted.dunn...@gmail.com>
Subject Re: puzzling BadVersionException
Date Tue, 11 Oct 2011 12:12:25 GMT
Sounds like you may want to look into the multi operation if you have many
inputs being processed to form a single output.

On Tue, Oct 11, 2011 at 5:55 AM, Ishaaq Chandy <ishaaq@gmail.com> wrote:

> Ok, false alarm - the problem was a mis-configuration in our code that was
> causing multiple processes to update that znode whereas only one should
> have.
>
> Apologies for wasting your time.
>
> Ishaaq
>
> On 11 October 2011 13:09, Ishaaq Chandy <ishaaq@gmail.com> wrote:
>
> > Technically we don't need the contents as we're going to overwrite it
> > anyway, we're just asserting the fact that we're the only one writing to
> > that node.
> >
> > Was just checking if it is a known issue - clearly not, so I'll continue
> > investigating our code.
> >
> > Thanks,
> > Ishaaq
> >
> >
> > On 11 October 2011 12:21, Ted Dunning <ted.dunning@gmail.com> wrote:
> >
> >> Why do you get the version in the first place without getting the
> >> contents?
> >>
> >> If you don't have the contents, what is the point of enforcing a
> >> version?
> >>
> >> On Mon, Oct 10, 2011 at 8:26 AM, Ishaaq Chandy <ishaaq@gmail.com> wrote:
> >>
> >> > Thanks Mahadev,
> >> > Yup, I am aware that 2 is a particularly bad number for cluster size,
> >> > and hopefully we will fix that soon. I was just hoping that was the
> >> > reason the problem is occurring. My conjecture was that if, for
> >> > example, the two ZK servers disagree about the version, there is no
> >> > way to decide who is correct without a third tie-breaker server.
> >> >
> >> > But if you say that is not the case, then I need to keep looking
> >> > (sigh).
> >> >
> >> > I am pretty sure that only one thread is touching that znode. We put in
> >> > some trace logging to try to pinpoint the problem and noticed that every
> >> > time we get the BadVersionException, the actual version on the znode is
> >> > one more than what we expected it to be based on the previous
> >> > "exists()" call.
> >> >
> >> > As I said, this code gets called once every 2 seconds (or thereabouts).
> >> > It seems to fail with a BadVersionException about 3 times an hour (on
> >> > average).
> >> >
> >> > By the way, not sure if it is relevant, but the reason we are using 2
> >> > nodes in the cluster, and the reason their version is 3.2.2, is that
> >> > they are the ZKs that come embedded inside HBase (we're running 2 HBase
> >> > regionservers). I've been meaning to pull them out and run them
> >> > standalone but just haven't got around to it (yet).
> >> >
> >> > Ishaaq
> >> >
> >> > On 10 October 2011 17:35, Mahadev Konar <mahadev@hortonworks.com> wrote:
> >> >
> >> > > Ishaaq,
> >> > > 2 ZK servers is definitely not the right number for running a ZK
> >> > > service, but it's no reason to get a BadVersionException. For more
> >> > > information on the size of the ZK ensemble, take a look at:
> >> > >
> >> > > http://zookeeper.apache.org/doc/r3.3.3/zookeeperAdmin.html
> >> > >
> >> > > As for the version on the znode, can you try reading the version when
> >> > > you get a BadVersionException from setData?
> >> > >
> >> > > Also, is there any chance that a delete removes the znode and
> >> > > another create happens for the same path?
> >> > >
> >> > > I don't think we have seen this version issue in the releases, so I'd
> >> > > be inclined to say that something in the code may be changing the
> >> > > znode before you set the data.
> >> > >
> >> > > Hope that helps
> >> > > thanks
> >> > > mahadev
> >> > >
> >> > > On Fri, Oct 7, 2011 at 6:47 PM, Ishaaq Chandy <ishaaq@gmail.com> wrote:
> >> > > > Hi all,
> >> > > >
> >> > > > We're seeing a puzzling error. Here's the scenario:
> >> > > >
> >> > > > 1. We have a single thread that wakes up every two seconds (give or
> >> > > >    take) and does some work.
> >> > > > 2. As part of that work it updates a node on ZK. When it does this, it
> >> > > >    first gets the Stat of the existing node and uses the version
> >> > > >    retrieved from it to update the value.
> >> > > > 3. There are no other processes updating the node.
> >> > > >
> >> > > > The code goes something like this:
> >> > > >  final Stat stat = zooKeeper.exists(path, false);
> >> > > >  // do some other work here to create the path if it does not exist -
> >> > > >  // this code only ever gets called once
> >> > > >  zooKeeper.setData(path, value, stat.getVersion());
> >> > > >
> >> > > > What we're seeing is that every so often (once every 5 minutes or so?)
> >> > > > that setData() call fails with a BadVersionException. This is very
> >> > > > unexpected because, as I mentioned previously, this thread is the sole
> >> > > > updater of that node.
> >> > > >
> >> > > > One possibility I am considering is that we are using the wrong number
> >> > > > of ZKs in our cluster, i.e. 2 nodes. I am wondering if 2 is the worst
> >> > > > number of nodes possible for ZK, as there is no way to resolve a
> >> > > > disagreement.
> >> > > >
> >> > > > Another possibility is that we are using an old version of ZK (3.2.2);
> >> > > > perhaps there is a known bug with it? Though I see nothing related to
> >> > > > this in the release logs for subsequent versions.
> >> > > >
> >> > > > Thoughts/suggestions?
> >> > > >
> >> > > > Thanks,
> >> > > > Ishaaq
> >>
> >
>
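
The cause identified in this thread (a second process updating the znode between the exists() and setData() calls) can be illustrated with a minimal, self-contained Java sketch. It does not use the ZooKeeper client library: VersionedNode, the local BadVersionException class, and updateWithInterleavedWriter are hypothetical stand-ins that only mimic the version check setData() performs, so treat it as an illustration under those assumptions rather than as ZooKeeper's implementation.

```java
public class VersionedNodeDemo {

    /** Local stand-in for ZooKeeper's BadVersionException (assumption: not the real class). */
    static class BadVersionException extends Exception {
        BadVersionException(int expected, int actual) {
            super("expected version " + expected + " but node is at " + actual);
        }
    }

    /** Hypothetical in-memory node: data plus a version counter bumped on every write. */
    static class VersionedNode {
        private byte[] data = new byte[0];
        private int version = 0;

        synchronized int getVersion() {
            return version;
        }

        /** Conditional write: succeeds only if expectedVersion matches (or is -1, meaning "any"). */
        synchronized int setData(byte[] newData, int expectedVersion) throws BadVersionException {
            if (expectedVersion != -1 && expectedVersion != version) {
                throw new BadVersionException(expectedVersion, version);
            }
            data = newData.clone();
            version++;
            return version;
        }
    }

    /**
     * Reads the version, optionally lets a second writer sneak in, then writes
     * with the version read earlier. Returns true if the write succeeded.
     */
    static boolean updateWithInterleavedWriter(VersionedNode node, boolean secondWriterActive) {
        int expected = node.getVersion(); // plays the role of exists() + stat.getVersion()
        try {
            if (secondWriterActive) {
                // The misconfigured second process updates the node first.
                node.setData(new byte[] {2}, node.getVersion());
            }
            node.setData(new byte[] {1}, expected);
            return true;
        } catch (BadVersionException e) {
            // The node is now exactly one version ahead of what we expected.
            return false;
        }
    }

    public static void main(String[] args) {
        VersionedNode node = new VersionedNode();
        System.out.println("sole writer succeeded: " + updateWithInterleavedWriter(node, false));
        System.out.println("with second writer succeeded: " + updateWithInterleavedWriter(node, true));
    }
}
```

With a single writer the conditional write goes through; once a second writer slips in between the read and the write, the node's version is one higher than expected, which matches the symptom reported in the trace logging above.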
