hadoop-zookeeper-user mailing list archives

From Łukasz Osipiuk <luk...@osipiuk.net>
Subject c client - problem with failover
Date Fri, 28 Aug 2009 15:24:01 GMT
Hi!

My name is Łukasz Osipiuk, and I work for one of the major Polish
Internet companies.
In one of our projects we make intensive use of ZooKeeper as a
distributed locking system. We implemented a slightly modified version
of the locking algorithm from the ZooKeeper docs page
(http://hadoop.apache.org/zookeeper/docs/current/recipes.html#sc_recipes_Locks).

Unfortunately, we are experiencing problems with deadlocks. From what I
have seen while examining the problem, either we are misusing ZooKeeper
in some way or it is buggy. Our app is written in C++ and uses the
zookeeper_mt C library.

The tests below were done using server version 3.1.1 and client library
version 3.2.0, but in production we run both client and server at
3.1.1 and experience the same problems.

I attach the code snippet I wrote to isolate our problems. When I run it
and randomly kill ZooKeeper nodes while it is running, I (from time to
time) get one of the following behaviors:

1. The zoo_create() call returns an error, but the node is still created
    in ZooKeeper. If this happens in the locking protocol, we get a
    hanging lock without an owner which will never disappear. Closing the
    client's ZooKeeper session is needed to remove such a hanging
    ephemeral node.

2. The application thread just hangs. From what I observed in gdb, it is
    waiting for completion of a synchronous operation (function
    wait_sync_completion).
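
One common way to cope with behavior 1 is to put a client-chosen unique id into the node name, so that after an ambiguous zoo_create() error (for example a lost connection) the client can list the lock directory and check whether its node exists after all. The sketch below shows only the name-matching step. The "lock-<guid>-<sequence>" naming scheme and the helper names make_node_prefix and find_own_node are assumptions made for illustration, not part of the ZooKeeper API or of the attached snippet; in a real client the list of children would come from a call such as zoo_get_children().

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical naming scheme: "lock-<guid>-<sequence>".
// The guid is chosen by the client before it calls zoo_create(), so it is
// known even when that call returns an error. It is assumed not to contain '-'.
std::string make_node_prefix(const std::string& guid) {
    assert(!guid.empty());
    return "lock-" + guid + "-";
}

// Given the children of the lock directory, return the name of the node
// whose name starts with this client's prefix, or an empty string if no
// such node exists.
std::string find_own_node(const std::vector<std::string>& children,
                          const std::string& guid) {
    const std::string prefix = make_node_prefix(guid);
    for (const std::string& child : children) {
        if (child.compare(0, prefix.size(), prefix) == 0) {
            return child;  // our create did reach the server
        }
    }
    return std::string();  // no node of ours: safe to retry the create
}
```

If find_own_node() returns a name, the client could either continue the lock protocol with that node or delete it; if it returns an empty string, retrying the create should not leave a duplicate behind.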

Is there a way to avoid these problems? Are we doing something wrong, or
should we file a bug report?
Is any of you using ZooKeeper as a distributed locking service with
more success?

Any help is really appreciated.

PS. To compile the code snippet, use:
g++ credel.cc -o credel -pedantic -lzookeeper_mt

-- 
Łukasz Osipiuk
mailto:lukasz@osipiuk.net
