zookeeper-user mailing list archives

From Dejan Markic <Dejan.Mar...@mobik-ics.com>
Subject A few ZooKeeper questions
Date Tue, 02 Dec 2014 22:22:38 GMT
Hello all!

Sorry for the long message, but here goes...

We started using ZooKeeper a few weeks ago, mostly for distributed locking. We have some
performance issues, so I'll try to explain as much as possible about our configuration and

For each "service" that needs ZooKeeper, we currently run a quorum of 3 servers.
We are running version 3.3.5 on Debian 7.7, using the multithreaded (MT) C API with the synchronous (non-async) functions.
All servers have the following configuration:

On all servers, the /var/lib/zookeeper directory is actually a 2 GB tmpfs "partition".
We run the zkPurgeTxnLog.sh script every 2 minutes.
Each server has 6 CPUs and 6 GB of RAM, but there is usually at least one more service
running on it (e.g. Radius).

So, let me try to explain how we use ZooKeeper.
We have Radius server(s) receiving around 300 requests per second (authentication and accounting).
We need to maintain the state of every session (start, stop, update) (and sessions come and go very
The NAS(es) can send many requests for the same session out of order (e.g. an auth request arrives
before the accounting Start, a Stop arrives before an Update, etc.).
So for each request, we acquire a lock from ZooKeeper. We use a slightly modified version of the recipe
at https://zookeeper.apache.org/doc/r3.1.2/recipes.html.
The recipe we're currently using is far from perfect, because we tried to add a timeout to
locking. Basically, what we do is:

 - create an ephemeral sequential node /SESSIONID/_xlock/lock-
 - get the children of /SESSIONID/_xlock/
 - if our node has the lowest sequence number, we hold the lock
 - if not, we wait 10 ms and call get children again, until we get the lock or the timeout
   expires (we usually wait 1 second)
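For comparison, the lock recipe on the ZooKeeper recipes page avoids polling: a client that does not hold the lock calls exists() with a watch on the node immediately preceding its own and sleeps until that watch fires. Below is a minimal Python sketch of that control flow with our 1-second timeout added. It is not a ZooKeeper client API: `get_children` and `exists_watch` are hypothetical callables standing in for the client's get-children and exists-with-watcher calls (e.g. zoo_get_children() and zoo_wexists() in the C client), and it assumes the 10-digit sequence suffix ZooKeeper appends to sequential node names.

```python
import threading
import time
from typing import Callable, List, Optional, Tuple


def sequence_of(name: str) -> int:
    # Sequential nodes get a zero-padded 10-digit counter appended,
    # e.g. "lock-0000000042" -> 42.
    return int(name[-10:])


def lock_decision(children: List[str], my_node: str) -> Tuple[bool, Optional[str]]:
    """Return (True, None) if my_node holds the lock, else (False, predecessor)."""
    ordered = sorted(children, key=sequence_of)
    # Raises ValueError if my_node is missing (e.g. our session expired).
    idx = ordered.index(my_node)
    if idx == 0:
        return True, None
    return False, ordered[idx - 1]


def acquire(my_node: str,
            get_children: Callable[[], List[str]],
            exists_watch: Callable[[str, Callable[[], None]], bool],
            timeout_s: float) -> bool:
    """Wait until my_node is the lowest lock- node or timeout_s elapses.

    get_children() lists the lock- nodes under /SESSIONID/_xlock (hypothetical).
    exists_watch(name, callback) returns whether the node exists and, if it
    does, arranges for callback() to run once when it changes (hypothetical).
    """
    deadline = time.monotonic() + timeout_s
    while True:
        held, predecessor = lock_decision(get_children(), my_node)
        if held:
            return True
        if time.monotonic() >= deadline:
            return False
        fired = threading.Event()
        if not exists_watch(predecessor, fired.set):
            continue  # predecessor already gone: re-read the children
        # Sleep until the watch fires instead of polling every 10 ms.
        if not fired.wait(max(0.0, deadline - time.monotonic())):
            return False
```

Watching only the predecessor means each release wakes a single waiter rather than every client (the "herd effect" the recipes page describes). If acquire() returns False, the caller should still delete its own lock- node; otherwise that ephemeral node stays in the queue until the session ends and blocks every later waiter.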

After we get the lock, we write/read some session-related data in /SESSIONID/nodeName.
The data is usually 20 bytes at most. We write/read up to 3 nodes for each session.
We then insert/update the session in our MySQL servers and release the lock in ZooKeeper.
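To put the write load in numbers, here is a rough upper-bound estimate as a small Python function. It assumes every request performs all of the writes described above: two create() attempts from the recursive path create (/SESSIONID and /SESSIONID/_xlock, counted even when they fail with ZNODEEXISTS, since a create is still forwarded to the leader as a write request), one create of the lock- node, up to 3 data writes, and one delete on unlock. Reads (get children, exists) are not counted, because each server answers them locally.

```python
def zk_write_requests_per_second(requests_per_second: int,
                                 path_creates: int = 2,
                                 data_writes: int = 3) -> int:
    """Upper-bound estimate of write requests sent to the ZooKeeper leader.

    Per RADIUS request (assumed, from the description above):
      path_creates  create() attempts for /SESSIONID and /SESSIONID/_xlock
      1             create of the ephemeral sequential lock- node
      data_writes   writes under /SESSIONID (at most 3)
      1             delete of the lock- node on unlock
    """
    per_request = path_creates + 1 + data_writes + 1
    return requests_per_second * per_request


print(zk_write_requests_per_second(300))                  # 2100
print(zk_write_requests_per_second(300, path_creates=0))  # 1500
```

With the figures above, that is roughly 2,100 write requests per second, about 600 of which are the create() attempts that end in NodeExists.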

It usually takes around 30-50 ms to obtain the lock from ZooKeeper, but occasionally the
1-second timeout expires. It seems that ZooKeeper is sometimes busy doing something else,
and response times go way up.

Since a lot of nodes (/SESSIONID) are left behind after the sessions finish, we created a script
that removes all sessions last modified more than 900 seconds ago. The script goes through
all /SESSIONID nodes and checks the last modification time of their children. If none of the children
was changed in the last 900 seconds, we remove the whole SESSIONID node and its children.
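The selection rule used by the cleanup script can be sketched in Python as follows. This is an illustration rather than the actual script: `session_child_mtimes` is a hypothetical mapping from each SESSIONID to the mtime values of its children as reported in their Stat (milliseconds since the epoch), and a session with no children is treated as stale.

```python
from typing import Dict, List


def stale_sessions(session_child_mtimes: Dict[str, List[int]],
                   now_ms: int, max_age_s: int = 900) -> List[str]:
    """Return the SESSIONIDs none of whose children changed in the last max_age_s."""
    cutoff_ms = now_ms - max_age_s * 1000
    return sorted(
        session_id
        for session_id, mtimes in session_child_mtimes.items()
        # all() of an empty list is True, so childless sessions count as stale
        if all(mtime < cutoff_ms for mtime in mtimes)
    )


now = 1_700_000_000_000
print(stale_sessions({"a": [now - 901_000],
                      "b": [now - 901_000, now - 10_000],
                      "c": []}, now))  # ['a', 'c']
```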

- Do you think we could drop that stupid 10 ms wait in the lock recipe and just rely on
exists() with a watch notifying us in a timely fashion? Is there little or no
possibility that a session would get stuck/deadlocked?
- If clients requesting a lock in /SESSIONID/_xlock/lock- connect to different servers in
the quorum, do we need to perform a sync before checking whether ours is the node with the minimal sequence?
- Would adding more servers to the quorum improve performance? Would performance be
more consistent that way?
- Are we performing too many writes to ZooKeeper?
- If we have 300k child nodes (e.g. 300k /SESSIONID nodes), would that be a performance issue?
- When we create the lock path, we use a recursive create, which basically calls create() for
each node and simply continues if ZNODEEXISTS is returned. I see a *lot* of these messages
in the log:
[ProcessThread:-1:PrepRequestProcessor@419] - Got user-level KeeperException when processing
sessionid:0x34a0cbf41360018 type:create cxid:0x548a2589 zxid:0xfffffffffffffffe txntype:unknown
reqpath:n/a Error Path:/SESSIONID/_xlock Error:KeeperErrorCode = NodeExists for /SESSIONID/_xlock
 Would it be better to call exists() on each node first and create it only if needed? And do
we need to sync() before checking with exists()?
- Would upgrading to newer version improve performance?
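The recursive create described in the questions above can be sketched as follows (Python, for illustration only). `create` is a hypothetical callable standing in for the client's create() call, and `NodeExistsError` is a hypothetical stand-in for the ZNODEEXISTS return code.

```python
from typing import Callable, List


class NodeExistsError(Exception):
    """Hypothetical stand-in for the client's ZNODEEXISTS result."""


def parent_paths(path: str) -> List[str]:
    # "/SESSIONID/_xlock" -> ["/SESSIONID", "/SESSIONID/_xlock"]
    parts = [p for p in path.split("/") if p]
    return ["/" + "/".join(parts[:i]) for i in range(1, len(parts) + 1)]


def ensure_path(path: str, create: Callable[[str], None]) -> None:
    """Create every component of path, treating 'already exists' as success."""
    for p in parent_paths(path):
        try:
            create(p)
        except NodeExistsError:
            pass  # another client (or an earlier request) created it first


print(parent_paths("/SESSIONID/_xlock"))  # ['/SESSIONID', '/SESSIONID/_xlock']
```

Either way, the NodeExists case has to be handled: calling exists() first only replaces a failed create() (a write) with a read when the node usually exists, and another client can still create the node between the exists() and the create().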

Any input would be greatly appreciated!

Kind regards,
Dejan Markic
