zookeeper-user mailing list archives

From "Mekaraj, Prashant" <Prashant.Meka...@morganstanley.com>
Subject Recommendations for zookeeper deployment
Date Tue, 12 Jan 2010 18:38:04 GMT

http://hadoop.apache.org/zookeeper/docs/current/zookeeperAdmin.html is a great resource. It's
rare to see an open source project think so much about practical enterprise deployment, and
this is much appreciated.

There are a few more recommendations that I think would be useful to add to the page. 

1. dataDir size: Since the dataDir stores snapshots and you recommend storing at least 3 snapshots,
I am thinking of using 3 times the size of the heap allocated to the process as a guideline
for how big the dataDir drive should be.
2. dataLogDir size: Since a new log file is started every time a snapshot is taken, and using
3 snapshots as a recommendation, I am thinking of using the same 3 times the size of the heap as a
guideline for how big the dataLogDir drive should be.
3. Persistence of data and log directories: https://issues.apache.org/jira/browse/ZOOKEEPER-546
implies that there are cases where all zk data is loaded from a different configuration store.
In such cases, even if I use a disk that is cleaned regularly (on reboots or rebuilds), I would
be fine.
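
To make the sizing idea in items 1 and 2 concrete, here is a minimal sketch of the arithmetic.
It only encodes the rule of thumb proposed above (retained snapshots times heap size) and is not
an official ZooKeeper recommendation. The function name, the 4 GiB heap, and the assumption that
a snapshot never exceeds the heap size are all illustrative assumptions.

```python
GIB = 1024 ** 3


def min_disk_bytes(heap_bytes: int, retained_snapshots: int = 3) -> int:
    """Rule-of-thumb lower bound on drive size.

    Assumes (hypothetically) that each retained snapshot, or the log
    written between two snapshots, is at most as large as the JVM heap.
    """
    if heap_bytes <= 0 or retained_snapshots <= 0:
        raise ValueError("heap_bytes and retained_snapshots must be positive")
    return heap_bytes * retained_snapshots


# Hypothetical server started with a 4 GiB heap (e.g. -Xmx4g).
heap = 4 * GIB
print("dataDir    >=", min_disk_bytes(heap) // GIB, "GiB")
print("dataLogDir >=", min_disk_bytes(heap) // GIB, "GiB")
```

With a 4 GiB heap this gives 12 GiB for each directory. If more than 3 snapshots are kept, the
second argument scales the estimate accordingly.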

Also, if a zk server were to be added to an existing ensemble (for example, when the machine
reboots) and the data and datalog directories are empty, it seems to me that the server would
sync with the leader and build its log and snapshots again, although there will be a performance
hit on the entire ensemble while this is taking place. Is this correct?

Thanks again

