accumulo-user mailing list archives

From Frans Lawaetz
Subject setting zookeeper forceSync=no
Date Mon, 24 Feb 2014 17:32:00 GMT

I acknowledge in advance that what I'm asking goes against best practices
as described here and in the ZooKeeper guides.  I was wondering what the
possible consequences are of setting forceSync=no in zoo.cfg in
stand-alone installations where a single machine hosts Accumulo, ZooKeeper,
Hadoop, etc.

This sort of configuration is obviously not for production and is used only
when a client is interested in seeing a demo of an Accumulo-based
application but has only a single machine available at the time, often
with just a single drive serving all mounted file systems.  As one might
expect in this sort of setup, the ZooKeeper log starts to fill with:

zookeeper.log.9:2014-01-21 19:19:38,885 [myid:] - WARN
 [SyncThread:0:FileTxnLog@321] - fsync-ing the write ahead log in
SyncThread:0 took 5898ms which will adversely effect operation latency. See
the ZooKeeper troubleshooting guide

Eventually Accumulo times out with a ConnectionLoss and the master
process goes down.
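
For reference, a rough way to gauge raw fsync latency on the shared drive,
independent of ZooKeeper, is sketched below in Python.  The function name,
write count, and payload size are illustrative assumptions of mine, and the
sketch only approximates ZooKeeper's write pattern, so the numbers should be
read as an indication rather than a prediction of the SyncThread timings.

```python
import os
import tempfile
import time


def measure_fsync_latency(directory, writes=20, payload_size=4096):
    """Return per-write fsync latencies (ms) for a temp file in `directory`."""
    payload = b"\0" * payload_size
    latencies_ms = []
    # The temporary file is removed automatically when the block exits.
    with tempfile.NamedTemporaryFile(dir=directory) as f:
        for _ in range(writes):
            f.write(payload)
            f.flush()  # hand Python's buffered data to the OS
            start = time.perf_counter()
            os.fsync(f.fileno())  # ask the OS to push the data to disk
            latencies_ms.append((time.perf_counter() - start) * 1000.0)
    return latencies_ms


if __name__ == "__main__":
    # Assumption: replace the temp dir with the directory that holds
    # ZooKeeper's transaction log to test the drive actually in use.
    samples = measure_fsync_latency(tempfile.gettempdir())
    print("max fsync: %.1f ms, mean: %.1f ms"
          % (max(samples), sum(samples) / len(samples)))
```

Running it while Hadoop and Accumulo are busy on the same drive should give
a feel for whether multi-second syncs like the one logged above are coming
from disk contention.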

Is Accumulo's use of ZooKeeper primarily for cluster-wide synchronization
at run time, or is there persistent state in ZooKeeper that must be kept in
sync with the contents of the walogs and/or table files in HDFS?

If the former, then I imagine (in a stand-alone setup) that ZooKeeper
corruption due to incomplete syncs during a power failure or the like could
be remedied by restarting the stack, which would recover a prior ZooKeeper
snapshot.  If it's the latter, then I can see things getting a bit messy.

Thanks in advance.


