zookeeper-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Patrick Hunt <ph...@apache.org>
Subject Re: Timing issue when running embedded?
Date Thu, 19 May 2011 19:14:37 GMT
Hi Gunnar this is great detective work. It certainly sounds like it
might be some timing issue or possible bug in ZK exposed by this
embedded case. A few questions:

1) in this dev/embedded case you only have a single zk server, correct?

2) you have 2 clients in this case, one creating the znode and one
watching (vs say a single client doing both)

It would be great if you were able to construct a test case that
reproduces this, is that something you could provide?

I'd also suggest creating a JIRA to track this issue:
https://issues.apache.org/jira/browse/ZOOKEEPER

Patrick

On Tue, May 17, 2011 at 11:45 PM, Gunnar Wagenknecht
<gunnar@wagenknecht.org> wrote:
> Hi,
>
> I have an application that uses ZooKeeper. There is an ensemble in
> production. But in order to simplify development the application will
> start an embedded ZooKeeper server when started in development mode. We
> are experiencing a timing issue with ZooKeeper 3.3.3 and I was wondering
> if this is allowed to be happen or if we did something wrong when
> starting the embedded server.
>
>
> Basically, we have a watch registered using an #exists call and watch
> code like the following.
>
> @Override
> public void process(final WatchedEvent event) {
>  switch (event.getType()) {
>    ...
>    case NodeCreated:
>      pathCreated(event.getPath());
>      break;
>    ...
>  }
> }
>
> @Override
> protected void pathCreated(final String path) {
>  // process events only for this node
>  if (!isMyPath(path))
>    return;
>  try {
>    loadNode(); // calls zk.getData(String, Watcher, Stat)
>  } catch (final Exception e) {
>    // got NoNodeException here (but not when debugging)
>    log(..., e)
>  }
> }
>
>
>
> From inspecting the logs we noticed a NoNodeException. When setting
> breakpoints on #loadNode and stepping through we don't get the
> exception. But when setting a breakpoint on #log only we got a hit and
> could confirm the issue this way.
>
> The path is actually some levels deep. All the parent paths don't exist
> either so they are created as well. However, no exception is thrown fro
> them. The sequence is as follows.
>
> /l1  --> watch triggered, getData, no exception
> /l1/l2  --> watch triggered, getData, no exception
> /l1/l2/l3  --> watch triggered, getData, no exception
> /l1/l2/l3/l4  --> watch triggered, getData, no exception
> /l1/l2/l3/l4/l5  --> watch triggered, getData, no exception
> /l1/l2/l3/l4/l5/l6  --> watch triggered, getData, NoNodeException
>
> The only difference is that all paths up to including l5 do not actually
> have any data. Only l6 has some data. Could there be some latency issues?
>
> For completeness, the embedded server is started as follows.
>
> // disable LOG4J JMX stuff
> System.setProperty("zookeeper.jmx.log4j.disable", Boolean.TRUE.toString());
>
> // get directories
> final File dataDir = new File(config.getDataLogDir());
> final File snapDir = new File(config.getDataDir());
>
> // clean old logs
> PurgeTxnLog.purge(dataDir, snapDir, 3);
>
> // create standalone server
> zkServer = new ZooKeeperServer();
> zkServer.setTxnLogFactory(new FileTxnSnapLog(dataDir, snapDir));
> zkServer.setTickTime(config.getTickTime());
> zkServer.setMinSessionTimeout(config.getMinSessionTimeout());
> zkServer.setMaxSessionTimeout(config.getMaxSessionTimeout());
>
> factory = new NIOServerCnxn.Factory(config.getClientPortAddress(),
> config.getMaxClientCnxns());
>
> // start server
> LOG.info("Starting ZooKeeper standalone server.");
> try {
>  factory.startup(zkServer);
> } catch (final InterruptedException e) {
>  LOG.warn("Interrupted during server start.", e);
>  Thread.currentThread().interrupt();
> }
>
>
> -Gunnar
>
>
> --
> Gunnar Wagenknecht
> gunnar@wagenknecht.org
> http://wagenknecht.org/
>
>

Mime
View raw message