Hi Gunnar this is great detective work. It certainly sounds like it
might be some timing issue or possible bug in ZK exposed by this
embedded case. A few questions:
1) in this dev/embedded case you only have a single zk server, correct?
2) you have 2 clients in this case, one creating the znode and one
watching (vs say a single client doing both)
It would be great if you were able to construct a test case that
reproduces this, is that something you could provide?
I'd also suggest creating a JIRA to track this issue:
https://issues.apache.org/jira/browse/ZOOKEEPER
Patrick
On Tue, May 17, 2011 at 11:45 PM, Gunnar Wagenknecht
<gunnar@wagenknecht.org> wrote:
> Hi,
>
> I have an application that uses ZooKeeper. There is an ensemble in
> production. But in order to simplify development the application will
> start an embedded ZooKeeper server when started in development mode. We
> are experiencing a timing issue with ZooKeeper 3.3.3 and I was wondering
> if this is allowed to be happen or if we did something wrong when
> starting the embedded server.
>
>
> Basically, we have a watch registered using an #exists call and watch
> code like the following.
>
> @Override
> public void process(final WatchedEvent event) {
> switch (event.getType()) {
> ...
> case NodeCreated:
> pathCreated(event.getPath());
> break;
> ...
> }
> }
>
> @Override
> protected void pathCreated(final String path) {
> // process events only for this node
> if (!isMyPath(path))
> return;
> try {
> loadNode(); // calls zk.getData(String, Watcher, Stat)
> } catch (final Exception e) {
> // got NoNodeException here (but not when debugging)
> log(..., e)
> }
> }
>
>
>
> From inspecting the logs we noticed a NoNodeException. When setting
> breakpoints on #loadNode and stepping through we don't get the
> exception. But when setting a breakpoint on #log only we got a hit and
> could confirm the issue this way.
>
> The path is actually some levels deep. All the parent paths don't exist
> either so they are created as well. However, no exception is thrown fro
> them. The sequence is as follows.
>
> /l1 --> watch triggered, getData, no exception
> /l1/l2 --> watch triggered, getData, no exception
> /l1/l2/l3 --> watch triggered, getData, no exception
> /l1/l2/l3/l4 --> watch triggered, getData, no exception
> /l1/l2/l3/l4/l5 --> watch triggered, getData, no exception
> /l1/l2/l3/l4/l5/l6 --> watch triggered, getData, NoNodeException
>
> The only difference is that all paths up to including l5 do not actually
> have any data. Only l6 has some data. Could there be some latency issues?
>
> For completeness, the embedded server is started as follows.
>
> // disable LOG4J JMX stuff
> System.setProperty("zookeeper.jmx.log4j.disable", Boolean.TRUE.toString());
>
> // get directories
> final File dataDir = new File(config.getDataLogDir());
> final File snapDir = new File(config.getDataDir());
>
> // clean old logs
> PurgeTxnLog.purge(dataDir, snapDir, 3);
>
> // create standalone server
> zkServer = new ZooKeeperServer();
> zkServer.setTxnLogFactory(new FileTxnSnapLog(dataDir, snapDir));
> zkServer.setTickTime(config.getTickTime());
> zkServer.setMinSessionTimeout(config.getMinSessionTimeout());
> zkServer.setMaxSessionTimeout(config.getMaxSessionTimeout());
>
> factory = new NIOServerCnxn.Factory(config.getClientPortAddress(),
> config.getMaxClientCnxns());
>
> // start server
> LOG.info("Starting ZooKeeper standalone server.");
> try {
> factory.startup(zkServer);
> } catch (final InterruptedException e) {
> LOG.warn("Interrupted during server start.", e);
> Thread.currentThread().interrupt();
> }
>
>
> -Gunnar
>
>
> --
> Gunnar Wagenknecht
> gunnar@wagenknecht.org
> http://wagenknecht.org/
>
>
|