accumulo-user mailing list archives

From Keith Turner <ke...@deenlo.com>
Subject Re: No Recovery Node In Zookeeper
Date Thu, 14 Jun 2012 15:55:00 GMT
There is a bigger problem here.  This code is trying to place a
zookeeper watcher on the recovery node in zookeeper.  It's doing this
so that changes to the node's children will trigger the master to take
action related to recovery.  If the watcher is not put into place,
recoveries may not proceed as fast as they could.  I looked for
references to Constants.ZRECOVERY in the code and did not see any place
where the recovery node is always created.  It seems to be created on an
as-needed basis.  One solution may be to modify the upgrade and init
code to create this node in zookeeper.  That way it's always there and
can be watched.
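
As a rough sketch of the "create it up front" idea, the example below uses a toy in-memory store rather than the real ZooKeeper client or Accumulo's init code. The names EnsureNodeSketch, ToyStore, ensureNode, and the example path are all hypothetical. The NodeExistsException here is a local stand-in for ZooKeeper's KeeperException.NodeExistsException. The point is only that creation can be made idempotent: attempt the create and treat "already exists" as success, so init and upgrade can both call it safely.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-memory stand-in; this is NOT the ZooKeeper or Accumulo API.
class EnsureNodeSketch {

    /** Local stand-in for KeeperException.NodeExistsException. */
    static class NodeExistsException extends Exception {
        NodeExistsException(String path) {
            super("NodeExists for " + path);
        }
    }

    /** Toy node store that only tracks which paths exist. */
    static class ToyStore {
        private final Set<String> nodes = ConcurrentHashMap.newKeySet();

        void create(String path) throws NodeExistsException {
            // Set.add returns false when the path was already present.
            if (!nodes.add(path)) {
                throw new NodeExistsException(path);
            }
        }

        boolean exists(String path) {
            return nodes.contains(path);
        }
    }

    /** Returns true if this call created the node, false if it was already there. */
    static boolean ensureNode(ToyStore store, String path) {
        try {
            store.create(path);
            return true;
        } catch (NodeExistsException e) {
            // Someone else (or an earlier run) created it; that is fine.
            return false;
        }
    }

    public static void main(String[] args) {
        ToyStore store = new ToyStore();
        String recoveryPath = "/accumulo/example-instance/recovery"; // hypothetical path

        System.out.println("created on first call:  " + ensureNode(store, recoveryPath));
        System.out.println("created on second call: " + ensureNode(store, recoveryPath));
        System.out.println("node exists afterwards: " + store.exists(recoveryPath));
    }
}
```

With the node guaranteed to exist after init or upgrade, the master's getChildren() call would have something to watch. Whether this belongs in init, upgrade, or master startup is a design question for the real code.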

I would advise waiting for Eric to chime in on this, since he just
made a large number of changes to the log recovery code.

A general zookeeper coding tip:  calling exists() and then calling
getData() or getChildren() can lead to a race condition.  It is possible
that the node exists when you call exists(), but is then deleted
by another process before you call getData() or getChildren().  The
best way to deal with this is the following pattern.

try {
   getChildren(); // or getData(), etc.
} catch (NoNodeException nne) {
   // the node does not exist; handle that case here... no race condition
}
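
To make the race concrete, here is a small self-contained sketch. It uses a toy in-memory store rather than a real ZooKeeper client, so NoNodeRaceSketch, ToyStore, and childrenOrEmpty are hypothetical names. The NoNodeException is a local stand-in for KeeperException.NoNodeException. Returning an empty list in the catch block is just one possible way to handle the missing node; the right handling depends on the caller.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-memory stand-in; this is NOT the real ZooKeeper client API.
class NoNodeRaceSketch {

    /** Local stand-in for KeeperException.NoNodeException. */
    static class NoNodeException extends Exception {
        NoNodeException(String path) {
            super("NoNode for " + path);
        }
    }

    /** Toy node store mapping a path to the names of its children. */
    static class ToyStore {
        private final Map<String, List<String>> nodes = new ConcurrentHashMap<>();

        void create(String path, List<String> children) {
            nodes.put(path, new ArrayList<>(children));
        }

        void delete(String path) {
            nodes.remove(path);
        }

        boolean exists(String path) {
            return nodes.containsKey(path);
        }

        List<String> getChildren(String path) throws NoNodeException {
            List<String> children = nodes.get(path);
            if (children == null) {
                throw new NoNodeException(path);
            }
            return new ArrayList<>(children);
        }
    }

    /** Single call, with the missing-node case handled in the catch block. */
    static List<String> childrenOrEmpty(ToyStore store, String path) {
        try {
            return store.getChildren(path);
        } catch (NoNodeException nne) {
            // Node is absent: this sketch treats that as "no children".
            return new ArrayList<>();
        }
    }

    public static void main(String[] args) {
        ToyStore store = new ToyStore();
        String path = "/recovery";
        store.create(path, Arrays.asList("log-a"));

        // Check-then-act: the node can vanish between exists() and getChildren().
        boolean seen = store.exists(path);
        store.delete(path); // simulates another process deleting the node
        try {
            if (seen) {
                store.getChildren(path);
            }
        } catch (NoNodeException nne) {
            System.out.println("check-then-act still failed: " + nne.getMessage());
        }

        // Single call: the same deletion is handled without a prior check.
        System.out.println("children: " + childrenOrEmpty(store, path));
    }
}
```

The exists() check adds nothing here: the exception still has to be handled, so making the one call and catching the exception is both shorter and correct.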


On Tue, Jun 12, 2012 at 10:25 PM, David Medinets
<david.medinets@gmail.com> wrote:
> I am grepping the source left and right but am not sure what to make of
> this error. Here is the code from Master.java:
>
>    ZooReaderWriter.getInstance().getChildren(zroot + Constants.ZRECOVERY, new Watcher() {
>      @Override
>      public void process(WatchedEvent event) {
>        nextEvent.event("Noticed recovery changes", event.getType());
>      }
>    });
>
> I suggest replacing the above code with this:
>
>    final String recoveryPath = zroot + Constants.ZRECOVERY;
>    Stat stat = ZooReaderWriter.getInstance().getZooKeeper().exists(recoveryPath, null);
>    if (stat != null && stat.getNumChildren() > 0) {
>      ZooReaderWriter.getInstance().getChildren(recoveryPath, new Watcher() {
>        @Override
>        public void process(WatchedEvent event) {
>          nextEvent.event("Noticed recovery changes", event.getType());
>        }
>      });
>    }
>
> I have changed my local Accumulo and this change seems to be OK.
> However, since this is a change to Accumulo itself, I would like
> someone to code review before I commit this change. Does this change
> make sense?
>
> On Mon, Jun 11, 2012 at 9:54 PM, David Medinets
> <david.medinets@gmail.com> wrote:
>> I am slowly working my way through whatever went wrong on my system.
>> This is the latest. I've deleted the logs and started the master by
>> hand:
>>
>> accumulo org.apache.accumulo.server.master.state.SetGoalState NORMAL
>> start-server.sh localhost master
>>
>> Then checked the log files where I saw this message:
>>
>> org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode
>> = NoNode for /accumulo/b519799c-3a51-4c9b-af21-96d577e2c11f/recovery
>>        at org.apache.zookeeper.KeeperException.create(KeeperException.java:111)
>>        at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
>>        at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1448)
>>        at org.apache.accumulo.core.zookeeper.ZooReader.getChildren(ZooReader.java:62)
>>        at org.apache.accumulo.server.master.Master.run(Master.java:2071)
>>        at org.apache.accumulo.server.master.Master.main(Master.java:2173)
>>        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>        at java.lang.reflect.Method.invoke(Method.java:601)
>>
>> I've run out of time for debugging today. I'll dig into the source
>> code more tomorrow, unless someone can point me in the right
>> direction to resolve this?
