accumulo-user mailing list archives

From Josh Elser <josh.el...@gmail.com>
Subject Re: No Recovery Node In Zookeeper
Date Wed, 13 Jun 2012 22:37:19 GMT
 From a brief look, your change looks good. I would be concerned about 
not having an 'else' block with a big-fat-warning though. The only 
intent seems to be to notify in the logs that recoveries were found to 
be processed. In the same fashion, I would expect a log message if the 
entire recovery directory was not present.

Not being able to find that directory could signify larger problems 
(like those you ran into), and could, possibly, result in a user 
silently losing data due to missing recoveries.
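
Roughly what I have in mind is sketched below. This is plain Java, not the actual Master code: NodeReader, Outcome and checkRecovery are hypothetical names standing in for the ZooKeeper lookup so the example runs without a ZooKeeper dependency, and java.util.logging is assumed in place of whatever logger the Master uses. It does not register a watcher; it only shows each branch logging something.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.logging.Logger;

class RecoveryCheckSketch {
  private static final Logger log = Logger.getLogger(RecoveryCheckSketch.class.getName());

  // Hypothetical stand-in for the ZooKeeper lookup: returns the children of
  // a node, or null when the node does not exist.
  interface NodeReader {
    List<String> childrenOrNull(String path);
  }

  // Hypothetical result type so a caller can tell the three cases apart.
  enum Outcome { RECOVERIES_FOUND, NO_RECOVERIES, NODE_MISSING }

  static Outcome checkRecovery(NodeReader reader, String recoveryPath) {
    List<String> children = reader.childrenOrNull(recoveryPath);
    if (children == null) {
      // The case the proposed patch would skip without any log output.
      log.warning("Recovery node " + recoveryPath + " is missing; recoveries may have been lost");
      return Outcome.NODE_MISSING;
    }
    if (children.isEmpty()) {
      log.info("No recoveries to process under " + recoveryPath);
      return Outcome.NO_RECOVERIES;
    }
    log.info("Found " + children.size() + " recoveries under " + recoveryPath);
    return Outcome.RECOVERIES_FOUND;
  }

  public static void main(String[] args) {
    // In-memory map used as a fake node tree for the demonstration.
    Map<String, List<String>> nodes = new HashMap<>();
    nodes.put("/accumulo/example/recovery", Arrays.asList("entry-1"));
    NodeReader reader = nodes::get;
    System.out.println(checkRecovery(reader, "/accumulo/example/recovery"));
    System.out.println(checkRecovery(reader, "/accumulo/other/recovery"));
  }
}
```

Whether the missing-node branch should only warn or should also stop the master is exactly the part I would want someone else to confirm.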

I'm not entirely sure of the implications downstream if the recovery dir 
doesn't exist (what does the master do if it gets past the check with 
your patch and no recovery/ directory exists?). I personally don't 
know without diving further into the Master. I would want to get 
verification from someone else before making a change that could have 
such a large impact.

(poke, poke Eric/Keith/Adam/Billie/John)

- Josh

On 6/12/2012 10:46 PM, David Medinets wrote:
> This code does not avoid the recovery entries, it just checks that the
> entries exist before looping over them.
>
> On Tue, Jun 12, 2012 at 10:42 PM, William Slacum <wslacum@gmail.com> wrote:
>> Does this just address your symptom? I'd be concerned that there was a
>> recovery issue that put the Accumulo instance in this state and with
>> the change in effect nobody would know about it.
>>
>> On Tue, Jun 12, 2012 at 10:25 PM, David Medinets
>> <david.medinets@gmail.com> wrote:
>>> I am grepping the source left and right but am not sure what to make of
>>> this error. Here is the code from Master.java:
>>>
>>>     ZooReaderWriter.getInstance().getChildren(zroot + Constants.ZRECOVERY, new Watcher() {
>>>       @Override
>>>       public void process(WatchedEvent event) {
>>>         nextEvent.event("Noticed recovery changes", event.getType());
>>>       }
>>>     });
>>>
>>> I suggest replacing the above code with this:
>>>
>>>     final String recoveryPath = zroot + Constants.ZRECOVERY;
>>>     Stat stat = ZooReaderWriter.getInstance().getZooKeeper().exists(recoveryPath, null);
>>>     if (stat != null && stat.getNumChildren() > 0) {
>>>       ZooReaderWriter.getInstance().getChildren(recoveryPath, new Watcher() {
>>>         @Override
>>>         public void process(WatchedEvent event) {
>>>           nextEvent.event("Noticed recovery changes", event.getType());
>>>         }
>>>       });
>>>     }
>>>
>>> I have changed my local Accumulo and this change seems to be Ok.
>>> However, since this is a change to Accumulo itself, I would like
>>> someone to code review before I commit this change. Does this change
>>> make sense?
>>>
>>> On Mon, Jun 11, 2012 at 9:54 PM, David Medinets
>>> <david.medinets@gmail.com> wrote:
>>>> I am slowly working my way through whatever went wrong on my system.
>>>> This is the latest. I've deleted the logs and started the master by
>>>> hand:
>>>>
>>>> accumulo org.apache.accumulo.server.master.state.SetGoalState NORMAL
>>>> start-server.sh localhost master
>>>>
>>>> Then checked the log files where I saw this message:
>>>>
>>>> org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /accumulo/b519799c-3a51-4c9b-af21-96d577e2c11f/recovery
>>>>         at org.apache.zookeeper.KeeperException.create(KeeperException.java:111)
>>>>         at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
>>>>         at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1448)
>>>>         at org.apache.accumulo.core.zookeeper.ZooReader.getChildren(ZooReader.java:62)
>>>>         at org.apache.accumulo.server.master.Master.run(Master.java:2071)
>>>>         at org.apache.accumulo.server.master.Master.main(Master.java:2173)
>>>>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>>>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>>         at java.lang.reflect.Method.invoke(Method.java:601)
>>>>
>>>> I've run out of time for debugging today. I'll dig into the source
>>>> code more tomorrow, unless someone can point me in the right
>>>> direction to resolve this.
