hadoop-zookeeper-dev mailing list archives

From "Mahadev konar (JIRA)" <j...@apache.org>
Subject [jira] Updated: (ZOOKEEPER-251) NullPointerException stopping and starting Zookeeper servers
Date Mon, 08 Dec 2008 22:45:44 GMT

     [ https://issues.apache.org/jira/browse/ZOOKEEPER-251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mahadev konar updated ZOOKEEPER-251:
------------------------------------

    Priority: Blocker  (was: Major)

I found the problem. It occurs when the snapshots are ahead of the logs. That is the case
when a server is brought up and down repeatedly with no transactions in between, so there
are no new log entries to apply on top of the latest snapshot. The bug is that the loop
{noformat}
while (hdr.getZxid() < zxid) {
    next();
}
{noformat}
does not check the return value of next(), so it keeps calling next() after the log has been exhausted.
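The skip loop can be modeled in a few lines. This is a hypothetical sketch, not ZooKeeper code: SkipLoopSketch, skipTo() and hdrZxid are invented names, and plain Long zxids stand in for transaction headers. It shows the loop with the return value of next() checked, so a snapshot that is ahead of every logged transaction simply yields "nothing to replay". In this model an unchecked loop would fail by unboxing a null hdrZxid; in the reported stack trace the failure surfaces inside next() instead.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

/**
 * Hypothetical model of the skip loop in FileTxnIterator.init(): plain Long
 * zxids stand in for transaction headers. Not the real ZooKeeper code.
 */
public class SkipLoopSketch {
    private final Iterator<Long> log;
    Long hdrZxid; // current record's zxid; null once the log is exhausted

    SkipLoopSketch(List<Long> logZxids) {
        this.log = logZxids.iterator();
    }

    /** Advances one record; returns false and clears hdrZxid at end of log. */
    boolean next() {
        if (!log.hasNext()) {
            hdrZxid = null;
            return false;
        }
        hdrZxid = log.next();
        return true;
    }

    /** Positions on the first record with zxid >= target; false if there is none. */
    boolean skipTo(long target) {
        if (!next()) {
            return false;
        }
        while (hdrZxid < target) {
            // The missing check: without it, the loop condition would unbox a
            // null hdrZxid after the last record and throw NullPointerException.
            if (!next()) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Snapshot at zxid 5 but the log ends at zxid 3: the snapshot is ahead of the logs
        SkipLoopSketch ahead = new SkipLoopSketch(Arrays.asList(1L, 2L, 3L));
        System.out.println(ahead.skipTo(5L));

        // Normal case: replay starts at the first record at or past the snapshot
        SkipLoopSketch normal = new SkipLoopSketch(Arrays.asList(1L, 4L, 6L));
        System.out.println(normal.skipTo(5L) + " at zxid " + normal.hdrZxid);
    }
}
```

Run as-is, it prints false for the snapshot-ahead case and "true at zxid 6" for the normal case.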

In addition, next() itself does not keep ia and inputStream in sync. Its EOF handler

{noformat}
} catch (EOFException e) {
    LOG.info("EOF exception ", e);
    inputStream.close();
    inputStream = null;
    // this means that the file has ended;
    // we should go to the next file
{noformat}

should be 

{noformat}
} catch (EOFException e) {
    LOG.info("EOF exception ", e);
    inputStream.close();
    inputStream = null;
    ia = null; // reset ia together with inputStream
    // this means that the file has ended;
    // we should go to the next file
{noformat}
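The effect of resetting ia can be illustrated the same way. Again this is a hypothetical sketch rather than FileTxnIterator itself: SyncedReaderSketch and lastZxid are invented names, java.io.DataInputStream stands in for ZooKeeper's InputArchive, and each record is just an 8-byte zxid. Because ia is cleared together with inputStream, calling next() again after EOF returns false. If the ia = null line were removed, the second call would read from ia, hit EOF again, and throw a NullPointerException at inputStream.close(), which appears to correspond to the frame at FileTxnLog.java:447 in the report (the line the reporter's patch guards).

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;

/**
 * Hypothetical sketch, not the real FileTxnIterator: a raw stream
 * (inputStream) with a decoder layered on it (ia). DataInputStream
 * stands in for ZooKeeper's InputArchive.
 */
public class SyncedReaderSketch {
    private InputStream inputStream;
    private DataInputStream ia;
    long lastZxid = -1; // zxid of the most recently read record

    SyncedReaderSketch(byte[] log) {
        inputStream = new ByteArrayInputStream(log);
        ia = new DataInputStream(inputStream);
    }

    /** Reads one 8-byte zxid; returns false at end of input and on every later call. */
    boolean next() {
        if (ia == null) {
            // Both fields are cleared together, so after EOF we stop here
            // instead of touching a stream that is already gone.
            return false;
        }
        try {
            lastZxid = ia.readLong();
            return true;
        } catch (EOFException e) {
            try {
                // If ia were left non-null, a repeat call would reach this line
                // with inputStream == null and throw NullPointerException.
                inputStream.close();
            } catch (IOException closeFailure) {
                throw new UncheckedIOException(closeFailure);
            }
            inputStream = null;
            ia = null; // the fix: keep ia in sync with inputStream
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Two records, zxids 7 and 8, each stored as a big-endian long
        byte[] log = ByteBuffer.allocate(16).putLong(7L).putLong(8L).array();
        SyncedReaderSketch reader = new SyncedReaderSketch(log);
        while (reader.next()) {
            System.out.println("zxid " + reader.lastZxid);
        }
        // A further call after EOF is harmless rather than an NPE
        System.out.println(reader.next());
    }
}
```

Run as-is, it prints "zxid 7", "zxid 8" and then false. Clearing both fields, rather than only null-checking inputStream before close(), keeps ia non-null exactly while a stream is open.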



> NullPointerException stopping and starting Zookeeper servers
> ------------------------------------------------------------
>
>                 Key: ZOOKEEPER-251
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-251
>             Project: Zookeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.0.0, 3.0.1
>         Environment: Tested with JDK 1.5, Solaris, but I suspect it is not relevant in this case.
>            Reporter: Thomas Vinod Johnson
>            Assignee: Mahadev konar
>            Priority: Blocker
>             Fix For: 3.1.0
>
>
> See the following thread for the original report:
> http://mail-archives.apache.org/mod_mbox/hadoop-zookeeper-user/200812.mbox/browser
> Steps to reproduce:
> 1) Start a replicated ZooKeeper service consisting of 3 ZooKeeper (3.0.1) servers, all running on the same host (each, of course, using its own ports and log directories).
> 2) Create one znode in this ensemble (using the ZooKeeper client console, I issued 'create /node1 node1data').
> 3) Stop, then restart, a single ZooKeeper server; move on to the next one a few seconds later.
> 4) Go back to step 3. After 4-5 iterations, the following should occur, with the failing server exiting:
> java.lang.NullPointerException
>         at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:447)
>         at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.init(FileTxnLog.java:358)
>         at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.<init>(FileTxnLog.java:333)
>         at org.apache.zookeeper.server.persistence.FileTxnLog.read(FileTxnLog.java:250)
>         at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:102)
>         at org.apache.zookeeper.server.ZooKeeperServer.loadData(ZooKeeperServer.java:183)
>         at org.apache.zookeeper.server.quorum.Leader.lead(Leader.java:245)
>         at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:421)
> 2008-12-08 14:14:24,880 - INFO  [QuorumPeer:/0:0:0:0:0:0:0:0:2183:Leader@336] - Shutdown called
> java.lang.Exception: shutdown Leader! reason: Forcing shutdown
>         at org.apache.zookeeper.server.quorum.Leader.shutdown(Leader.java:336)
>         at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:427)
> Exception in thread "QuorumPeer:/0:0:0:0:0:0:0:0:2183" java.lang.NullPointerException
>         at org.apache.zookeeper.server.quorum.Leader.shutdown(Leader.java:339)
>         at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:427)
> The inputStream field is null, apparently because next() is called
> at line 358 even after next() has returned false. Having very little knowledge
> of the implementation, I don't know whether hdr.getZxid() >= zxid is
> supposed to be an invariant across all invocations of the server;
> however, the following change to FileTxnLog.java seems to make
> the problem go away.
> diff FileTxnLog.java /tmp/FileTxnLog.java
> 358c358,359
> <                 next();
> ---
>  >               if (!next())
>  >                   return;
> 447c448,450
> <                 inputStream.close();
> ---
>  >               if (inputStream != null) {
>  >                   inputStream.close();
>  >               }

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

