zookeeper-dev mailing list archives

From "Flavio Junqueira (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ZOOKEEPER-2247) Zookeeper service becomes unavailable when leader fails to write transaction log
Date Mon, 08 Aug 2016 09:29:20 GMT

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-2247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15411587#comment-15411587 ]

Flavio Junqueira commented on ZOOKEEPER-2247:
---------------------------------------------

The patch looks good; it is pretty much ready to go. Since you'll have to generate at least
the 3.4 patch, I'll ask you to fix a couple of small things. I hope you don't mind.

I have extended the javadoc of {{setState}}. Would you mind using this version instead?

{noformat}
+    /**
+     * Sets the state of the ZooKeeper server. After changing the state, it
+     * notifies the registered shutdown handler, if any, of the state change.
+     * <p>
+     * The following are the server state transitions:
+     * <ul>
+     * <li>During startup the server will be in the INITIAL state.</li>
+     * <li>After successfully starting, the server sets the state to
+     * RUNNING.</li>
+     * <li>The server transitions to the ERROR state if it hits an internal error.
+     * {@link ZooKeeperServerListenerImpl} handles critical resource error
+     * events, e.g., SyncRequestProcessor not being able to write a txn to disk.</li>
+     * <li>During shutdown the server sets the state to SHUTDOWN, which
+     * corresponds to the server not running.</li>
+     * </ul>
+     *
+     * @param state new server state.
+     */
{noformat}
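For context, the "change the state, then notify the handler" contract described in this javadoc can be sketched in isolation. This is a minimal illustration under stated assumptions, not the patch itself: {{SketchServer}}, {{StateChangeHandler}} and {{registerHandler}} are hypothetical names, and only the four states named in the javadoc are modelled.

```java
// Hypothetical sketch of the setState contract described in the javadoc.
// Names are illustrative; this is not ZooKeeper's ZooKeeperServer.
enum State { INITIAL, RUNNING, ERROR, SHUTDOWN }

// Stand-in for the registered shutdown handler (assumed single-method callback).
interface StateChangeHandler {
    void handle(State newState);
}

class SketchServer {
    // A server starts out in INITIAL.
    private volatile State state = State.INITIAL;
    // Optional: may stay null, in which case state changes are not reported.
    private volatile StateChangeHandler handler;

    void registerHandler(StateChangeHandler handler) {
        this.handler = handler;
    }

    State getState() {
        return state;
    }

    // Record the new state first, then notify the handler, if one is registered.
    void setState(State newState) {
        this.state = newState;
        StateChangeHandler h = handler;
        if (h != null) {
            h.handle(newState);
        }
    }
}
```

Updating the field before invoking the handler means that anything the handler wakes up already observes the new state through {{getState()}}.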

The same goes for the javadoc of {{ZooKeeperServerShutdownHandler}}:

{noformat}
+/**
+ * ZooKeeper server shutdown handler which will be used to handle ERROR or
+ * SHUTDOWN server state transitions, which in turn releases the associated shutdown
+ * latch.
+ */
{noformat}
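The latch release described in this javadoc might look roughly like the following sketch. The class and enum names are hypothetical, and the sketch assumes that some main thread blocks on {{CountDownLatch.await()}} and proceeds with shutdown once the latch is released.

```java
import java.util.concurrent.CountDownLatch;

// Hypothetical sketch; names are illustrative, not ZooKeeper's classes.
enum ServerState { INITIAL, RUNNING, ERROR, SHUTDOWN }

class ShutdownHandlerSketch {
    private final CountDownLatch shutdownLatch;

    ShutdownHandlerSketch(CountDownLatch shutdownLatch) {
        this.shutdownLatch = shutdownLatch;
    }

    // Invoked on every state change; only ERROR and SHUTDOWN release the latch.
    // countDown() on a latch that is already at zero is a no-op, so a later
    // SHUTDOWN after an ERROR is harmless.
    void handle(ServerState state) {
        if (state == ServerState.ERROR || state == ServerState.SHUTDOWN) {
            shutdownLatch.countDown();
        }
    }
}
```

INITIAL and RUNNING transitions leave the latch untouched, so a thread waiting on it only wakes up once the server has failed or is shutting down.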

Finally, in {{waitForNewLeaderElection}}, would you mind setting the {{Thread.sleep}} to sleep
for only 100ms each time? Not sure if we need to increase the counter, but I'd rather reduce
the duration of each iteration.
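A bounded polling loop of the kind {{waitForNewLeaderElection}} relies on could be sketched as below. {{PollingWait.waitFor}} and its parameters are hypothetical, not the test's actual helper. The point is that the worst-case wait is roughly {{maxIterations * sleepMillis}}, so a 100 ms sleep shortens how long each check lags behind the election, and the counter only needs to grow if the overall timeout must stay the same.

```java
import java.util.function.BooleanSupplier;

// Hypothetical polling helper in the spirit of waitForNewLeaderElection:
// check a condition, sleep a short fixed interval, give up after maxIterations.
final class PollingWait {
    private PollingWait() {}

    /**
     * Returns true as soon as the condition holds, or false if it still does
     * not hold after maxIterations sleeps of sleepMillis each.
     */
    static boolean waitFor(BooleanSupplier condition, int maxIterations, long sleepMillis)
            throws InterruptedException {
        for (int i = 0; i < maxIterations; i++) {
            if (condition.getAsBoolean()) {
                return true;
            }
            Thread.sleep(sleepMillis); // e.g. 100 ms per iteration
        }
        // One last check after the final sleep.
        return condition.getAsBoolean();
    }
}
```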

Otherwise, it looks very good, thanks for bearing with all the comments and working hard to
get it in good shape. If you make these changes and generate the branch patches, I'll check
this one in.


> Zookeeper service becomes unavailable when leader fails to write transaction log
> --------------------------------------------------------------------------------
>
>                 Key: ZOOKEEPER-2247
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2247
>             Project: ZooKeeper
>          Issue Type: Bug
>    Affects Versions: 3.5.0
>            Reporter: Arshad Mohammad
>            Assignee: Rakesh R
>            Priority: Critical
>             Fix For: 3.4.9, 3.5.3, 3.6.0
>
>         Attachments: ZOOKEEPER-2247-01.patch, ZOOKEEPER-2247-02.patch, ZOOKEEPER-2247-03.patch,
>                      ZOOKEEPER-2247-04.patch, ZOOKEEPER-2247-05.patch, ZOOKEEPER-2247-06.patch, ZOOKEEPER-2247-07.patch,
>                      ZOOKEEPER-2247-09.patch, ZOOKEEPER-2247-10.patch, ZOOKEEPER-2247-11.patch, ZOOKEEPER-2247-12.patch,
>                      ZOOKEEPER-2247-13.patch, ZOOKEEPER-2247-14.patch, ZOOKEEPER-2247-15.patch, ZOOKEEPER-2247-16.patch,
>                      ZOOKEEPER-2247-17.patch, ZOOKEEPER-2247-18.patch, ZOOKEEPER-2247-19.patch, ZOOKEEPER-2247-20.patch,
>                      ZOOKEEPER-2247-21.patch, ZOOKEEPER-2247-b3.5.patch, ZOOKEEPER-2247-br-3.4.patch
>
>
> ZooKeeper service becomes unavailable when the leader fails to write the transaction log. Below are the exceptions:
> {code}
> 2015-08-14 15:41:18,556 [myid:100] - ERROR [SyncThread:100:ZooKeeperCriticalThread@48] - Severe unrecoverable error, from thread : SyncThread:100
> java.io.IOException: Input/output error
> 	at sun.nio.ch.FileDispatcherImpl.force0(Native Method)
> 	at sun.nio.ch.FileDispatcherImpl.force(FileDispatcherImpl.java:76)
> 	at sun.nio.ch.FileChannelImpl.force(FileChannelImpl.java:376)
> 	at org.apache.zookeeper.server.persistence.FileTxnLog.commit(FileTxnLog.java:331)
> 	at org.apache.zookeeper.server.persistence.FileTxnSnapLog.commit(FileTxnSnapLog.java:380)
> 	at org.apache.zookeeper.server.ZKDatabase.commit(ZKDatabase.java:563)
> 	at org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:178)
> 	at org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:113)
> 2015-08-14 15:41:18,559 [myid:100] - INFO  [SyncThread:100:ZooKeeperServer$ZooKeeperServerListenerImpl@500] - Thread SyncThread:100 exits, error code 1
> 2015-08-14 15:41:18,559 [myid:100] - INFO  [SyncThread:100:ZooKeeperServer@523] - shutting down
> 2015-08-14 15:41:18,560 [myid:100] - INFO  [SyncThread:100:SessionTrackerImpl@232] - Shutting down
> 2015-08-14 15:41:18,560 [myid:100] - INFO  [SyncThread:100:LeaderRequestProcessor@77] - Shutting down
> 2015-08-14 15:41:18,560 [myid:100] - INFO  [SyncThread:100:PrepRequestProcessor@1035] - Shutting down
> 2015-08-14 15:41:18,560 [myid:100] - INFO  [SyncThread:100:ProposalRequestProcessor@88] - Shutting down
> 2015-08-14 15:41:18,561 [myid:100] - INFO  [SyncThread:100:CommitProcessor@356] - Shutting down
> 2015-08-14 15:41:18,561 [myid:100] - INFO  [CommitProcessor:100:CommitProcessor@191] - CommitProcessor exited loop!
> 2015-08-14 15:41:18,562 [myid:100] - INFO  [SyncThread:100:Leader$ToBeAppliedRequestProcessor@915] - Shutting down
> 2015-08-14 15:41:18,562 [myid:100] - INFO  [SyncThread:100:FinalRequestProcessor@646] - shutdown of request processor complete
> 2015-08-14 15:41:18,562 [myid:100] - INFO  [SyncThread:100:SyncRequestProcessor@191] - Shutting down
> 2015-08-14 15:41:18,563 [myid:100] - INFO  [ProcessThread(sid:100 cport:-1)::PrepRequestProcessor@159] - PrepRequestProcessor exited loop!
> {code}
> After this exception, the leader server still remains the leader. After such a non-recoverable exception,
> the leader should go down and let one of the followers become the leader.
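The failure path visible in the log can be sketched as a worker thread that reports unrecoverable errors instead of exiting silently. In the log, an {{IOException}} from {{FileChannel.force}} in {{SyncThread}} is surfaced by {{ZooKeeperCriticalThread}} and handled by {{ZooKeeperServerListenerImpl}}. {{CriticalThreadSketch}}, {{CriticalErrorListener}} and {{onCriticalError}} are hypothetical names. The sketch assumes, per the {{setState}} javadoc, that the listener is what moves the server into the ERROR state so that the leader can step down.

```java
// Hypothetical sketch of a "critical" worker thread: instead of dying quietly
// on an unrecoverable error, it reports the failure to a listener.
// Names are illustrative, not ZooKeeper's ZooKeeperCriticalThread API.
interface CriticalErrorListener {
    void onCriticalError(String threadName, int errorCode);
}

class CriticalThreadSketch extends Thread {
    // Error code reported for unrecoverable failures (the log above shows 1).
    static final int UNRECOVERABLE_ERROR = 1;

    private final Runnable work;
    private final CriticalErrorListener listener;

    CriticalThreadSketch(String name, Runnable work, CriticalErrorListener listener) {
        super(name);
        this.work = work;
        this.listener = listener;
    }

    @Override
    public void run() {
        try {
            work.run();
        } catch (Throwable t) {
            // E.g. an fsync failure while committing the txn log. The listener is
            // expected to move the server to ERROR so a new leader can be elected.
            listener.onCriticalError(getName(), UNRECOVERABLE_ERROR);
        }
    }
}
```

A checked {{IOException}} from the commit path has to be wrapped, for example in {{java.io.UncheckedIOException}}, to escape a {{Runnable}}. Because the sketch catches {{Throwable}}, it reports the failure either way.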



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
