zookeeper-dev mailing list archives

From "Hadoop QA (JIRA)" <j...@apache.org>
Subject [jira] (ZOOKEEPER-2678) Large databases take a long time to regain a quorum
Date Tue, 31 Jan 2017 21:10:54 GMT

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-2678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15847522#comment-15847522 ]

Hadoop QA commented on ZOOKEEPER-2678:
--------------------------------------

+1 overall.  GitHub Pull Request  Build
      

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 3 new or modified tests.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs (version 3.0.1) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    +1 core tests.  The patch passed core unit tests.

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/273//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/273//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/273//console

This message is automatically generated.

> Large databases take a long time to regain a quorum
> ---------------------------------------------------
>
>                 Key: ZOOKEEPER-2678
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2678
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.4.9, 3.5.2
>            Reporter: Robert Joseph Evans
>            Assignee: Robert Joseph Evans
>
> I know this is long, but please hear me out.
> I recently inherited a massive ZooKeeper ensemble.  The snapshot is 3.4 GB on disk. Because of its massive size we have been running into a number of issues. There are lots of problems that we hope to fix by tuning GC etc., but the big one right now, which is blocking us from making much progress on the rest of them, is that when we lose a quorum because the leader left, for whatever reason, it can take well over 5 minutes for a new quorum to be established. So we cannot tune the leader without risking downtime.
> We traced down where the time was being spent and found that each server was clearing the database, so it would be read back in again, before leader election even started.  Then, as part of the sync phase, each server writes out a snapshot to checkpoint the progress it made during the sync.
> I will be putting up a patch shortly with some proposed changes in it.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
