hbase-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "rajeshbabu (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-7006) [MTTR] Improve Region Server Recovery Time - Distributed Log Replay
Date Thu, 16 May 2013 18:45:19 GMT

    [ https://issues.apache.org/jira/browse/HBASE-7006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13659817#comment-13659817
] 

rajeshbabu commented on HBASE-7006:
-----------------------------------

if any failures during .META. assignment other than master shutdown/stop we are infinitely
waiting until .META. is assigned.
{code}
    enableServerShutdownHandler();
    this.catalogTracker.waitForMeta();
    // Above check waits for general meta availability but this does not
    // guarantee that the transition has completed
    this.assignmentManager.waitForAssignment(HRegionInfo.FIRST_META_REGIONINFO);
{code}
Only chance we will come out from initialization before assigning META is master shutdown/stop.
then any way we will not block in loop as well.
{code}
  private void loop() {
    long lastMsgTs = 0l;
    long now = 0l;
    while (!this.stopped) {
      now = System.currentTimeMillis();
      if ((now - lastMsgTs) >= this.msgInterval) {
        doMetrics();
        lastMsgTs = System.currentTimeMillis();
      }
      stopSleeper.sleep();
    }
  }
{code}
                
> [MTTR] Improve Region Server Recovery Time - Distributed Log Replay
> -------------------------------------------------------------------
>
>                 Key: HBASE-7006
>                 URL: https://issues.apache.org/jira/browse/HBASE-7006
>             Project: HBase
>          Issue Type: New Feature
>          Components: MTTR
>            Reporter: stack
>            Assignee: Jeffrey Zhong
>            Priority: Critical
>             Fix For: 0.98.0, 0.95.1
>
>         Attachments: hbase-7006-addendum.patch, hbase-7006-combined.patch, hbase-7006-combined-v1.patch,
hbase-7006-combined-v4.patch, hbase-7006-combined-v5.patch, hbase-7006-combined-v6.patch,
hbase-7006-combined-v7.patch, hbase-7006-combined-v8.patch, hbase-7006-combined-v9.patch,
LogSplitting Comparison.pdf, ProposaltoimprovelogsplittingprocessregardingtoHBASE-7006-v2.pdf
>
>
> Just saw interesting issue where a cluster went down  hard and 30 nodes had 1700 WALs
to replay.  Replay took almost an hour.  It looks like it could run faster that much of the
time is spent zk'ing and nn'ing.
> Putting in 0.96 so it gets a look at least.  Can always punt.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Mime
View raw message