hbase-dev mailing list archives

From "stack (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-1583) Start/Stop of large cluster untenable
Date Thu, 16 Jul 2009 23:18:14 GMT

    [ https://issues.apache.org/jira/browse/HBASE-1583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12732243#action_12732243 ]

stack commented on HBASE-1583:

I took a look at this safe mode stuff.  It's broken.  Will open an issue.  What's happening
is that we exit safe mode almost immediately after startup because the initial MetaScanner scan
does nothing except set the flag saying the initial scan has completed (the original idea was
that initialScan would do the first scan of the newly deployed .META.).
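
To make the failure mode concrete, here is a minimal sketch of the logic as described above. It is a toy model, not HBase code: the class and method names (SafeModeSketch, MetaScannerModel, initialScanBroken, initialScanIntended, inSafeMode) are all hypothetical, and it assumes safe mode is gated only on an "initial scan complete" flag.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Toy model of the behavior described in this comment.
// Every name here is hypothetical; none is taken from the HBase source.
public class SafeModeSketch {

    /** Models a scanner that tracks whether the first .META. scan is done. */
    static class MetaScannerModel {
        private boolean initialScanComplete = false;
        private int regionsSeen = 0;

        // Behavior as described: the flag is set without scanning anything.
        void initialScanBroken() {
            initialScanComplete = true;
        }

        // Original idea: only mark completion after scanning a deployed .META.
        void initialScanIntended(List<String> metaRows, boolean metaDeployed) {
            if (!metaDeployed) {
                return; // nothing to scan yet, so stay "not complete"
            }
            regionsSeen = metaRows.size();
            initialScanComplete = true;
        }

        boolean isInitialScanComplete() {
            return initialScanComplete;
        }

        int getRegionsSeen() {
            return regionsSeen;
        }
    }

    // Assumption: safe mode lasts exactly until the initial scan is complete.
    static boolean inSafeMode(MetaScannerModel scanner) {
        return !scanner.isInitialScanComplete();
    }

    public static void main(String[] args) {
        MetaScannerModel broken = new MetaScannerModel();
        broken.initialScanBroken();
        System.out.println("broken, in safe mode: " + inSafeMode(broken));

        MetaScannerModel intended = new MetaScannerModel();
        intended.initialScanIntended(Collections.<String>emptyList(), false);
        System.out.println("intended, before deploy: " + inSafeMode(intended));

        intended.initialScanIntended(Arrays.asList("region-a", "region-b"), true);
        System.out.println("intended, after deploy: " + inSafeMode(intended));
    }
}
```

In the broken variant the master would leave safe mode with zero regions seen; in the intended variant it stays in safe mode until a scan of the deployed .META. has actually run.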

Fixing MetaScanner so the initial scan doesn't count as done until we've scanned the actual
deploy, which keeps safe mode in place while the deploy is going on, kills our assignment rate.
It crawls.  I gave up debugging further, since the patches above that undo compactions on close
and open seem to be enough to close this issue, at least for the 0.20.0 release.

> Start/Stop of large cluster untenable
> -------------------------------------
>                 Key: HBASE-1583
>                 URL: https://issues.apache.org/jira/browse/HBASE-1583
>             Project: Hadoop HBase
>          Issue Type: Bug
>            Reporter: stack
>            Assignee: stack
>             Fix For: 0.20.0
>         Attachments: 1583-nocompactonclose.patch, 1583-v2-nocompactonopenclose.patch
> Starting and stopping a loaded large cluster is way too flaky and takes too long.  This
> is 0.19.x but the same issues apply to TRUNK, I'd say.
> At pset with our > 100 nodes carrying 6k regions:
> + shutdown takes way too long.... maybe ten minutes or so.  We compact regions inline
> with shutdown.  We should just go down.  It doesn't seem like all regionservers go down every time
> + startup is a mess with our assigning out regions and rebalancing at the same time.  By the time
> that the compactions on open run, it can be near an hour before the whole thing settles down and
> becomes usable

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
