From: "Billy Pearson (JIRA)"
To: hbase-dev@hadoop.apache.org
Date: Mon, 29 Jun 2009 08:34:47 -0700 (PDT)
Subject: [jira] Commented: (HBASE-1583) Start/Stop of large cluster untenable

[ https://issues.apache.org/jira/browse/HBASE-1583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12725218#action_12725218 ]

Billy Pearson 
commented on HBASE-1583:
--------------------------------------

When we added safe mode, I suggested that we not come out of safe mode until all regions have been assigned, and that regions not run compactions while in safe mode. I think that would be an easy fix for this problem. I have seen the same thing when regions are behind on compactions after a shutdown: on startup, the compactions tie up reassignments.

Billy

> Start/Stop of large cluster untenable
> -------------------------------------
>
>                 Key: HBASE-1583
>                 URL: https://issues.apache.org/jira/browse/HBASE-1583
>             Project: Hadoop HBase
>          Issue Type: Bug
>            Reporter: stack
>             Fix For: 0.20.0
>
>
> Starting and stopping a loaded large cluster is way too flakey and takes too long. This is 0.19.x but same issues apply to TRUNK I'd say.
> At pset with our > 100 nodes carrying 6k regions:
> + shutdown takes way too long.... maybe ten minutes or so. We compact regions inline with shutdown. We should just go down. It doesn't seem like all regionservers go down every time either.
> + startup is a mess with our assigning out regions and rebalancing at same time. By the time that the compactions on open run, it can be near an hour before whole thing settles down and becomes useable
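For illustration, the safe-mode suggestion in Billy's comment could look roughly like the Java sketch below. It assumes a master that knows the full set of regions up front and is told each time a region opens. SafeModeSketch, SafeModeGate, regionAssigned, mayCompact and mayRebalance are hypothetical names invented for this sketch, not HBase APIs; the actual master and region server code is organized differently.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch (not HBase code): the master stays in safe mode until
// every region it knows about has been assigned, and compactions and
// rebalancing are deferred while safe mode is on.
public class SafeModeSketch {

    static final class SafeModeGate {
        // Regions that have not yet been reported as opened on a server.
        private final Set<String> unassigned = new HashSet<>();
        private boolean safeMode;

        SafeModeGate(Set<String> allRegions) {
            unassigned.addAll(allRegions);
            // With no regions there is nothing to wait for.
            safeMode = !unassigned.isEmpty();
        }

        // Called when a region server reports that it opened a region.
        synchronized void regionAssigned(String region) {
            unassigned.remove(region);
            if (unassigned.isEmpty()) {
                safeMode = false; // every region is out: leave safe mode
            }
        }

        synchronized boolean inSafeMode() {
            return safeMode;
        }

        // Compactions wait until initial assignment has finished.
        synchronized boolean mayCompact() {
            return !safeMode;
        }

        // Rebalancing also waits, so it does not race initial assignment.
        synchronized boolean mayRebalance() {
            return !safeMode;
        }
    }

    public static void main(String[] args) {
        SafeModeGate gate = new SafeModeGate(Set.of("r1", "r2", "r3"));
        gate.regionAssigned("r1");
        gate.regionAssigned("r2");
        System.out.println("safe mode after 2 of 3: " + gate.inSafeMode()); // true
        gate.regionAssigned("r3");
        System.out.println("safe mode after 3 of 3: " + gate.inSafeMode()); // false
        System.out.println("may compact: " + gate.mayCompact()); // true
    }
}
```

The idea is one gate consulted by both compaction and balancing, so neither competes with initial assignment on startup. A real version would probably also need a timeout or override in case some region never opens, otherwise the cluster could stay in safe mode indefinitely.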