Date: Wed, 1 Dec 2010 10:39:05 -0800
Subject: Re: disable table, alter table, and merge regions
From: Abhijit Pol
To: user@hbase.apache.org, ssechrist@gmail.com

Thanks, Sean, for the detailed reply.
This is very useful. I think we can afford to shut down the cluster for a few
hours at night and probably do this in batches.

A few follow-up questions to draw on your experience:

1. By "shut down HBase", do you mean bringing down the RegionServers (RS),
   Master, and ZooKeeper? I was getting some errors when test merging with
   all three down.
2. Just to make sure, is this the merge utility you used?
   $HBASE_HOME/bin/hbase org.apache.hadoop.hbase.util.Merge

On Wed, Dec 1, 2010 at 6:09 AM, Sean Sechrist wrote:
> Hey Abhijit,
>
> We ran into this same issue a while back, and here is what we did (and it
> seemed to work OK):
>
> 1. Went to the HBase web UI for our biggest table and grabbed all of the
>    region names (they appear in order on that page). Saved the region
>    names to a text file.
> 2. Wrote a shell script to run the HBase merge tool on every pair of
>    regions in that file.
> 3. Shut down HBase.
> 4. Ran that shell script. It went at about 50 merges/hour on our 5-node
>    cluster.
> 5. Started HBase. When it came back up, our region count was about 1500
>    regions, down from almost 3000.
>
> So this would only work if you can take down HBase for a decent amount of
> time.
>
> I wonder if you could alternatively run an Export job and an Import job of
> your table. Do those preserve the regions, or could you use them to bring
> down the number of regions?
>
> -Sean
>
> On Wed, Dec 1, 2010 at 2:24 AM, Abhijit Pol wrote:
>
> > We have an HBase cluster which was peacefully (acceptable throughput and
> > latencies) serving for about a month (we are using version
> > 0.89.20100926).
> >
> > This morning we wanted to set the TTL to a value smaller than the
> > default, and Mr. Murphy struck.
> >
> > (A) We disabled and altered the table with the desired TTL value (using
> > the shell). Altering one property (in this case TTL) reset all other
> > table properties to their default values. For example, version was reset
> > to 3, compression was reset to NONE, etc.
> > [I think this is a known issue with an open Jira.]
> >
> > (B) We wanted to go back to the previous table properties. Even after
> > multiple retries we were not able to disable the table (even restarting
> > the cluster didn't help). Most likely, if clients are hitting an HBase
> > table hard (in this case ~10k qps), it takes forever to disable the
> > table.
> >
> > So we stopped all clients and were then able to disable the table and
> > alter the table properties to the desired values.
> >
> > (C) Because compression was reset to NONE and version was reset to 3 for
> > a good 10-12 hours, the total number of regions tripled and the load
> > (#regions/RS) increased from 100 to 300. After the first
> > major_compaction, compression, version 1, and the new lower TTL became
> > effective; we were back to the original HDFS footprint and have a bunch
> > of small regions.
> >
> > What will trigger merging of these regions? The tool for merging does
> > not seem to work, or even if it does, it can only merge two regions at a
> > time.
> >
> > Any suggestions on how we can reduce the number of regions and bring the
> > load back to where it was before?
> >
> > Thanks,
> > --Abhi
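
For reference, step 2 of Sean's procedure (a shell script that runs the merge tool on every pair of regions in the saved file) might look roughly like the sketch below. It is not Sean's actual script. The function name emit_merge_commands, the one-region-name-per-line file layout, and the argument order <table-name> <region-1> <region-2> for org.apache.hadoop.hbase.util.Merge are assumptions to check against the usage message of your HBase version. The sketch only prints the commands so they can be reviewed before anything is run, and it assumes region names contain no single quotes.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): pair up consecutive region names
# listed one per line in REGIONS_FILE and print one Merge command per pair.
# ASSUMPTION: the Merge tool takes <table-name> <region-1> <region-2>.
# ASSUMPTION: region names contain no single quotes.
emit_merge_commands() {
    table=$1
    regions_file=$2
    first=""
    # The "|| [ -n ... ]" part also handles a last line without a newline.
    while IFS= read -r region || [ -n "$region" ]; do
        # Skip blank lines in the saved list.
        [ -z "$region" ] && continue
        if [ -z "$first" ]; then
            # Remember the first region of the pair.
            first=$region
        else
            # Print the command; $HBASE_HOME is left literal on purpose so it
            # is expanded only when the generated commands are executed.
            printf "\$HBASE_HOME/bin/hbase org.apache.hadoop.hbase.util.Merge %s '%s' '%s'\n" \
                "$table" "$first" "$region"
            first=""
        fi
    done < "$regions_file"
    # With an odd number of regions, the last one has no partner.
    if [ -n "$first" ]; then
        echo "unpaired region left as-is: $first" >&2
    fi
}
```

Possible usage, with HBase shut down as in step 3: run `emit_merge_commands mytable regions.txt > merges.sh`, review merges.sh, then execute it with `sh merges.sh` (mytable and regions.txt are placeholder names). Because every region appears in at most one pair, a single pass roughly halves the region count, which matches the drop from almost 3000 to about 1500. Taking the quoted rate of about 50 merges/hour at face value, those roughly 1500 merges come to about 30 hours of downtime in total, so they would have to be spread over several nightly batches.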