Date: Fri, 14 Nov 2014 09:12:31 -0800
Subject: Re: Forcibly merging regions
From: Ted Yu
To: "user@hbase.apache.org"

One possibility was that region 7373f75181c71eb5061a6673cee15931 was
involved in some HBase snapshot.
Was the underlying table being snapshotted in the recent past?

Cheers

On Fri, Nov 14, 2014 at 9:05 AM, Shahab Yunus wrote:

> Thanks again.
>
> But I have been polling for a while and it still doesn't merge. I mean the
> particular region example that I sent you: I have been trying to merge it
> since yesterday. I ran the polling-based code all night and had to kill it.
> Then in the morning I tried merging manually through the hbase shell, and it
> still doesn't merge. Note that the current polling logic does not try to
> call merge again; it just checks the number of regions.
>
> So how do we clean it up, or actually make it merge? Also, is this
> expected (a region keeping a reference)? How can we avoid it?
>
> Note that this is not limited to this table. We are seeing this with
> regions of other tables as well. Are we merging too fast?
>
> Regards,
> Shahab
>
> On Fri, Nov 14, 2014 at 11:58 AM, Ted Yu wrote:
>
> > Polling as you described is fine.
> >
> > catalogJanitor.cleanMergeQualifier() is called by
> > DispatchMergingRegionHandler.
> >
> > If clean was successful, you would see the following:
> >
> >       LOG.debug("Deleting region " + regionA.getRegionNameAsString() + " and "
> >           + regionB.getRegionNameAsString()
> >           + " from fs because merged region no longer holds references");
> >
> > Assuming there was no log like the one below in your master log:
> >
> >       LOG.error("Merged region " + region.getRegionNameAsString()
> >           + " has only one merge qualifier in META.");
> >
> > it would be the case that 7373f75181c71eb5061a6673cee15931 still had a
> > reference file.
> >
> > Cheers
> >
> > On Fri, Nov 14, 2014 at 8:35 AM, Shahab Yunus wrote:
> >
> > > Hi Ted.
> > >
> > > The log excerpt is at the end of this email. It is from the merge command
> > > I issued just now through the hbase shell. forcible was false, but it
> > > behaves the same way if forcible is true. This is from the master log.
> > > Indeed the region merging was skipped! What does this mean? Data seems to
> > > be intact for this table.
> > >
> > > Just to give you some background: this table was first merged by the
> > > automated Java application. What we are doing is merging the regions of
> > > our tables programmatically. As the HBaseAdmin.mergeRegions call is async,
> > > we poll for the number of regions to drop after the merge call. The
> > > application hangs and keeps polling forever because the previous merge
> > > didn't happen.
> > >
> > > In this poll loop, we get the number of regions with a fresh call to
> > > HBaseAdmin.getTableRegions(tableName).size().
> > >
> > > What are these merge qualifiers, and what are we doing wrong or what
> > > should we do instead?
> > >
> > > Could we somehow retry the merge in the polling loop? But how can we know
> > > that we need to call merge again, given that it works for some regions?
> > > Is the table meta corrupted for some reason by the above logic?
> > >
> > > Thanks a lot.
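The pre-merge check described in this thread can be condensed into a small decision model. The Java below is a simplified, hypothetical sketch based only on Ted's description and the quoted log messages; it is not HBase source code, and `PreMergeCheckModel`, `RegionState` and its fields are made-up names. In HBase 0.98 the merge qualifiers should correspond to the `info:mergeA` and `info:mergeB` columns in the merged region's `hbase:meta` row. The model shows why a region like 7373f75181c71eb5061a6673cee15931, which still carries both qualifiers and still holds reference files, causes any merge involving it to be skipped, which is consistent with setting forcible to true not helping.

```java
/**
 * Simplified, hypothetical model (not HBase source) of the pre-merge check
 * described in this thread: the merge is skipped unless cleanMergeQualifier()
 * succeeds for both regions.
 */
final class PreMergeCheckModel {

    private PreMergeCheckModel() {}

    /** What hbase:meta and the filesystem say about one region (hypothetical type). */
    static final class RegionState {
        final boolean hasMergeA;           // first merge qualifier present in the region's meta row
        final boolean hasMergeB;           // second merge qualifier present in the region's meta row
        final boolean holdsReferenceFiles; // region still has reference files pointing at its parents

        RegionState(boolean hasMergeA, boolean hasMergeB, boolean holdsReferenceFiles) {
            this.hasMergeA = hasMergeA;
            this.hasMergeB = hasMergeB;
            this.holdsReferenceFiles = holdsReferenceFiles;
        }
    }

    /** Returns true when the region ends up with no merge qualifiers ("clean"). */
    static boolean cleanMergeQualifier(RegionState r) {
        if (!r.hasMergeA && !r.hasMergeB) {
            return true;  // not the result of an earlier merge, or already cleaned up
        }
        if (r.hasMergeA != r.hasMergeB) {
            return false; // "has only one merge qualifier in META." is logged as an error
        }
        // Both qualifiers present: the parents are deleted and the qualifiers removed
        // only once the merged region "no longer holds references".
        return !r.holdsReferenceFiles;
    }

    /** Returns true if the merge would proceed, false if "Skip merging regions ..." is logged. */
    static boolean mergeWouldProceed(RegionState a, RegionState b) {
        // Region A is checked first; region B is only checked if A comes back clean.
        return cleanMergeQualifier(a) && cleanMergeQualifier(b);
    }
}
```

For example, a region A in state `(true, true, true)` paired with a region B in state `(false, false, false)` yields `false`, matching the "Skip merging regions ... has merge qualifier" line in the master log. The model leaves out the side effects of a successful cleanup (per the quoted debug message, the parent regions are deleted from the filesystem). Reference files are normally rewritten into regular store files when the merged region is compacted, so whether that region has been compacted is a reasonable thing to check.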
> > >
> > > ------------------------------------------------------------------------
> > >
> > > 2014-11-14 11:25:02,643 INFO org.apache.zookeeper.ZooKeeper: Session: 0x348c7017707236b closed
> > > 2014-11-14 11:25:02,643 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down
> > > 2014-11-14 11:25:02,645 INFO org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=ip-1010019.ec2.internal:2181,ip-1010017.ec2.internal:2181,ip-1010018.ec2.internal:2181 sessionTimeout=60000 watcher=catalogtracker-on-hconnection-0x47d865f2, quorum=ip-1010019.ec2.internal:2181,ip-1010017.ec2.internal:2181,ip-1010018.ec2.internal:2181, baseZNode=/hbase
> > > 2014-11-14 11:25:02,645 INFO org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Process identifier=catalogtracker-on-hconnection-0x47d865f2 connecting to ZooKeeper ensemble=ip-1010019.ec2.internal:2181,ip-1010017.ec2.internal:2181,ip-1010018.ec2.internal:2181
> > > 2014-11-14 11:25:02,645 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server ip-1010018.ec2.internal/1010019:2181.
> > > Will not attempt to authenticate using SASL (unknown error)
> > > 2014-11-14 11:25:02,646 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to ip-1010018.ec2.internal/1010019:2181, initiating session
> > > 2014-11-14 11:25:02,648 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete on server ip-1010018.ec2.internal/1010019:2181, sessionid = 0x348c7017707236c, negotiated timeout = 60000
> > > 2014-11-14 11:25:02,703 INFO org.apache.zookeeper.ZooKeeper: Session: 0x348c7017707236c closed
> > > 2014-11-14 11:25:02,703 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down
> > > 2014-11-14 11:25:30,713 INFO org.apache.hadoop.hbase.master.handler.DispatchMergingRegionHandler: Skip merging regions TABLE_NAME,,1415915112497.7373f75181c71eb5061a6673cee15931., TABLE_NAME,\x02\xFA\xF0\x80\x00\x00\x01I\xAA\xD5\x87\xA8\x19\x99\x99\x99\x99\x99\x99\x90,1415910559217.43f4d3685d113d3ae18eea9f189de096., because region 7373f75181c71eb5061a6673cee15931 has merge qualifier
> > > 2014-11-14 11:25:41,383 INFO org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=ip-1010019.ec2.internal:2181,ip-1010017.ec2.internal:2181,ip-1010018.ec2.internal:2181 sessionTimeout=60000 watcher=catalogtracker-on-hconnection-0x47d865f2, quorum=ip-1010019.ec2.internal:2181,ip-1010017.ec2.internal:2181,ip-1010018.ec2.internal:2181, baseZNode=/hbase
> > > 2014-11-14 11:25:41,384 INFO org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Process identifier=catalogtracker-on-hconnection-0x47d865f2 connecting to ZooKeeper ensemble=ip-1010019.ec2.internal:2181,ip-1010017.ec2.internal:2181,ip-1010018.ec2.internal:2181
> > > 2014-11-14 11:25:41,384 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server ip-1010018.ec2.internal/1010019:2181.
> > > Will not attempt to authenticate using SASL (unknown error)
> > > 2014-11-14 11:25:41,386 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to ip-1010018.ec2.internal/1010019:2181, initiating session
> > > 2014-11-14 11:25:41,389 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete on server ip-1010018.ec2.internal/1010019:2181, sessionid = 0x348c7017707236e, negotiated timeout = 60000
> > > 2014-11-14 11:25:41,397 INFO org.apache.zookeeper.ZooKeeper: Session: 0x348c7017707236e closed
> > > 2014-11-14 11:25:41,398 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down
> > >
> > > ------------------------------------------------------------------------------------------------------------------------------------
> > >
> > > Regards,
> > > Shahab
> > >
> > > On Fri, Nov 14, 2014 at 10:56 AM, Ted Yu wrote:
> > >
> > > > Looking at DispatchMergingRegionHandler, it does some checks before
> > > > initiating the merge, e.g.:
> > > >
> > > >       LOG.info("Skip merging regions " + region_a.getRegionNameAsString()
> > > >           + ", " + region_b.getRegionNameAsString() + ", because region "
> > > >           + (regionAHasMergeQualifier ? region_a.getEncodedName() : region_b
> > > >               .getEncodedName()) + " has merge qualifier");
> > > >
> > > > Can you take a look at the master log around the time the merge request
> > > > was issued to see if you can find some clue?
> > > >
> > > > Cheers
> > > >
> > > > On Fri, Nov 14, 2014 at 6:41 AM, Shahab Yunus <shahab.yunus@gmail.com>
> > > > wrote:
> > > >
> > > > > The documentation of the online merge tool (merge_region) states that
> > > > > if we forcibly merge regions (by setting the third argument to true),
> > > > > it can create overlapping regions. If this happens, will this render
> > > > > the region or table unusable, or is it just a performance hit?
> > > > > I mean, how big of a deal is it?
> > > > >
> > > > > Actually, we are merging regions using the programmatic API for this
> > > > > and setting this flag ('forcible') to false. But for some tables (we
> > > > > haven't figured out a pattern yet; data is still accessible), merges of
> > > > > regions do not happen at all. Afterwards we tried with this flag = true,
> > > > > and it still doesn't merge them.
> > > > >
> > > > > CDH 5.1.0
> > > > > (HBase is 0.98.1-cdh5.1.0)
> > > > >
> > > > > Regards,
> > > > > Shahab
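The polling loop described in this thread waits for the region count to drop after the asynchronous merge call, but the master log shows that a merge can be skipped, with the skip recorded only in the master log, so a loop without a deadline can wait forever. Below is a minimal sketch of a bounded wait, assuming the caller supplies the region count; `RegionCountSource` and `RegionMergePoller` are hypothetical names, not HBase APIs. In an application, `currentCount()` could wrap the `HBaseAdmin.getTableRegions(tableName)` call the thread already uses and return the size of the returned list.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.concurrent.TimeUnit;

/** Hypothetical callback that reports the table's current number of regions. */
interface RegionCountSource {
    int currentCount() throws IOException;
}

/** Hypothetical helper: waits for an asynchronous merge, but never forever. */
final class RegionMergePoller {

    private RegionMergePoller() {}

    /**
     * Polls until the region count drops to targetCount or below.
     * Returns true if that happened, false if timeoutMillis elapsed first
     * or the waiting thread was interrupted.
     */
    static boolean waitForRegionCount(RegionCountSource source, int targetCount,
                                      long timeoutMillis, long intervalMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (true) {
            int count;
            try {
                count = source.currentCount(); // fresh lookup on every iteration
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            if (count <= targetCount) {
                return true; // the merge has taken effect
            }
            if (System.nanoTime() - deadline >= 0) {
                return false; // give up: the merge may have been skipped by the master
            }
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // keep the interrupt visible to callers
                return false;
            }
        }
    }
}
```

When `waitForRegionCount` returns `false`, the application can look for a "Skip merging regions" line in the master log instead of hanging; whether to retry the merge after that is left to the caller.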