Subject: Re: Forcibly merging regions
From: Shahab Yunus
To: "user@hbase.apache.org"
Date: Fri, 14 Nov 2014 14:08:02 -0500

Digging deeper into the code, I came across this (from
CatalogJanitor#cleanMergeRegion):

    ...
    ...
    HFileArchiver.archiveRegion(this.services.getConfiguration(), fs, regionA);
    HFileArchiver.archiveRegion(this.services.getConfiguration(), fs, regionB);
    MetaEditor.deleteMergeQualifiers(server.getCatalogTracker(), mergedRegion);
    return true;

Do you think it is OK, if we hit this issue, to forcibly archive and clean
the regions ourselves?

Regards,
Shahab

On Fri, Nov 14, 2014 at 1:10 PM, Shahab Yunus wrote:

> Yesterday, I believe.
>
> Regards,
> Shahab
>
> On Fri, Nov 14, 2014 at 1:07 PM, Ted Yu wrote:
>
>> Shahab:
>> When was the last time compaction was run on this table?
>>
>> Cheers
>>
>> On Fri, Nov 14, 2014 at 9:58 AM, Shahab Yunus wrote:
>>
>> > I see. Thanks.
>> >
>> > And if the region indeed has references, can we somehow forcibly
>> > remove them? Is this even possible (if not advisable)? Basically, what
>> > I am asking is: say we do hit this scenario and we know it is OK to go
>> > ahead and merge. What steps can we follow after detecting such
>> > unwanted references?
>> >
>> > Regards,
>> > Shahab
>> >
>> > On Fri, Nov 14, 2014 at 12:50 PM, Ted Yu wrote:
>> >
>> > > For automated detection of such a scenario, you can reference the
>> > > code in CatalogJanitor#cleanMergeRegion():
>> > >
>> > >     regionFs = HRegionFileSystem.openRegionFromFileSystem(
>> > >         this.services.getConfiguration(), fs, tabledir, mergedRegion, true);
>> > >
>> > >     ...
>> > >
>> > > Then regionFs.hasReferences(htd) would tell you whether the underlying
>> > > region has reference files.
>> > >
>> > > Cheers
>> > >
>> > > On Fri, Nov 14, 2014 at 9:39 AM, Shahab Yunus wrote:
>> > >
>> > > > No. Not that I can recall, but I can check.
>> > > >
>> > > > From a resolution perspective, is there any way we can resolve this?
>> > > > More importantly, is there any way we can automate the resolution if
>> > > > we run into such issues in the future? 'Cleaning the qualifier', that is.
>> > > >
>> > > > Regards,
>> > > > Shahab
>> > > >
>> > > > On Fri, Nov 14, 2014 at 12:12 PM, Ted Yu wrote:
>> > > >
>> > > > > One possibility is that region 7373f75181c71eb5061a6673cee15931
>> > > > > was involved in some HBase snapshot.
>> > > > >
>> > > > > Was the underlying table snapshotted in the recent past?
>> > > > >
>> > > > > Cheers
>> > > > >
>> > > > > On Fri, Nov 14, 2014 at 9:05 AM, Shahab Yunus <shahab.yunus@gmail.com> wrote:
>> > > > >
>> > > > > > Thanks again.
>> > > > > >
>> > > > > > But I have been polling for a while and it still doesn't merge. I
>> > > > > > mean the particular region example that I sent you; I have been
>> > > > > > trying to merge it since yesterday. I ran the polling-based code all
>> > > > > > night and had to kill it. Then in the morning, I tried merging
>> > > > > > manually through the hbase shell and it still doesn't merge. Note
>> > > > > > that the current polling logic does not try to call merge again.
>> > > > > > It just checks the region size.
>> > > > > >
>> > > > > > So how do we clean it, then? Or actually make it merge? Also, is
>> > > > > > this expected (a region keeping a reference)? How can we avoid it?
>> > > > > >
>> > > > > > Note that this is not limited to this table. We are seeing this in
>> > > > > > other regions of other tables as well. Are we merging too fast?
>> > > > > >
>> > > > > > Regards,
>> > > > > > Shahab
>> > > > > >
>> > > > > > On Fri, Nov 14, 2014 at 11:58 AM, Ted Yu wrote:
>> > > > > >
>> > > > > > > Polling as you described is fine.
>> > > > > > >
>> > > > > > > catalogJanitor.cleanMergeQualifier() is called by
>> > > > > > > DispatchMergingRegionHandler.
>> > > > > > >
>> > > > > > > If the clean was successful, you would see the following:
>> > > > > > >
>> > > > > > >     LOG.debug("Deleting region " + regionA.getRegionNameAsString() + " and "
>> > > > > > >         + regionB.getRegionNameAsString()
>> > > > > > >         + " from fs because merged region no longer holds references");
>> > > > > > >
>> > > > > > > Assuming the log below did not appear in your master log:
>> > > > > > >
>> > > > > > >     LOG.error("Merged region " + region.getRegionNameAsString()
>> > > > > > >         + " has only one merge qualifier in META.");
>> > > > > > >
>> > > > > > > it would be the case that 7373f75181c71eb5061a6673cee15931 still
>> > > > > > > had a reference file.
>> > > > > > >
>> > > > > > > Cheers
>> > > > > > >
>> > > > > > > On Fri, Nov 14, 2014 at 8:35 AM, Shahab Yunus <shahab.yunus@gmail.com> wrote:
>> > > > > > >
>> > > > > > > > Hi Ted.
>> > > > > > > >
>> > > > > > > > The log excerpt is at the end of this email. It is for the merge
>> > > > > > > > command that I issued just now through the hbase shell.
>> > > > > > > > forcible was false, but it behaves the same way when forcible is
>> > > > > > > > true. This is from the master log. Indeed, the region merge was
>> > > > > > > > skipped! What does this mean? The data seems to be intact for
>> > > > > > > > this table.
>> > > > > > > >
>> > > > > > > > Just to give you some background: this table was first merged by
>> > > > > > > > the automated Java application. We are merging regions of tables
>> > > > > > > > programmatically. As the HBaseAdmin.mergeRegions call is async, we
>> > > > > > > > poll for the number of regions to drop after the merge call. The
>> > > > > > > > application hangs and keeps polling forever because the previous
>> > > > > > > > merge didn't happen.
>> > > > > > > >
>> > > > > > > > In this poll loop, we get the number of regions with a fresh call
>> > > > > > > > to HBaseAdmin.getTableRegions(tableName).size().
>> > > > > > > >
>> > > > > > > > What are these merge qualifiers, and what are we doing wrong or
>> > > > > > > > should we be doing?
>> > > > > > > >
>> > > > > > > > Can we somehow retry the merge in the polling loop? But how can we
>> > > > > > > > know that we need to call merge again, given that it works for
>> > > > > > > > some regions? Is the table's meta corrupted for some reason by the
>> > > > > > > > above logic?
>> > > > > > > >
>> > > > > > > > Thanks a lot.
>> > > > > > > >
>> > > > > > > > ------------------------------------------------------------------------
>> > > > > > > >
>> > > > > > > > 2014-11-14 11:25:02,643 INFO org.apache.zookeeper.ZooKeeper: Session: 0x348c7017707236b closed
>> > > > > > > > 2014-11-14 11:25:02,643 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down
>> > > > > > > > 2014-11-14 11:25:02,645 INFO org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=ip-1010019.ec2.internal:2181,ip-1010017.ec2.internal:2181,ip-1010018.ec2.internal:2181 sessionTimeout=60000 watcher=catalogtracker-on-hconnection-0x47d865f2, quorum=ip-1010019.ec2.internal:2181,ip-1010017.ec2.internal:2181,ip-1010018.ec2.internal:2181, baseZNode=/hbase
>> > > > > > > > 2014-11-14 11:25:02,645 INFO org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Process identifier=catalogtracker-on-hconnection-0x47d865f2 connecting to ZooKeeper ensemble=ip-1010019.ec2.internal:2181,ip-1010017.ec2.internal:2181,ip-1010018.ec2.internal:2181
>> > > > > > > > 2014-11-14 11:25:02,645 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server ip-1010018.ec2.internal/1010019:2181. Will not attempt to authenticate using SASL (unknown error)
>> > > > > > > > 2014-11-14 11:25:02,646 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to ip-1010018.ec2.internal/1010019:2181, initiating session
>> > > > > > > > 2014-11-14 11:25:02,648 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete on server ip-1010018.ec2.internal/1010019:2181, sessionid = 0x348c7017707236c, negotiated timeout = 60000
>> > > > > > > > 2014-11-14 11:25:02,703 INFO org.apache.zookeeper.ZooKeeper: Session: 0x348c7017707236c closed
>> > > > > > > > 2014-11-14 11:25:02,703 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down
>> > > > > > > > 2014-11-14 11:25:30,713 INFO org.apache.hadoop.hbase.master.handler.DispatchMergingRegionHandler: Skip merging regions TABLE_NAME,,1415915112497.7373f75181c71eb5061a6673cee15931., TABLE_NAME,\x02\xFA\xF0\x80\x00\x00\x01I\xAA\xD5\x87\xA8\x19\x99\x99\x99\x99\x99\x99\x90,1415910559217.43f4d3685d113d3ae18eea9f189de096., because region 7373f75181c71eb5061a6673cee15931 has merge qualifier
>> > > > > > > > 2014-11-14 11:25:41,383 INFO org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=ip-1010019.ec2.internal:2181,ip-1010017.ec2.internal:2181,ip-1010018.ec2.internal:2181 sessionTimeout=60000 watcher=catalogtracker-on-hconnection-0x47d865f2, quorum=ip-1010019.ec2.internal:2181,ip-1010017.ec2.internal:2181,ip-1010018.ec2.internal:2181, baseZNode=/hbase
>> > > > > > > > 2014-11-14 11:25:41,384 INFO org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Process identifier=catalogtracker-on-hconnection-0x47d865f2 connecting to ZooKeeper ensemble=ip-1010019.ec2.internal:2181,ip-1010017.ec2.internal:2181,ip-1010018.ec2.internal:2181
>> > > > > > > > 2014-11-14 11:25:41,384 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server ip-1010018.ec2.internal/1010019:2181. Will not attempt to authenticate using SASL (unknown error)
>> > > > > > > > 2014-11-14 11:25:41,386 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to ip-1010018.ec2.internal/1010019:2181, initiating session
>> > > > > > > > 2014-11-14 11:25:41,389 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete on server ip-1010018.ec2.internal/1010019:2181, sessionid = 0x348c7017707236e, negotiated timeout = 60000
>> > > > > > > > 2014-11-14 11:25:41,397 INFO org.apache.zookeeper.ZooKeeper: Session: 0x348c7017707236e closed
>> > > > > > > > 2014-11-14 11:25:41,398 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down
>> > > > > > > >
>> > > > > > > > ------------------------------------------------------------------------
>> > > > > > > >
>> > > > > > > > Regards,
>> > > > > > > > Shahab
>> > > > > > > >
>> > > > > > > > On Fri, Nov 14, 2014 at 10:56 AM, Ted Yu <yuzhihong@gmail.com> wrote:
>> > > > > > > >
>> > > > > > > > > Looking at DispatchMergingRegionHandler, it does some checks
>> > > > > > > > > before initiating the merge, e.g.:
>> > > > > > > > >
>> > > > > > > > >     LOG.info("Skip merging regions " + region_a.getRegionNameAsString()
>> > > > > > > > >         + ", " + region_b.getRegionNameAsString() + ", because region "
>> > > > > > > > >         + (regionAHasMergeQualifier ? region_a.getEncodedName() : region_b
>> > > > > > > > >             .getEncodedName()) + " has merge qualifier");
>> > > > > > > > >
>> > > > > > > > > Can you take a look at the master log around the time the merge
>> > > > > > > > > request was issued to see if you can get some clue?
>> > > > > > > > >
>> > > > > > > > > Cheers
>> > > > > > > > >
>> > > > > > > > > On Fri, Nov 14, 2014 at 6:41 AM, Shahab Yunus <shahab.yunus@gmail.com> wrote:
>> > > > > > > > >
>> > > > > > > > > > The documentation of the online merge tool (merge_region) states
>> > > > > > > > > > that if we forcibly merge regions (by setting the 3rd attribute
>> > > > > > > > > > to true), it can create overlapping regions. If this happens,
>> > > > > > > > > > will it render the region or table unusable, or is it just a
>> > > > > > > > > > performance hit? I mean, how big a deal is it?
>> > > > > > > > > >
>> > > > > > > > > > Actually, we are merging regions using the programmatic API and
>> > > > > > > > > > setting this flag ('forcible') to false.
>> > > > > > > > > > But for some tables (we haven't figured out a pattern yet; the
>> > > > > > > > > > data is still accessible), the regions do not merge at all.
>> > > > > > > > > > Afterwards we tried with this flag = true, and they still don't
>> > > > > > > > > > merge.
>> > > > > > > > > >
>> > > > > > > > > > CDH 5.1.0
>> > > > > > > > > > (HBase is 0.98.1-cdh5.1.0)
>> > > > > > > > > >
>> > > > > > > > > > Regards,
>> > > > > > > > > > Shahab
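
The thread describes a poll loop that waits indefinitely for the region count to drop after an asynchronous merge request. A minimal sketch of a bounded version of that loop follows. It assumes the caller supplies the current region count through a plain IntSupplier (in the setup described above it might wrap the fresh HBaseAdmin.getTableRegions call); MergePoller and waitForRegionCount are hypothetical names used for illustration and are not part of HBase.

```java
import java.time.Duration;
import java.util.function.IntSupplier;

// Hypothetical helper for illustration only; not an HBase class.
final class MergePoller {
    private MergePoller() {}

    /**
     * Polls regionCount until it is at or below expectedCount, or until the
     * timeout elapses. Returns true if the drop was observed, false on
     * timeout or if the polling thread was interrupted.
     */
    public static boolean waitForRegionCount(IntSupplier regionCount, int expectedCount,
                                             Duration timeout, Duration interval) {
        final long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            // Ask for a fresh count on every iteration.
            if (regionCount.getAsInt() <= expectedCount) {
                return true;
            }
            // Give up instead of polling forever when the merge was skipped.
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(interval.toMillis());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }

    public static void main(String[] args) {
        // Simulated table: reports 10 regions for the first three polls, then 9.
        int[] polls = {0};
        IntSupplier simulated = () -> polls[0]++ < 3 ? 10 : 9;
        boolean merged = waitForRegionCount(
                simulated, 9, Duration.ofSeconds(5), Duration.ofMillis(50));
        System.out.println(merged
                ? "merge observed"
                : "merge not observed; check the master log");
    }
}
```

When the method returns false, the caller can stop and look in the master log for the "Skip merging regions ... has merge qualifier" message quoted earlier in the thread, rather than continuing to poll.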