Mailing-List: contact user-help@hbase.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@hbase.apache.org
Received-SPF: pass (athena.apache.org: domain of yuzhihong@gmail.com
 designates 209.85.214.171 as permitted sender)
MIME-Version: 1.0
In-Reply-To: 
 <CAEo-6+RC0p0JzsoVCpTEZ0Nra2UxiLRH7SWx8Apg_ZWSgor9rw@mail.gmail.com>
References: 
 <CAEo-6+SM4qZK6Gm=JtcnaUCa4Tw=XXQcRRaMfpsAC9sS8-xU4A@mail.gmail.com>
	<CALte62zOPi2ZBbcVwuk15yKYujoWzxUowar-9oyozGq-B-eTRA@mail.gmail.com>
	<CAEo-6+SbY+iw3fxEUJxf0TXksmwMX1iTq8zhA7fYXfSF=rcEUg@mail.gmail.com>
	<CALte62yLaiz+gaWKPyTC66-vgR5oCgtQqjw9faaoM-EPcAuppg@mail.gmail.com>
	<CAEo-6+RC0p0JzsoVCpTEZ0Nra2UxiLRH7SWx8Apg_ZWSgor9rw@mail.gmail.com>
Date: Fri, 14 Nov 2014 09:12:31 -0800
Message-ID: 
 <CALte62wSjtpfS9hDOQx4W=Yb9jv+=+OaRMYWEY4EgU+gWQ9XAw@mail.gmail.com>
Subject: Re: Forcibly merging regions
From: Ted Yu <yuzhihong@gmail.com>
To: "user@hbase.apache.org" <user@hbase.apache.org>
Content-Type: multipart/alternative; boundary=001a113ef442fb9b600507d4bb5d

--001a113ef442fb9b600507d4bb5d
Content-Type: text/plain; charset=UTF-8

One possibility is that region 7373f75181c71eb5061a6673cee15931 was
involved in an HBase snapshot.

Was the underlying table snapshotted in the recent past?
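
Regarding the polling loop discussed below: since a skipped merge never
lowers the region count, a loop without a deadline will spin forever. A
minimal sketch of a bounded wait follows. MergePoller and
waitForRegionCount are hypothetical names, not part of HBase, and the
IntSupplier is assumed to wrap a fresh region-count lookup such as the
size of the list returned by HBaseAdmin.getTableRegions(tableName).

```java
import java.util.function.IntSupplier;

/** Bounded polling helper. Illustrative sketch only, not part of HBase. */
final class MergePoller {

    /**
     * Polls regionCount until it drops to expectedCount (or lower) or
     * timeoutMs elapses. Returns true if the expected count was observed,
     * false on timeout or if the thread was interrupted.
     */
    static boolean waitForRegionCount(IntSupplier regionCount, int expectedCount,
                                      long timeoutMs, long pollIntervalMs) {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (true) {
            if (regionCount.getAsInt() <= expectedCount) {
                return true;  // merge became visible
            }
            if (System.nanoTime() - deadline >= 0) {
                return false; // give up; caller can check the master log or retry
            }
            try {
                Thread.sleep(pollIntervalMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve interrupt status
                return false;
            }
        }
    }

    public static void main(String[] args) {
        // Stand-in supplier (assumption): reports 5 regions twice, then 4.
        // In a real client this would query the cluster on every call.
        int[] polls = {0};
        IntSupplier fake = () -> ++polls[0] >= 3 ? 4 : 5;
        System.out.println(waitForRegionCount(fake, 4, 2_000, 10));

        // A merge that is skipped never changes the count, so this times out.
        System.out.println(waitForRegionCount(() -> 5, 4, 50, 10));
    }
}
```

When the wait returns false, the caller can look for the "Skip merging
regions" message in the master log before deciding whether to issue the
merge again.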

Cheers

On Fri, Nov 14, 2014 at 9:05 AM, Shahab Yunus <shahab.yunus@gmail.com>
wrote:

> Thanks again.
>
> But I have been polling for a while and it still doesn't merge. I have been
> trying to merge this particular region, the example I sent you, since
> yesterday. I ran the polling-based code all night and had to kill it.
> Then in the morning, I tried merging manually through the hbase shell and it
> still didn't merge. Note that the current polling logic does not try to
> call merge again. It just checks the region count.
>
> So how do we clean it up then? Or actually make it merge? Also, is this
> expected (a region keeping a reference)? How can we avoid it?
>
> Note that this is not limited to this table. We are seeing this in
> regions of other tables as well. Are we merging too fast?
>
>
>
> Regards,
> Shahab
>
> On Fri, Nov 14, 2014 at 11:58 AM, Ted Yu <yuzhihong@gmail.com> wrote:
>
> > Polling as you described is fine.
> >
> > catalogJanitor.cleanMergeQualifier() is called by
> > DispatchMergingRegionHandler.
> >
> > If clean was successful, you would see the following:
> >
> >       LOG.debug("Deleting region " + regionA.getRegionNameAsString() + " and "
> >           + regionB.getRegionNameAsString()
> >           + " from fs because merged region no longer holds references");
> >
> > Assuming the log below did not appear in your master log:
> >
> >       LOG.error("Merged region " + region.getRegionNameAsString()
> >           + " has only one merge qualifier in META.");
> >
> > It would be the case that 7373f75181c71eb5061a6673cee15931 still had a
> > reference file.
> >
> > Cheers
> >
> > On Fri, Nov 14, 2014 at 8:35 AM, Shahab Yunus <shahab.yunus@gmail.com>
> > wrote:
> >
> > > Hi Ted.
> > >
> > > The log bit is below at the end of the email. This is from the merge
> > > command that I gave just now through hbase shell. forcible was false but
> > > it behaves similarly if forcible is true too. This is from the master log.
> > > Indeed the region merging was skipped! What does this mean? Data seems to
> > > be intact for this table.
> > >
> > > Just to give you some background: this table was first merged by the
> > > automated Java application. What we are doing is merging regions
> > > programmatically. As the HBaseAdmin.mergeRegions call is async, we poll
> > > for the number of regions to drop after this merge call. The
> > > application hangs and continues polling forever as the previous merge
> > > didn't happen.
> > >
> > > In this poll loop, we get the number of regions by a fresh call to
> > > HBaseAdmin.getTableRegions(tableName).getSize().
> > >
> > > What are these merge qualifiers, and what are we doing wrong or what
> > > should we do?
> > >
> > > Can we somehow retry the merge in the polling loop? But how can we know
> > > that we need to call merge again, as it works for some regions? Is the
> > > table meta corrupted for some reason by the above logic?
> > >
> > > Thanks a lot.
> > >
> > >
> > >
> > >
> > > ------------------------------------------------------------------------
> > >
> > > 2014-11-14 11:25:02,643 INFO org.apache.zookeeper.ZooKeeper: Session: 0x348c7017707236b closed
> > > 2014-11-14 11:25:02,643 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down
> > > 2014-11-14 11:25:02,645 INFO org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=ip-1010019.ec2.internal:2181,ip-1010017.ec2.internal:2181,ip-1010018.ec2.internal:2181 sessionTimeout=60000 watcher=catalogtracker-on-hconnection-0x47d865f2, quorum=ip-1010019.ec2.internal:2181,ip-1010017.ec2.internal:2181,ip-1010018.ec2.internal:2181, baseZNode=/hbase
> > > 2014-11-14 11:25:02,645 INFO org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Process identifier=catalogtracker-on-hconnection-0x47d865f2 connecting to ZooKeeper ensemble=ip-1010019.ec2.internal:2181,ip-1010017.ec2.internal:2181,ip-1010018.ec2.internal:2181
> > > 2014-11-14 11:25:02,645 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server ip-1010018.ec2.internal/1010019:2181. Will not attempt to authenticate using SASL (unknown error)
> > > 2014-11-14 11:25:02,646 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to ip-1010018.ec2.internal/1010019:2181, initiating session
> > > 2014-11-14 11:25:02,648 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete on server ip-1010018.ec2.internal/1010019:2181, sessionid = 0x348c7017707236c, negotiated timeout = 60000
> > > 2014-11-14 11:25:02,703 INFO org.apache.zookeeper.ZooKeeper: Session: 0x348c7017707236c closed
> > > 2014-11-14 11:25:02,703 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down
> > > 2014-11-14 11:25:30,713 INFO org.apache.hadoop.hbase.master.handler.DispatchMergingRegionHandler: Skip merging regions TABLE_NAME,,1415915112497.7373f75181c71eb5061a6673cee15931., TABLE_NAME,\x02\xFA\xF0\x80\x00\x00\x01I\xAA\xD5\x87\xA8\x19\x99\x99\x99\x99\x99\x99\x90,1415910559217.43f4d3685d113d3ae18eea9f189de096., because region 7373f75181c71eb5061a6673cee15931 has merge qualifier
> > > 2014-11-14 11:25:41,383 INFO org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=ip-1010019.ec2.internal:2181,ip-1010017.ec2.internal:2181,ip-1010018.ec2.internal:2181 sessionTimeout=60000 watcher=catalogtracker-on-hconnection-0x47d865f2, quorum=ip-1010019.ec2.internal:2181,ip-1010017.ec2.internal:2181,ip-1010018.ec2.internal:2181, baseZNode=/hbase
> > > 2014-11-14 11:25:41,384 INFO org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Process identifier=catalogtracker-on-hconnection-0x47d865f2 connecting to ZooKeeper ensemble=ip-1010019.ec2.internal:2181,ip-1010017.ec2.internal:2181,ip-1010018.ec2.internal:2181
> > > 2014-11-14 11:25:41,384 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server ip-1010018.ec2.internal/1010019:2181. Will not attempt to authenticate using SASL (unknown error)
> > > 2014-11-14 11:25:41,386 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to ip-1010018.ec2.internal/1010019:2181, initiating session
> > > 2014-11-14 11:25:41,389 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete on server ip-1010018.ec2.internal/1010019:2181, sessionid = 0x348c7017707236e, negotiated timeout = 60000
> > > 2014-11-14 11:25:41,397 INFO org.apache.zookeeper.ZooKeeper: Session: 0x348c7017707236e closed
> > > 2014-11-14 11:25:41,398 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down
> > >
> > > ------------------------------------------------------------------------
> > >
> > > Regards,
> > > Shahab
> > >
> > > On Fri, Nov 14, 2014 at 10:56 AM, Ted Yu <yuzhihong@gmail.com> wrote:
> > >
> > > > Looking at DispatchMergingRegionHandler, it performs some checks before
> > > > initiating the merge, e.g.:
> > > >
> > > >       LOG.info("Skip merging regions " + region_a.getRegionNameAsString()
> > > >           + ", " + region_b.getRegionNameAsString() + ", because region "
> > > >           + (regionAHasMergeQualifier ? region_a.getEncodedName() : region_b
> > > >               .getEncodedName()) + " has merge qualifier");
> > > >
> > > > Can you take a look at the master log around the time the merge request
> > > > was issued to see if you can get some clue?
> > > >
> > > > Cheers
> > > >
> > > > On Fri, Nov 14, 2014 at 6:41 AM, Shahab Yunus <shahab.yunus@gmail.com>
> > > > wrote:
> > > >
> > > > > The documentation of the online merge tool (merge_region) states that
> > > > > if we forcibly merge regions (by setting the 3rd argument to true) then
> > > > > it can create overlapping regions. If this happens, will it render the
> > > > > region or table unusable, or is it just a performance hit? I mean, how
> > > > > big of a deal is it?
> > > > >
> > > > > Actually, we are merging regions using the programmatic API and
> > > > > setting this flag ('forcible') to false. But for some tables (we
> > > > > haven't figured out a pattern yet; data is still accessible), the merge
> > > > > of regions does not happen at all. Afterwards we tried with this flag
> > > > > = true, and it still doesn't merge them.
> > > > >
> > > > > CDH 5.1.0
> > > > > (Hbase is 0.98.1-cdh5.1.0)
> > > > >
> > > > > Regards,
> > > > > Shahab
> > > > >
> > > >
> > >
> >
>

--001a113ef442fb9b600507d4bb5d--