hbase-user mailing list archives

From Shahab Yunus <shahab.yu...@gmail.com>
Subject Re: Forcibly merging regions
Date Fri, 14 Nov 2014 17:58:55 GMT
I see. Thanks.

And if the region does indeed have references, can we somehow forcibly
remove them? Is this even possible (even if not advisable)? Basically, what
I am asking is this: suppose we do hit this scenario and we know it is OK to
go ahead and merge. What steps can we follow once such unwanted references
have been detected?

Regards,
Shahab
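
For the detection step discussed in this thread, one option is to watch the
master log for the "Skip merging regions" message and extract the encoded name
of the region that still carries a merge qualifier. The following is only a
minimal sketch: the class and method names (SkippedMergeDetector,
blockingRegion) are hypothetical, and the pattern simply mirrors the message
text quoted further down from DispatchMergingRegionHandler in 0.98, which may
be worded differently in other HBase versions.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Sketch: recognise the "Skip merging regions" master-log line (hypothetical helper). */
final class SkippedMergeDetector {
    // Assumption: the message wording matches the log line quoted in this thread.
    private static final Pattern SKIP_LINE = Pattern.compile(
        "Skip merging regions .* because region ([0-9a-f]+) has merge qualifier");

    private SkippedMergeDetector() {}

    /**
     * Returns the encoded name of the region reported as still having a
     * merge qualifier, or an empty Optional if the line is not such a report.
     */
    static Optional<String> blockingRegion(String logLine) {
        Matcher m = SKIP_LINE.matcher(logLine);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }
}
```

A caller could feed each master-log line through blockingRegion and, when a
region name comes back, stop waiting on that merge instead of polling on.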

On Fri, Nov 14, 2014 at 12:50 PM, Ted Yu <yuzhihong@gmail.com> wrote:

> For automated detection of such a scenario, you can refer to the code in
> CatalogJanitor#cleanMergeRegion():
>
>       regionFs = HRegionFileSystem.openRegionFromFileSystem(
>           this.services.getConfiguration(), fs, tabledir, mergedRegion, true);
>
> ...
>
> Then regionFs.hasReferences(htd) would tell you whether the underlying
> region has reference files.
>
> Cheers
>
> On Fri, Nov 14, 2014 at 9:39 AM, Shahab Yunus <shahab.yunus@gmail.com>
> wrote:
>
> > No, not that I can recall, but I can check.
> >
> > From a resolution perspective, is there any way we can resolve this? More
> > importantly, is there any way we can automate the resolution if we run
> > into such issues in the future? 'Cleaning the qualifier', that is.
> >
> > Regards,
> > Shahab
> >
> > On Fri, Nov 14, 2014 at 12:12 PM, Ted Yu <yuzhihong@gmail.com> wrote:
> >
> > > One possibility is that region 7373f75181c71eb5061a6673cee15931 was
> > > involved in an HBase snapshot.
> > >
> > > Was the underlying table snapshotted in the recent past?
> > >
> > > Cheers
> > >
> > > On Fri, Nov 14, 2014 at 9:05 AM, Shahab Yunus <shahab.yunus@gmail.com>
> > > wrote:
> > >
> > > > Thanks again.
> > > >
> > > > But I have been polling for a while and it still doesn't merge. I have
> > > > been trying since yesterday to merge the particular region from the
> > > > example I sent you. I ran the polling-based code all night and had to
> > > > kill it. Then in the morning, I tried merging manually through the
> > > > hbase shell and it still doesn't merge. Note that the current polling
> > > > logic does not try to call merge again. It just checks the number of
> > > > regions.
> > > >
> > > > So how do we clean it then? Or actually make it merge? Also, is this
> > > > something expected (a region keeping a reference)? How can we avoid it?
> > > >
> > > > Note that this is not limited to this table. We are seeing this in
> > > > regions of other tables as well. Are we merging too fast?
> > > >
> > > > Regards,
> > > > Shahab
> > > >
> > > > On Fri, Nov 14, 2014 at 11:58 AM, Ted Yu <yuzhihong@gmail.com> wrote:
> > > >
> > > > > Polling as you described is fine.
> > > > >
> > > > > catalogJanitor.cleanMergeQualifier() is called by
> > > > > DispatchMergingRegionHandler.
> > > > >
> > > > > If the clean was successful, you would see the following:
> > > > >
> > > > >       LOG.debug("Deleting region " + regionA.getRegionNameAsString() + " and "
> > > > >           + regionB.getRegionNameAsString()
> > > > >           + " from fs because merged region no longer holds references");
> > > > >
> > > > > Assuming the log below was not in your master log:
> > > > >
> > > > >       LOG.error("Merged region " + region.getRegionNameAsString()
> > > > >           + " has only one merge qualifier in META.");
> > > > >
> > > > > it would be the case that 7373f75181c71eb5061a6673cee15931 still had a
> > > > > reference file.
> > > > >
> > > > > Cheers
> > > > >
> > > > > On Fri, Nov 14, 2014 at 8:35 AM, Shahab Yunus <shahab.yunus@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Hi Ted.
> > > > > >
> > > > > > The log excerpt is at the end of this email. It is from the master
> > > > > > log and covers the merge command that I issued just now through the
> > > > > > hbase shell. forcible was false, but it behaves similarly if
> > > > > > forcible is true. Indeed, the region merge was skipped! What does
> > > > > > this mean? Data seems to be intact for this table.
> > > > > >
> > > > > > Just to give you some background: this table was first merged by
> > > > > > the automated Java application. We are merging regions
> > > > > > programmatically. As the HBaseAdmin.mergeRegions call is async, we
> > > > > > poll for the number of regions to go down after the merge call. The
> > > > > > application hangs and keeps polling forever because the previous
> > > > > > merge didn't happen.
> > > > > >
> > > > > > In this poll loop, we get the number of regions with a fresh call
> > > > > > to HBaseAdmin.getTableRegions(tableName).size().
> > > > > >
> > > > > > What are these merge qualifiers, and what are we doing wrong or
> > > > > > what should we do?
> > > > > >
> > > > > > Can we somehow retry the merge in the polling loop? But how can we
> > > > > > know that we need to call merge again, given that it works for some
> > > > > > regions? Is the table meta corrupted for some reason by the above
> > > > > > logic?
> > > > > >
> > > > > > Thanks a lot.
> > > > > >
> > > > > > ------------------------------------------------------------------------
> > > > > >
> > > > > > 2014-11-14 11:25:02,643 INFO org.apache.zookeeper.ZooKeeper: Session: 0x348c7017707236b closed
> > > > > > 2014-11-14 11:25:02,643 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down
> > > > > > 2014-11-14 11:25:02,645 INFO org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=ip-1010019.ec2.internal:2181,ip-1010017.ec2.internal:2181,ip-1010018.ec2.internal:2181 sessionTimeout=60000 watcher=catalogtracker-on-hconnection-0x47d865f2, quorum=ip-1010019.ec2.internal:2181,ip-1010017.ec2.internal:2181,ip-1010018.ec2.internal:2181, baseZNode=/hbase
> > > > > > 2014-11-14 11:25:02,645 INFO org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Process identifier=catalogtracker-on-hconnection-0x47d865f2 connecting to ZooKeeper ensemble=ip-1010019.ec2.internal:2181,ip-1010017.ec2.internal:2181,ip-1010018.ec2.internal:2181
> > > > > > 2014-11-14 11:25:02,645 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server ip-1010018.ec2.internal/1010019:2181. Will not attempt to authenticate using SASL (unknown error)
> > > > > > 2014-11-14 11:25:02,646 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to ip-1010018.ec2.internal/1010019:2181, initiating session
> > > > > > 2014-11-14 11:25:02,648 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete on server ip-1010018.ec2.internal/1010019:2181, sessionid = 0x348c7017707236c, negotiated timeout = 60000
> > > > > > 2014-11-14 11:25:02,703 INFO org.apache.zookeeper.ZooKeeper: Session: 0x348c7017707236c closed
> > > > > > 2014-11-14 11:25:02,703 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down
> > > > > > 2014-11-14 11:25:30,713 INFO org.apache.hadoop.hbase.master.handler.DispatchMergingRegionHandler: Skip merging regions TABLE_NAME,,1415915112497.7373f75181c71eb5061a6673cee15931., TABLE_NAME,\x02\xFA\xF0\x80\x00\x00\x01I\xAA\xD5\x87\xA8\x19\x99\x99\x99\x99\x99\x99\x90,1415910559217.43f4d3685d113d3ae18eea9f189de096., because region 7373f75181c71eb5061a6673cee15931 has merge qualifier
> > > > > > 2014-11-14 11:25:41,383 INFO org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=ip-1010019.ec2.internal:2181,ip-1010017.ec2.internal:2181,ip-1010018.ec2.internal:2181 sessionTimeout=60000 watcher=catalogtracker-on-hconnection-0x47d865f2, quorum=ip-1010019.ec2.internal:2181,ip-1010017.ec2.internal:2181,ip-1010018.ec2.internal:2181, baseZNode=/hbase
> > > > > > 2014-11-14 11:25:41,384 INFO org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Process identifier=catalogtracker-on-hconnection-0x47d865f2 connecting to ZooKeeper ensemble=ip-1010019.ec2.internal:2181,ip-1010017.ec2.internal:2181,ip-1010018.ec2.internal:2181
> > > > > > 2014-11-14 11:25:41,384 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server ip-1010018.ec2.internal/1010019:2181. Will not attempt to authenticate using SASL (unknown error)
> > > > > > 2014-11-14 11:25:41,386 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to ip-1010018.ec2.internal/1010019:2181, initiating session
> > > > > > 2014-11-14 11:25:41,389 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete on server ip-1010018.ec2.internal/1010019:2181, sessionid = 0x348c7017707236e, negotiated timeout = 60000
> > > > > > 2014-11-14 11:25:41,397 INFO org.apache.zookeeper.ZooKeeper: Session: 0x348c7017707236e closed
> > > > > > 2014-11-14 11:25:41,398 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down
> > > > > >
> > > > > > ------------------------------------------------------------------------
> > > > > >
> > > > > > Regards,
> > > > > > Shahab
> > > > > >
> > > > > > On Fri, Nov 14, 2014 at 10:56 AM, Ted Yu <yuzhihong@gmail.com> wrote:
> > > > > >
> > > > > > > Looking at DispatchMergingRegionHandler, it does some checks
> > > > > > > before initiating the merge, e.g.:
> > > > > > >
> > > > > > >       LOG.info("Skip merging regions " + region_a.getRegionNameAsString()
> > > > > > >           + ", " + region_b.getRegionNameAsString() + ", because region "
> > > > > > >           + (regionAHasMergeQualifier ? region_a.getEncodedName() : region_b
> > > > > > >               .getEncodedName()) + " has merge qualifier");
> > > > > > >
> > > > > > > Can you take a look at the master log around the time the merge
> > > > > > > request was issued to see if you can get some clue?
> > > > > > >
> > > > > > > Cheers
> > > > > > >
> > > > > > > On Fri, Nov 14, 2014 at 6:41 AM, Shahab Yunus <shahab.yunus@gmail.com>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > The documentation of the online merge tool (merge_region) states
> > > > > > > > that if we forcibly merge regions (by setting the third argument
> > > > > > > > to true), it can create overlapping regions. If this happens,
> > > > > > > > will it render the region or table unusable, or is it just a
> > > > > > > > performance hit? I mean, how big of a deal is it?
> > > > > > > >
> > > > > > > > Actually, we are merging regions using the programmatic API and
> > > > > > > > setting this flag ('forcible') to false. But for some tables (we
> > > > > > > > haven't figured out a pattern yet; data is still accessible),
> > > > > > > > the merge of regions does not happen at all. Afterwards we tried
> > > > > > > > with this flag set to true, and it still doesn't merge them.
> > > > > > > >
> > > > > > > > CDH 5.1.0
> > > > > > > > (Hbase is 0.98.1-cdh5.1.0)
> > > > > > > >
> > > > > > > > Regards,
> > > > > > > > Shahab
>
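
The polling approach described in this thread (request an asynchronous merge,
then wait for the table's region count to drop) can hang forever when the
master skips the merge. A minimal sketch of a bounded variant follows. The
names MergePoller and awaitFewerRegions are hypothetical, and the region count
is passed in as a supplier so that the sketch does not depend on any HBase
class; in a 0.98 client the supplier would presumably wrap a fresh
HBaseAdmin.getTableRegions(tableName).size() call and handle its IOException.

```java
import java.util.function.IntSupplier;

/** Sketch: wait for a region count to drop, but give up after a deadline. */
final class MergePoller {
    private MergePoller() {}

    /**
     * Polls regionCount until it falls below initialCount or timeoutMillis
     * elapses. Returns true if the count dropped, false on timeout or if the
     * waiting thread was interrupted.
     */
    static boolean awaitFewerRegions(IntSupplier regionCount, int initialCount,
                                     long timeoutMillis, long pollIntervalMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            if (regionCount.getAsInt() < initialCount) {
                return true; // the merge went through
            }
            if (System.nanoTime() - deadline >= 0) {
                return false; // stop instead of polling forever
            }
            try {
                Thread.sleep(pollIntervalMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag
                return false;
            }
        }
    }
}
```

When awaitFewerRegions returns false, the caller can check the master log for
a "Skip merging regions" message before deciding whether to request the merge
again, rather than continuing to poll.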
