hbase-user mailing list archives

From Shahab Yunus <shahab.yu...@gmail.com>
Subject Re: Forcibly merging regions
Date Fri, 14 Nov 2014 19:08:02 GMT
Digging deeper into the code, I came across this (this is from
CatalogJanitor#cleanMergeRegion, as listed on grepcode for hbase-server
0.96.0-hadoop2):


...

...

HFileArchiver.archiveRegion(this.services.getConfiguration(), fs, regionA);

HFileArchiver.archiveRegion(this.services.getConfiguration(), fs, regionB);

MetaEditor.deleteMergeQualifiers(server.getCatalogTracker(), mergedRegion);

return true;
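
To make the shape of that cleanup easier to reason about, here is a minimal, self-contained sketch of the decision as it reads from this snippet and from the regionFs.hasReferences(htd) check mentioned earlier in the thread. It does not call HBase at all: MergedRegionView, CleanupActions and RecordingActions are hypothetical stand-ins, and returning false when references remain is an assumption about the elided part of the method, not something shown above.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of the cleanup decision; nothing here touches a real cluster.
class MergeCleanupSketch {

    /** Hypothetical stand-in for the regionFs.hasReferences(htd) check. */
    interface MergedRegionView {
        boolean hasReferences();
    }

    /** Hypothetical stand-ins for archiveRegion and deleteMergeQualifiers. */
    interface CleanupActions {
        void archiveRegion(String regionName);
        void deleteMergeQualifiers(String mergedRegionName);
    }

    /**
     * Returns true if both parents were archived and the qualifiers removed.
     * Returns false (assumed behaviour) if the merged region still holds
     * reference files, in which case nothing is touched.
     */
    static boolean cleanMergeRegion(MergedRegionView merged, String mergedName,
                                    String regionA, String regionB,
                                    CleanupActions actions) {
        if (merged.hasReferences()) {
            return false; // the parents' files are still needed
        }
        actions.archiveRegion(regionA);
        actions.archiveRegion(regionB);
        actions.deleteMergeQualifiers(mergedName);
        return true;
    }

    /** Records the calls instead of performing them, for demonstration only. */
    static class RecordingActions implements CleanupActions {
        final List<String> calls = new ArrayList<>();

        public void archiveRegion(String regionName) {
            calls.add("archive:" + regionName);
        }

        public void deleteMergeQualifiers(String mergedRegionName) {
            calls.add("deleteQualifiers:" + mergedRegionName);
        }
    }

    public static void main(String[] args) {
        RecordingActions actions = new RecordingActions();
        // References still present: no cleanup is attempted.
        boolean cleaned = cleanMergeRegion(() -> true, "merged", "regionA", "regionB", actions);
        System.out.println("cleaned=" + cleaned + " calls=" + actions.calls);
        // No references left: parents archived, qualifiers deleted.
        cleaned = cleanMergeRegion(() -> false, "merged", "regionA", "regionB", actions);
        System.out.println("cleaned=" + cleaned + " calls=" + actions.calls);
    }
}
```

Seen this way, forcing the archive amounts to running the last three calls without the reference check in front of them, which is exactly the part in question.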


Do you think it is OK, if we face this issue, to forcibly archive and
clean the regions ourselves?
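
Whatever the answer, the polling loop described further down the thread should probably stop waiting forever on a merge that the master skipped. Below is a minimal sketch of a bounded poll. The IntSupplier is a hypothetical stand-in for a fresh region-count lookup, and the class name, attempt limit and sleep interval are illustrative assumptions rather than anything taken from HBase.

```java
import java.util.function.IntSupplier;

// Bounded polling sketch; the region count is supplied by the caller.
class BoundedMergePoll {

    /**
     * Polls regionCount until it drops below countBeforeMerge or until
     * maxAttempts checks have been made. Returns true if the merge became
     * visible, false if the caller should treat the merge as stuck.
     */
    static boolean waitForMerge(IntSupplier regionCount, int countBeforeMerge,
                                int maxAttempts, long sleepMillis) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            if (regionCount.getAsInt() < countBeforeMerge) {
                return true;
            }
            try {
                Thread.sleep(sleepMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                return false;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Simulated counts: the merge shows up on the third check.
        int[] observed = {10, 10, 9};
        int[] next = {0};
        IntSupplier merging = () -> observed[Math.min(next[0]++, observed.length - 1)];
        System.out.println("merge seen: " + waitForMerge(merging, 10, 5, 10L));

        // Simulated stuck merge: the count never changes, so the poll gives up.
        System.out.println("merge seen: " + waitForMerge(() -> 10, 10, 3, 10L));
    }
}
```

A false result is the point at which the application could check the master log for the "Skip merging regions" message instead of continuing to poll.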

Regards,
Shahab

On Fri, Nov 14, 2014 at 1:10 PM, Shahab Yunus <shahab.yunus@gmail.com>
wrote:

> Yesterday, I believe.
>
> Regards,
> Shahab
>
> On Fri, Nov 14, 2014 at 1:07 PM, Ted Yu <yuzhihong@gmail.com> wrote:
>
>> Shahab:
>> When was the last time compaction was run on this table ?
>>
>> Cheers
>>
>> On Fri, Nov 14, 2014 at 9:58 AM, Shahab Yunus <shahab.yunus@gmail.com>
>> wrote:
>>
>> > I see. Thanks.
>> >
>> > And if the region indeed has references, then can we somehow forcibly
>> > remove them? Is this even possible (if not advisable)? Basically, what I
>> > am trying to ask is: let us say we do hit this scenario and we know it
>> > is OK to go ahead and merge. What steps can we follow after detecting
>> > such unwanted references?
>> >
>> > Regards,
>> > Shahab
>> >
>> > On Fri, Nov 14, 2014 at 12:50 PM, Ted Yu <yuzhihong@gmail.com> wrote:
>> >
>> > > For automated detection of such a scenario, you can refer to the code in
>> > > CatalogJanitor#cleanMergeRegion():
>> > >
>> > >       regionFs = HRegionFileSystem.openRegionFromFileSystem(
>> > >           this.services.getConfiguration(), fs, tabledir, mergedRegion, true);
>> > >
>> > > ...
>> > >
>> > > Then regionFs.hasReferences(htd) would tell you whether the underlying
>> > > region has reference files.
>> > > Cheers
>> > >
>> > > On Fri, Nov 14, 2014 at 9:39 AM, Shahab Yunus <shahab.yunus@gmail.com> wrote:
>> > >
>> > > > No. Not that I can recall but I can check.
>> > > >
>> > > > From a resolution perspective, is there any way we can resolve this?
>> > > > More importantly, is there any way we can automate the resolution if
>> > > > we run into such issues in the future? 'Cleaning the qualifier', that is.
>> > > >
>> > > > Regards,
>> > > > Shahab
>> > > >
>> > > > On Fri, Nov 14, 2014 at 12:12 PM, Ted Yu <yuzhihong@gmail.com> wrote:
>> > > >
>> > > > > One possibility is that region 7373f75181c71eb5061a6673cee15931 was
>> > > > > involved in some HBase snapshot.
>> > > > >
>> > > > > Was the underlying table snapshotted in the recent past?
>> > > > >
>> > > > > Cheers
>> > > > >
>> > > > > On Fri, Nov 14, 2014 at 9:05 AM, Shahab Yunus <shahab.yunus@gmail.com> wrote:
>> > > > >
>> > > > > > Thanks again.
>> > > > > >
>> > > > > > But I have been polling for a while and it still doesn't merge. I mean
>> > > > > > this particular region example that I sent you: I have been trying to
>> > > > > > merge it since yesterday. I ran the polling-based code all night and
>> > > > > > had to kill it. Then in the morning, I tried manual merging through the
>> > > > > > hbase shell and it still doesn't merge. Note that the current polling
>> > > > > > logic does not try to call merge again. It just checks the region count.
>> > > > > >
>> > > > > > So how do we clean it then? Or actually make it merge? Also, is this
>> > > > > > something expected (a region keeping a reference)? How can we avoid it?
>> > > > > >
>> > > > > > Note that this is not limited to this table only. We are seeing this in
>> > > > > > other regions of other tables as well. Are we merging too fast?
>> > > > > >
>> > > > > >
>> > > > > >
>> > > > > > Regards,
>> > > > > > Shahab
>> > > > > >
>> > > > > > On Fri, Nov 14, 2014 at 11:58 AM, Ted Yu <yuzhihong@gmail.com> wrote:
>> > > > > >
>> > > > > > > Polling as you described is fine.
>> > > > > > >
>> > > > > > > catalogJanitor.cleanMergeQualifier() is called by
>> > > > > > > DispatchMergingRegionHandler.
>> > > > > > >
>> > > > > > > If the clean was successful, you would see the following:
>> > > > > > >
>> > > > > > >       LOG.debug("Deleting region " + regionA.getRegionNameAsString() + " and "
>> > > > > > >           + regionB.getRegionNameAsString()
>> > > > > > >           + " from fs because merged region no longer holds references");
>> > > > > > >
>> > > > > > > Assuming the following error was not in your master log:
>> > > > > > >
>> > > > > > >       LOG.error("Merged region " + region.getRegionNameAsString()
>> > > > > > >           + " has only one merge qualifier in META.");
>> > > > > > >
>> > > > > > > it would mean that 7373f75181c71eb5061a6673cee15931 still had
>> > > > > > > reference files.
>> > > > > > >
>> > > > > > > Cheers
>> > > > > > >
>> > > > > > > On Fri, Nov 14, 2014 at 8:35 AM, Shahab Yunus <shahab.yunus@gmail.com> wrote:
>> > > > > > >
>> > > > > > > > Hi Ted.
>> > > > > > > >
>> > > > > > > > The log bit is below at the end of the email. This is the command to
>> > > > > > > > merge that I gave just now through the hbase shell. forcible was false,
>> > > > > > > > but it behaves similarly if forcible is true too. This is from the
>> > > > > > > > master log. Indeed the region merging was skipped! What does this mean?
>> > > > > > > > Data seems to be intact for this table.
>> > > > > > > >
>> > > > > > > > Just to give you some background: this table was first merged by the
>> > > > > > > > automated Java application. What we are doing is merging the regions of
>> > > > > > > > tables programmatically. As the HBaseAdmin.mergeRegions call is async,
>> > > > > > > > we poll for the number of regions to drop after the merge call. The
>> > > > > > > > application hangs and continues polling forever, as the previous merge
>> > > > > > > > didn't happen.
>> > > > > > > >
>> > > > > > > > In this poll loop, we do get the number of regions by a fresh call to
>> > > > > > > > HBaseAdmin.getTableRegions(tableName).size().
>> > > > > > > >
>> > > > > > > > What are these merge qualifiers, and what are we doing wrong or what
>> > > > > > > > should we do?
>> > > > > > > >
>> > > > > > > > In the polling loop, can we somehow retry the merge? But how can we
>> > > > > > > > know that we need to call merge again, as it works for some regions?
>> > > > > > > > Is the table meta corrupted for some reason by the above logic?
>> > > > > > > >
>> > > > > > > > Thanks a lot.
>> > > > > > > >
>> > > > > > > >
>> > > > > > > >
>> > > > > > > >
>> > > > > > > > ------------------------------------------------------------------------
>> > > > > > > >
>> > > > > > > > 2014-11-14 11:25:02,643 INFO org.apache.zookeeper.ZooKeeper: Session: 0x348c7017707236b closed
>> > > > > > > > 2014-11-14 11:25:02,643 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down
>> > > > > > > > 2014-11-14 11:25:02,645 INFO org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=ip-1010019.ec2.internal:2181,ip-1010017.ec2.internal:2181,ip-1010018.ec2.internal:2181 sessionTimeout=60000 watcher=catalogtracker-on-hconnection-0x47d865f2, quorum=ip-1010019.ec2.internal:2181,ip-1010017.ec2.internal:2181,ip-1010018.ec2.internal:2181, baseZNode=/hbase
>> > > > > > > > 2014-11-14 11:25:02,645 INFO org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Process identifier=catalogtracker-on-hconnection-0x47d865f2 connecting to ZooKeeper ensemble=ip-1010019.ec2.internal:2181,ip-1010017.ec2.internal:2181,ip-1010018.ec2.internal:2181
>> > > > > > > > 2014-11-14 11:25:02,645 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server ip-1010018.ec2.internal/1010019:2181. Will not attempt to authenticate using SASL (unknown error)
>> > > > > > > > 2014-11-14 11:25:02,646 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to ip-1010018.ec2.internal/1010019:2181, initiating session
>> > > > > > > > 2014-11-14 11:25:02,648 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete on server ip-1010018.ec2.internal/1010019:2181, sessionid = 0x348c7017707236c, negotiated timeout = 60000
>> > > > > > > > 2014-11-14 11:25:02,703 INFO org.apache.zookeeper.ZooKeeper: Session: 0x348c7017707236c closed
>> > > > > > > > 2014-11-14 11:25:02,703 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down
>> > > > > > > > 2014-11-14 11:25:30,713 INFO org.apache.hadoop.hbase.master.handler.DispatchMergingRegionHandler: Skip merging regions TABLE_NAME,,1415915112497.7373f75181c71eb5061a6673cee15931., TABLE_NAME,\x02\xFA\xF0\x80\x00\x00\x01I\xAA\xD5\x87\xA8\x19\x99\x99\x99\x99\x99\x99\x90,1415910559217.43f4d3685d113d3ae18eea9f189de096., because region 7373f75181c71eb5061a6673cee15931 has merge qualifier
>> > > > > > > > 2014-11-14 11:25:41,383 INFO org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=ip-1010019.ec2.internal:2181,ip-1010017.ec2.internal:2181,ip-1010018.ec2.internal:2181 sessionTimeout=60000 watcher=catalogtracker-on-hconnection-0x47d865f2, quorum=ip-1010019.ec2.internal:2181,ip-1010017.ec2.internal:2181,ip-1010018.ec2.internal:2181, baseZNode=/hbase
>> > > > > > > > 2014-11-14 11:25:41,384 INFO org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Process identifier=catalogtracker-on-hconnection-0x47d865f2 connecting to ZooKeeper ensemble=ip-1010019.ec2.internal:2181,ip-1010017.ec2.internal:2181,ip-1010018.ec2.internal:2181
>> > > > > > > > 2014-11-14 11:25:41,384 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server ip-1010018.ec2.internal/1010019:2181. Will not attempt to authenticate using SASL (unknown error)
>> > > > > > > > 2014-11-14 11:25:41,386 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to ip-1010018.ec2.internal/1010019:2181, initiating session
>> > > > > > > > 2014-11-14 11:25:41,389 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete on server ip-1010018.ec2.internal/1010019:2181, sessionid = 0x348c7017707236e, negotiated timeout = 60000
>> > > > > > > > 2014-11-14 11:25:41,397 INFO org.apache.zookeeper.ZooKeeper: Session: 0x348c7017707236e closed
>> > > > > > > > 2014-11-14 11:25:41,398 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down
>> > > > > > > >
>> > > > > > > > ------------------------------------------------------------------------------------------------------------------------------------
>> > > > > > > >
>> > > > > > > > Regards,
>> > > > > > > > Shahab
>> > > > > > > >
>> > > > > > > > On Fri, Nov 14, 2014 at 10:56 AM, Ted Yu <yuzhihong@gmail.com> wrote:
>> > > > > > > >
>> > > > > > > > > Looking at DispatchMergingRegionHandler, it does some checks before
>> > > > > > > > > initiating the merge, e.g.:
>> > > > > > > > >
>> > > > > > > > >       LOG.info("Skip merging regions " + region_a.getRegionNameAsString()
>> > > > > > > > >           + ", " + region_b.getRegionNameAsString() + ", because region "
>> > > > > > > > >           + (regionAHasMergeQualifier ? region_a.getEncodedName() : region_b
>> > > > > > > > >               .getEncodedName()) + " has merge qualifier");
>> > > > > > > > >
>> > > > > > > > > Can you take a look at the master log around the time the merge request
>> > > > > > > > > was issued to see if you can get some clue?
>> > > > > > > > >
>> > > > > > > > > Cheers
>> > > > > > > > >
>> > > > > > > > > On Fri, Nov 14, 2014 at 6:41 AM, Shahab Yunus <shahab.yunus@gmail.com> wrote:
>> > > > > > > > >
>> > > > > > > > > > The documentation of the online merge tool (merge_region) states that
>> > > > > > > > > > if we forcibly merge regions (by setting the 3rd attribute to true),
>> > > > > > > > > > it can create overlapping regions. If this happens, will it render the
>> > > > > > > > > > region or table unusable, or is it just a performance hit? I mean, how
>> > > > > > > > > > big of a deal is it?
>> > > > > > > > > >
>> > > > > > > > > > Actually, we are merging regions using the programmatic API and
>> > > > > > > > > > setting this flag ('forcible') to false. But for some tables (we
>> > > > > > > > > > haven't figured out a pattern yet; data is still accessible), the
>> > > > > > > > > > merge of regions does not happen at all. Afterwards we tried with this
>> > > > > > > > > > flag = true, and it still doesn't merge them.
>> > > > > > > > > >
>> > > > > > > > > > CDH 5.1.0
>> > > > > > > > > > (Hbase is 0.98.1-cdh5.1.0)
>> > > > > > > > > >
>> > > > > > > > > > Regards,
>> > > > > > > > > > Shahab
>> > > > > > > > > >
