hbase-user mailing list archives

From Ted Yu <yuzhih...@gmail.com>
Subject Re: Forcibly merging regions
Date Fri, 14 Nov 2014 18:07:20 GMT
Shahab:
When was the last time a compaction was run on this table?

Cheers

On Fri, Nov 14, 2014 at 9:58 AM, Shahab Yunus <shahab.yunus@gmail.com>
wrote:

> I see. Thanks.
>
> And if the region does have references, can we somehow forcibly remove
> them? Is this even possible (even if not advisable)? What I am asking is:
> suppose we hit this scenario and we know it is OK to go ahead and merge.
> What steps can we follow once we detect such unwanted references?
>
> Regards,
> Shahab
>
> On Fri, Nov 14, 2014 at 12:50 PM, Ted Yu <yuzhihong@gmail.com> wrote:
>
> > For automated detection of such a scenario, you can refer to the code in
> > CatalogJanitor#cleanMergeRegion():
> >
> >       regionFs = HRegionFileSystem.openRegionFromFileSystem(
> >           this.services.getConfiguration(), fs, tabledir, mergedRegion, true);
> >
> > ...
> >
> > Then regionFs.hasReferences(htd) would tell you whether the underlying
> > region has reference files.
> > Cheers
> >
> > On Fri, Nov 14, 2014 at 9:39 AM, Shahab Yunus <shahab.yunus@gmail.com>
> > wrote:
> >
> > > No. Not that I can recall but I can check.
> > >
> > > From a resolution perspective, is there any way we can resolve this? More
> > > importantly, is there any way we can automate the resolution if we run
> > > into such issues in future? 'Cleaning the qualifier', that is.
> > >
> > > Regards,
> > > Shahab
> > >
> > > On Fri, Nov 14, 2014 at 12:12 PM, Ted Yu <yuzhihong@gmail.com> wrote:
> > >
> > > > One possibility is that region 7373f75181c71eb5061a6673cee15931 was
> > > > involved in some hbase snapshot.
> > > >
> > > > Was the underlying table snapshotted in the recent past?
> > > >
> > > > Cheers
> > > >
> > > > On Fri, Nov 14, 2014 at 9:05 AM, Shahab Yunus <shahab.yunus@gmail.com>
> > > > wrote:
> > > >
> > > > > Thanks again.
> > > > >
> > > > > But I have been polling for a while and it still doesn't merge. I mean
> > > > > this particular region example that I sent you: I have been trying to
> > > > > merge it since yesterday. I ran the polling-based code all night and
> > > > > had to kill it. Then in the morning, I tried manual merging through the
> > > > > hbase shell and it still doesn't merge. Note that the current polling
> > > > > logic does not try to call merge again. It just checks the region size.
> > > > >
> > > > > So how do we clean it then? Or actually make it merge? Also, is this
> > > > > something expected (a region keeping a reference)? How can we avoid it?
> > > > >
> > > > > Note that this is not limited to this table only. We are seeing this in
> > > > > other regions of other tables as well. Are we merging too fast?
> > > > >
> > > > >
> > > > >
> > > > > Regards,
> > > > > Shahab
> > > > >
> > > > > On Fri, Nov 14, 2014 at 11:58 AM, Ted Yu <yuzhihong@gmail.com> wrote:
> > > > >
> > > > > > Polling as you described is fine.
> > > > > >
> > > > > > catalogJanitor.cleanMergeQualifier() is called by
> > > > > > DispatchMergingRegionHandler.
> > > > > >
> > > > > > If clean was successful, you would see the following:
> > > > > >
> > > > > >       LOG.debug("Deleting region " + regionA.getRegionNameAsString() + " and "
> > > > > >           + regionB.getRegionNameAsString()
> > > > > >           + " from fs because merged region no longer holds references");
> > > > > >
> > > > > > Assuming the following message did not appear in your master log:
> > > > > >
> > > > > >       LOG.error("Merged region " + region.getRegionNameAsString()
> > > > > >           + " has only one merge qualifier in META.");
> > > > > >
> > > > > > then 7373f75181c71eb5061a6673cee15931 most likely still had reference
> > > > > > files.
> > > > > >
> > > > > > Cheers
> > > > > >
> > > > > > On Fri, Nov 14, 2014 at 8:35 AM, Shahab Yunus <shahab.yunus@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > Hi Ted.
> > > > > > >
> > > > > > > The log excerpt is below at the end of the email. It is from the master
> > > > > > > log, for the merge command that I gave just now through the hbase
> > > > > > > shell. forcible was false, but it behaves similarly if forcible is true
> > > > > > > too. Indeed the region merging was skipped! What does this mean? Data
> > > > > > > seems to be intact for this table.
> > > > > > >
> > > > > > > Just to give you some background: this table was first merged by the
> > > > > > > automated Java application. What we are doing is merging regions
> > > > > > > programmatically. As the HBaseAdmin.mergeRegions call is async, we poll
> > > > > > > for the number of regions to drop after this merge call. The
> > > > > > > application hangs and continues polling forever, as the previous merge
> > > > > > > didn't happen.
> > > > > > >
> > > > > > > In this poll loop, we get the number of regions by a fresh call to
> > > > > > > HBaseAdmin.getTableRegions(tableName).size().
> > > > > > >
> > > > > > > What are these merge qualifiers, and what are we doing wrong or what
> > > > > > > should we do?
> > > > > > >
> > > > > > > In the polling loop, can we somehow retry the merge? But how can we
> > > > > > > know that we need to call merge again, given that it works for some
> > > > > > > regions? Is the table meta corrupted for some reason by the above
> > > > > > > logic?
> > > > > > >
> > > > > > > Thanks a lot.
> > > > > > >
> > > > > > > ------------------------------------------------------------------------
> > > > > > >
> > > > > > > 2014-11-14 11:25:02,643 INFO org.apache.zookeeper.ZooKeeper: Session: 0x348c7017707236b closed
> > > > > > > 2014-11-14 11:25:02,643 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down
> > > > > > > 2014-11-14 11:25:02,645 INFO org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=ip-1010019.ec2.internal:2181,ip-1010017.ec2.internal:2181,ip-1010018.ec2.internal:2181 sessionTimeout=60000 watcher=catalogtracker-on-hconnection-0x47d865f2, quorum=ip-1010019.ec2.internal:2181,ip-1010017.ec2.internal:2181,ip-1010018.ec2.internal:2181, baseZNode=/hbase
> > > > > > > 2014-11-14 11:25:02,645 INFO org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Process identifier=catalogtracker-on-hconnection-0x47d865f2 connecting to ZooKeeper ensemble=ip-1010019.ec2.internal:2181,ip-1010017.ec2.internal:2181,ip-1010018.ec2.internal:2181
> > > > > > > 2014-11-14 11:25:02,645 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server ip-1010018.ec2.internal/1010019:2181. Will not attempt to authenticate using SASL (unknown error)
> > > > > > > 2014-11-14 11:25:02,646 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to ip-1010018.ec2.internal/1010019:2181, initiating session
> > > > > > > 2014-11-14 11:25:02,648 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete on server ip-1010018.ec2.internal/1010019:2181, sessionid = 0x348c7017707236c, negotiated timeout = 60000
> > > > > > > 2014-11-14 11:25:02,703 INFO org.apache.zookeeper.ZooKeeper: Session: 0x348c7017707236c closed
> > > > > > > 2014-11-14 11:25:02,703 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down
> > > > > > > 2014-11-14 11:25:30,713 INFO org.apache.hadoop.hbase.master.handler.DispatchMergingRegionHandler: Skip merging regions TABLE_NAME,,1415915112497.7373f75181c71eb5061a6673cee15931., TABLE_NAME,\x02\xFA\xF0\x80\x00\x00\x01I\xAA\xD5\x87\xA8\x19\x99\x99\x99\x99\x99\x99\x90,1415910559217.43f4d3685d113d3ae18eea9f189de096., because region 7373f75181c71eb5061a6673cee15931 has merge qualifier
> > > > > > > 2014-11-14 11:25:41,383 INFO org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=ip-1010019.ec2.internal:2181,ip-1010017.ec2.internal:2181,ip-1010018.ec2.internal:2181 sessionTimeout=60000 watcher=catalogtracker-on-hconnection-0x47d865f2, quorum=ip-1010019.ec2.internal:2181,ip-1010017.ec2.internal:2181,ip-1010018.ec2.internal:2181, baseZNode=/hbase
> > > > > > > 2014-11-14 11:25:41,384 INFO org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Process identifier=catalogtracker-on-hconnection-0x47d865f2 connecting to ZooKeeper ensemble=ip-1010019.ec2.internal:2181,ip-1010017.ec2.internal:2181,ip-1010018.ec2.internal:2181
> > > > > > > 2014-11-14 11:25:41,384 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server ip-1010018.ec2.internal/1010019:2181. Will not attempt to authenticate using SASL (unknown error)
> > > > > > > 2014-11-14 11:25:41,386 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to ip-1010018.ec2.internal/1010019:2181, initiating session
> > > > > > > 2014-11-14 11:25:41,389 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete on server ip-1010018.ec2.internal/1010019:2181, sessionid = 0x348c7017707236e, negotiated timeout = 60000
> > > > > > > 2014-11-14 11:25:41,397 INFO org.apache.zookeeper.ZooKeeper: Session: 0x348c7017707236e closed
> > > > > > > 2014-11-14 11:25:41,398 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down
> > > > > > >
> > > > > > > ------------------------------------------------------------------------
> > > > > > >
> > > > > > > Regards,
> > > > > > > Shahab
> > > > > > >
> > > > > > > On Fri, Nov 14, 2014 at 10:56 AM, Ted Yu <yuzhihong@gmail.com> wrote:
> > > > > > >
> > > > > > > > Looking at DispatchMergingRegionHandler, it does some checks before
> > > > > > > > initiating the merge, e.g.:
> > > > > > > >
> > > > > > > >       LOG.info("Skip merging regions " + region_a.getRegionNameAsString()
> > > > > > > >           + ", " + region_b.getRegionNameAsString() + ", because region "
> > > > > > > >           + (regionAHasMergeQualifier ? region_a.getEncodedName() : region_b
> > > > > > > >               .getEncodedName()) + " has merge qualifier");
> > > > > > > >
> > > > > > > > Can you take a look at the master log around the time the merge request
> > > > > > > > was issued to see if you can get some clue?
> > > > > > > >
> > > > > > > > Cheers
> > > > > > > >
> > > > > > > > On Fri, Nov 14, 2014 at 6:41 AM, Shahab Yunus <shahab.yunus@gmail.com>
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > The documentation of the online merge tool (merge_region) states that
> > > > > > > > > if we forcibly merge regions (by setting the 3rd attribute to true),
> > > > > > > > > it can create overlapping regions. If this happens, will it render
> > > > > > > > > the region or table unusable, or is it just a performance hit? I
> > > > > > > > > mean, how big a deal is it?
> > > > > > > > >
> > > > > > > > > Actually, we are merging regions using the programmatic API and
> > > > > > > > > setting this flag ('forcible') to false. But for some tables (we
> > > > > > > > > haven't figured out a pattern yet; data is still accessible), the
> > > > > > > > > merge of regions does not happen at all. Afterwards we tried with
> > > > > > > > > this flag = true, and it still doesn't merge them.
> > > > > > > > >
> > > > > > > > > CDH 5.1.0
> > > > > > > > > (Hbase is 0.98.1-cdh5.1.0)
> > > > > > > > >
> > > > > > > > > Regards,
> > > > > > > > > Shahab
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
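
The polling loop described in the thread hung because it waited indefinitely for a region count that never dropped. A minimal sketch of a bounded wait is shown below. MergePoller, awaitMerge and Outcome are hypothetical names, not part of HBase, and the region count is passed in as a plain IntSupplier so the sketch runs without a cluster. In a real client the supplier would presumably wrap the region-list lookup mentioned in the thread.

```java
import java.util.function.IntSupplier;

/** Hypothetical helper (not an HBase class): bounded wait for a merge to lower the region count. */
final class MergePoller {
    enum Outcome { MERGED, TIMED_OUT, INTERRUPTED }

    /**
     * Polls regionCount until it drops below countBeforeMerge, giving up after
     * maxAttempts polls so the caller never waits forever on a skipped merge.
     */
    static Outcome awaitMerge(IntSupplier regionCount, int countBeforeMerge,
                              int maxAttempts, long sleepMillis) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (regionCount.getAsInt() < countBeforeMerge) {
                return Outcome.MERGED;
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(sleepMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve interrupt status
                    return Outcome.INTERRUPTED;
                }
            }
        }
        return Outcome.TIMED_OUT;
    }

    public static void main(String[] args) {
        // Stand-in for a real region-count lookup (assumption): the count
        // drops from 4 to 3 on the third poll.
        int[] polls = {0};
        IntSupplier fakeCount = () -> ++polls[0] >= 3 ? 3 : 4;
        System.out.println(awaitMerge(fakeCount, 4, 10, 10L));

        // A merge that never happens ends in TIMED_OUT instead of hanging.
        System.out.println(awaitMerge(() -> 4, 4, 3, 10L));
    }
}
```

On TIMED_OUT the caller can check the master log for a skip message, or check for reference files as suggested above, before deciding whether to retry the merge.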

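
For automated detection from the log side, the "Skip merging regions ... has merge qualifier" message quoted in the thread can be matched with a regular expression. The following is only a sketch: SkippedMergeDetector and blockedRegion are hypothetical names, and the pattern assumes the message format shown in the 0.98.1-cdh5.1.0 master log above, which may differ in other versions.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper (not an HBase class): finds skipped merges in master log lines. */
final class SkippedMergeDetector {
    // Assumed format, taken from the master log excerpt in this thread.
    private static final Pattern SKIPPED = Pattern.compile(
        "Skip merging regions .*because region ([0-9a-f]+) has merge qualifier");

    /** Returns the encoded name of the region that blocked the merge, if the line reports one. */
    static Optional<String> blockedRegion(String logLine) {
        Matcher m = SKIPPED.matcher(logLine);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        // Log line from the thread, joined back onto a single line.
        String line = "2014-11-14 11:25:30,713 INFO "
            + "org.apache.hadoop.hbase.master.handler.DispatchMergingRegionHandler: "
            + "Skip merging regions "
            + "TABLE_NAME,,1415915112497.7373f75181c71eb5061a6673cee15931., "
            + "TABLE_NAME,\\x02\\xFA\\xF0\\x80\\x00\\x00\\x01I\\xAA\\xD5\\x87\\xA8"
            + "\\x19\\x99\\x99\\x99\\x99\\x99\\x99\\x90,"
            + "1415910559217.43f4d3685d113d3ae18eea9f189de096., "
            + "because region 7373f75181c71eb5061a6673cee15931 has merge qualifier";
        System.out.println(blockedRegion(line).orElse("no skipped merge reported"));
    }
}
```

A polling application could scan recent master log lines with this check and stop waiting as soon as a skip is reported for one of the regions it asked to merge.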