hbase-issues mailing list archives

From "Ben Lau (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-16052) Improve HBaseFsck Scalability
Date Fri, 01 Jul 2016 23:54:11 GMT

    [ https://issues.apache.org/jira/browse/HBASE-16052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15359848#comment-15359848 ]

Ben Lau commented on HBASE-16052:

Okay, let me know if you guys reach a consensus on which other versions should be patched.
Re: 0.98 -- yes, we tested the original version of this patch on 0.98.  However, if we want to
patch 0.98, it probably makes more sense to apply a backport of the trunk patch than our original
0.98 patch.  The trunk patch has some improvements that make the code cleaner on trunk but
weren't that important on 0.98 originally (e.g., there are a lot more PathFilter classes in trunk,
so adding an abstract class to avoid too much code duplication became a no-brainer).  To keep
the code from diverging too much, it would make sense to backport from trunk if we decide to
patch 0.98.  I'll add a release note later based on the Jira description.  Feel free to expand/shorten
it.

> Improve HBaseFsck Scalability
> -----------------------------
>                 Key: HBASE-16052
>                 URL: https://issues.apache.org/jira/browse/HBASE-16052
>             Project: HBase
>          Issue Type: Improvement
>          Components: hbck
>            Reporter: Ben Lau
>            Assignee: Ben Lau
>             Fix For: 2.0.0, 1.4.0
>         Attachments: HBASE-16052-master.patch, HBASE-16052-v3-branch-1.patch, HBASE-16052-v3-master.patch
> There are some problems with HBaseFsck that make it unnecessarily slow, especially for
large tables or clusters with many regions.
> This patch tries to fix the biggest bottlenecks and also includes a couple of bug fixes
for race conditions caused by gathering and holding state about a live cluster that is no
longer true by the time Fsck processing uses that state.  These race conditions cause Fsck
to crash and become unusable on large clusters with lots of region splits/merges.
> Here are some scalability/performance problems in HBaseFsck and the changes the patch
makes to address them:
> - Unnecessary I/O and RPCs caused by fetching an array of FileStatuses and then discarding
everything but the Paths, then passing the Paths to a PathFilter, and then having the filter
look up the (previously discarded) FileStatuses of the paths again.  This is actually worse
than double I/O because the first lookup obtains a batch of FileStatuses while all the other
lookups are individual RPCs performed sequentially.
> -- Avoid this by adding a FileStatusFilter so that filtering can happen directly on FileStatuses,
as sketched below.
> -- This performance bug affects more than just Fsck; it also affects, to some extent, snapshots,
HFile archival, etc.  I didn't have time to look too deeply into the other affected code and didn't
want to increase the scope of this ticket, so I focused mostly on Fsck and made only a few improvements
to other codepaths.  The changes in this patch, though, should make it fairly easy to fix other
code paths in later jiras if we feel there are other features strongly impacted by this issue.
> - OfflineReferenceFileRepair is the most expensive part of Fsck (often 50% of Fsck runtime),
and its running time scales with the number of store files, yet the function is completely
single-threaded.
> -- Make offlineReferenceFileRepair multithreaded (see the sketch below).
> - loadHdfsRegionDirs() uses table-level concurrency, which is a big bottleneck if you
have one large cluster with one very large table that holds nearly all the regions
> -- Change loadHdfsRegionDirs() to use region-level parallelism instead of table-level
parallelism for its operations (sketched below).
> The changes benefit all clusters but are especially noticeable on large clusters with
a few very large tables.  On our version of 0.98 with the original patch, a moderately
sized production cluster with 2 (user) tables and ~160k regions saw HBaseFsck go from
taking 18 minutes to 5 minutes.
