hbase-dev mailing list archives

From Jonathan Hsieh <...@cloudera.com>
Subject Re: Question on HBaseFsck#checkRegionConsistency()
Date Fri, 27 Mar 2015 13:45:28 GMT
On Fri, Mar 27, 2015 at 12:14 AM, Stephen Jiang <syuanjiangdev@gmail.com>
wrote:

> I am sure the following logic is a bug, but I'd like to know the rationale
> behind it so that I can fix it correctly.
>
> In HBaseFsck#checkRegionConsistency(), we skip some regions that are
> recently changed.  This is undesirable (at least in the situation I am
> testing).
>
> I can easily repro a problem by modifying an existing unit test,
> TestHBaseFsck#testOverlapAndOrphan():
> - All unit tests pass in 0 as the recently-changed time lag.  The default
> is 60 seconds, so I changed the test to use the default value.
> - Then I ran the UT; it generates an orphaned HDFS region by removing
> regioninfo in the dir.
> - The HBCK repair code creates a new region to repair the problem.
> - However, the new region was skipped in
> HBaseFsck#checkRegionConsistency() and hence was not assigned or added to
> META.
> - At the end, the UT failed because the repair did not fix the error.
>
> {code}
> private void checkRegionConsistency(final String key, final HbckInfo hbi)
>     ...
>     boolean recentlyModified =
>         inHdfs && hbi.getModTime() + timelag > System.currentTimeMillis();
>     ...
>     } else if (recentlyModified) {
>       LOG.warn("Region " + descriptiveName
>           + " was recently modified -- skipping");
>       return;
>     }
>     ...
> }
> {code}
>
> If I change the timelag from 0 to 60 seconds (the default value) and run
> the UTs in TestHBaseFsck, a lot of them fail.  I think this is a valid
> customer scenario - people usually do not change default values unless they
> know what they are doing.
> (Surprisingly, I could not find any complaints via Google search.  Maybe
> HBase is so reliable that we never had this particular corruption in
> production :-)
> - note: the workaround is to run hbck/repair twice; the second run would
> fix this issue - maybe our customers just always run hbck multiple times
> before reporting issues).
>
> I have not gone back through the history to find out why this logic was
> implemented in the first place.  Does anyone on this list know the
> reasoning behind it (should I simply remove it, or do I need to add some
> information to hbi to indicate that we should not skip a target region)?
>
> Thanks
> Stephen
>


Much of this code was added back in the 0.90.x days 3-4 years ago, when the
ZooKeeper-based assignment manager and master were relatively new.  At the
time, we didn't have things like table locks to keep internal operations
like splits or compactions from happening while hbck was running.  I
believe skipping over recently changed regions was mainly meant to avoid
repair operations on recent splits, which, depending on how far the process
had gotten, could still have HDFS in flux.
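
To make the check concrete, here is a minimal standalone sketch of the
time-lag test from the quoted snippet.  It is not the HBaseFsck code itself:
the class name, the method signature, and the explicit nowMs parameter are
assumptions introduced so that it runs on its own; only the comparison
mirrors the quoted line.

```java
// Standalone illustration only; not part of HBase.
final class RecentlyModifiedCheck {
    // 60 seconds is the default time lag mentioned in the thread.
    static final long DEFAULT_TIMELAG_MS = 60_000L;

    // Same comparison as the quoted snippet, with the clock passed in
    // (an assumption made here so the result is deterministic).
    static boolean recentlyModified(boolean inHdfs, long modTimeMs,
                                    long timelagMs, long nowMs) {
        return inHdfs && modTimeMs + timelagMs > nowMs;
    }

    public static void main(String[] args) {
        long now = 1_000_000L;
        long justRepaired = now - 5_000L;    // region dir touched 5s ago
        long longSettled = now - 120_000L;   // region dir touched 2min ago

        // With the default lag, a region created by a repair moments ago
        // is treated as recently modified and would be skipped.
        System.out.println(
            recentlyModified(true, justRepaired, DEFAULT_TIMELAG_MS, now));
        // With a lag of 0 (what the unit tests pass in) it is not skipped.
        System.out.println(recentlyModified(true, justRepaired, 0L, now));
        // A region older than the lag is checked either way.
        System.out.println(
            recentlyModified(true, longSettled, DEFAULT_TIMELAG_MS, now));
    }
}
```

Under these assumptions a region that a repair pass has just written falls
inside the lag window, which is consistent with a second hbck run (after the
window has passed) picking it up.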

Orphaned regions were an error condition encountered later than the
original basic set of problems, and I went with the "each pass makes it
better" approach.

I'm fairly certain that in operational instructions we tell folks to run
hbck multiple times and, after the "final repair run", take a best-of-3
result, because of transience in HBase following recent repairs or splits.

I believe these days we've improved our code base so that hbck is needed as
a repair tool less often.  When it is used, it is often for interrupted DDL
operations, or for cases where the file system had problems under HBase.

I'm not sure what the purpose of adding more to hbi is.  Can you elaborate?

Jon.

-- 
// Jonathan Hsieh (shay)
// HBase Tech Lead, Software Engineer, Cloudera
// jon@cloudera.com // @jmhsieh
