hbase-dev mailing list archives

From Stephen Jiang <syuanjiang...@gmail.com>
Subject Question on HBaseFsck#checkRegionConsistency()
Date Fri, 27 Mar 2015 07:14:03 GMT
I am sure the following logic is a bug, but I'd like to know the rationale
behind it so that I can fix it correctly.

In HBaseFsck#checkRegionConsistency(), we skip some regions that are
recently changed.  This is undesirable (at least in the situation I am

I can easily repro the problem by modifying an existing unit test,
TestHBaseFsck#testOverlapAndOrphan():
- All unit tests pass in 0 as the time lag for recently changed regions.  The
default is 60 seconds, so I changed the test to use the default value.
- Then I ran the UT.  It generates an orphaned HDFS region by removing
regioninfo in the dir.
- The HBCK repair code creates a new region to repair the problem.
- However, the new region is skipped in HBaseFsck#checkRegionConsistency(),
and hence it is not assigned and not added to META.
- At the end, the UT fails because the repair did not fix the error.

private void checkRegionConsistency(final String key, final HbckInfo hbi)
    boolean recentlyModified = inHdfs && hbi.getModTime() + timelag >
    } else *if (recentlyModified) {*
*      LOG.warn("Region " + descriptiveName + " was recently modified --
*      return;*
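
To make the effect of the time lag concrete, here is a minimal standalone sketch of the check. It is not the HBaseFsck code: the class and method names are hypothetical, and since the right-hand side of the comparison is cut off in the excerpt above, the sketch assumes it compares against the current time in milliseconds.

```java
// Hypothetical standalone sketch; not part of HBaseFsck.
final class RecentlyModifiedSketch {

    // Assumption: the truncated comparison in the excerpt is against "now" in ms.
    // A region present in HDFS counts as recently modified while
    // modTime + timelag is still later than the current time.
    static boolean recentlyModified(boolean inHdfs, long modTimeMs,
                                    long timelagMs, long nowMs) {
        return inHdfs && modTimeMs + timelagMs > nowMs;
    }

    public static void main(String[] args) {
        long now = 1_000_000L;
        long repairedAt = now - 5_000L; // region dir written by repair 5 s earlier

        // timelag = 0 (what the unit tests pass in): the region is checked.
        System.out.println(recentlyModified(true, repairedAt, 0L, now));      // false

        // timelag = 60 s (the default): the freshly repaired region is skipped.
        System.out.println(recentlyModified(true, repairedAt, 60_000L, now)); // true
    }
}
```

With a time lag of 0 the condition can never hold for a region modified in the past, which would explain why the existing tests do not hit the skip path.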

If I change the timelag from 0 to 60 seconds (the default value) and run the
UTs in TestHBaseFsck, a lot of them fail.  I think this is a valid customer
scenario - people usually do not change a default value unless they know what
they are doing.
(Surprisingly, I could not find any complaints in a Google search.  Maybe
HBase is so reliable that we never had this kind of corruption in
production :-)
- note: the workaround is to run hbck/repair twice; the second run would
fix this issue.  Maybe our customers just always run hbck multiple times
before reporting issues.)

I have not gone back through the history to find out why this logic was
implemented in the first place.  Does anyone on this list know the reasoning
behind it?  Should I simply remove it, or do I need to add some information
to hbi to indicate that we should not skip a target region?
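
A rough sketch of the second option, to show what I have in mind. All names here (RegionEntry, createdByRepair, shouldSkip) are hypothetical and not existing HbckInfo fields or HBaseFsck methods; it also assumes the recently-modified test compares against the current time in milliseconds.

```java
// Hypothetical sketch of "carry a flag so repaired regions are not skipped".
final class SkipOverrideSketch {

    // Stand-in for the per-region info object (hbi); not the real HbckInfo.
    static final class RegionEntry {
        final long modTimeMs;
        boolean createdByRepair; // hypothetical flag set by the repair code

        RegionEntry(long modTimeMs) {
            this.modTimeMs = modTimeMs;
        }
    }

    // Skip only regions that changed recently AND were not produced by the
    // repair itself; repaired regions still go through the consistency check.
    static boolean shouldSkip(RegionEntry e, boolean inHdfs,
                              long timelagMs, long nowMs) {
        boolean recentlyModified = inHdfs && e.modTimeMs + timelagMs > nowMs;
        return recentlyModified && !e.createdByRepair;
    }
}
```

The idea is that regions changed by someone else within the time lag are still left alone, while a region that hbck itself just created is checked, assigned and added to META in the same run.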

