hbase-issues mailing list archives

From "jiraposter@reviews.apache.org (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-5128) [uber hbck] Online automated repair of table integrity and region consistency problems
Date Sat, 24 Mar 2012 00:23:28 GMT

    [ https://issues.apache.org/jira/browse/HBASE-5128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13237337#comment-13237337 ]

jiraposter@reviews.apache.org commented on HBASE-5128:
------------------------------------------------------



bq.  On 2012-03-22 18:10:45, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 1771
bq.  > <https://reviews.apache.org/r/4280/diff/2/?file=94413#file94413line1771>
bq.  >
bq.  >     I suggest renaming holeStart as startRow and renaming holeStop as stopRow.
bq.  >     Then you don't need the comment on 1700.

Renamed to holeStartKey and holeStopKey to make it clear.  Added a log message to inform the
user about the action.


bq.  On 2012-03-22 18:10:45, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 1812
bq.  > <https://reviews.apache.org/r/4280/diff/2/?file=94413#file94413line1812>
bq.  >
bq.  >     Should include maxMerge in the log.

great suggestion.  done.


bq.  On 2012-03-22 18:10:45, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 1849
bq.  > <https://reviews.apache.org/r/4280/diff/2/?file=94413#file94413line1849>
bq.  >
bq.  >     I wonder whether we should bail if there have been two IOE's, one on 1759 and one here.

This is soft state (doesn't modify the file system), so I'm less adamant about hard stopping
when these conditions are reached.


bq.  On 2012-03-22 18:10:45, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 1863
bq.  > <https://reviews.apache.org/r/4280/diff/2/?file=94413#file94413line1863>
bq.  >
bq.  >     'Creating' -> 'Created'

done


bq.  On 2012-03-22 18:10:45, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 1864
bq.  > <https://reviews.apache.org/r/4280/diff/2/?file=94413#file94413line1864>
bq.  >
bq.  >     Are newRegion and region representing the same entity ?

Good catch, changed to:

       LOG.info("Created new empty container region: " +
            newRegion + " to contain regions: " + Joiner.on(",").join(overlap));


bq.  On 2012-03-22 18:10:45, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 1872
bq.  > <https://reviews.apache.org/r/4280/diff/2/?file=94413#file94413line1872>
bq.  >
bq.  >     If mergeRegionDirs() returns 0 (or less), should we note (partial) failure in merging ?

hm.. it is possible to merge multiple empty overlapping regions without moving any HFiles,
and that would still count as a fix.  I've changed the place where the return value was added
so that it just increments HBaseFsck's fixes count by 1.


bq.  On 2012-03-22 18:10:45, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 2159
bq.  > <https://reviews.apache.org/r/4280/diff/2/?file=94413#file94413line2159>
bq.  >
bq.  >     Should say 'unable to get regions from master' or something similar

"Fatal error: unable to get root region location. Exiting..."


bq.  On 2012-03-22 18:10:45, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 2298
bq.  > <https://reviews.apache.org/r/4280/diff/2/?file=94413#file94413line2298>
bq.  >
bq.  >     Please remove this.

done


bq.  On 2012-03-22 18:10:45, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 2299
bq.  > <https://reviews.apache.org/r/4280/diff/2/?file=94413#file94413line2299>
bq.  >
bq.  >     'with not' -> 'without'
bq.  >     Should also include some info on the entry.

"with no"


bq.  On 2012-03-22 18:10:45, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 2311
bq.  > <https://reviews.apache.org/r/4280/diff/2/?file=94413#file94413line2311>
bq.  >
bq.  >     Please remove this.

done


bq.  On 2012-03-22 18:10:45, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 2821
bq.  > <https://reviews.apache.org/r/4280/diff/2/?file=94413#file94413line2821>
bq.  >
bq.  >     Typo: maximum

k


bq.  On 2012-03-22 18:10:45, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 2705
bq.  > <https://reviews.apache.org/r/4280/diff/2/?file=94413#file94413line2705>
bq.  >
bq.  >     Nit: name hdfsRegiondirModtime as hdfsRegionDirModTime

k


- jmhsieh


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4280/#review6229
-----------------------------------------------------------


On 2012-03-21 23:24:13, jmhsieh wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/4280/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2012-03-21 23:24:13)
bq.  
bq.  
bq.  Review request for hbase, Todd Lipcon, Ted Yu, and Lars Hofhansl.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  This version is similar to the 0.90.x version posted a few months back, but has a few new features and some minor differences.
bq.  
bq.  1) No trackHTD method needed since we can read from the file system.
bq.  2) Added safeguards to prevent mega merges, and to isolate repairs to particular tables.
bq.  3) Fixed comparator in HRegionInfo
bq.  4) Fixed TestRegionObserverInterface so that it doesn't rely on bug in HRegionInfo comparator.
bq.  
bq.  I'll backport to 0.94/0.92 (which should be very similar) and update the 0.90 versions after this patch has mostly cleared.
bq.  
bq.  This version is not perfect (there are definitely cases not covered) but I think it is worth trying to get this in so that future reviews are more manageable.
bq.  
bq.  
bq.  This addresses bug HBASE-5128.
bq.      https://issues.apache.org/jira/browse/HBASE-5128
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    src/main/java/org/apache/hadoop/hbase/ipc/HMasterInterface.java 3c635d4 
bq.    src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java d47ef10 
bq.    src/main/java/org/apache/hadoop/hbase/master/HMaster.java cd1755f 
bq.    src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java c0aaf65 
bq.    src/main/java/org/apache/hadoop/hbase/util/HBaseFsckRepair.java 5916d9c 
bq.    src/main/java/org/apache/hadoop/hbase/util/hbck/OfflineMetaRepair.java d57bb6b 
bq.    src/main/java/org/apache/hadoop/hbase/util/hbck/TableIntegrityErrorHandler.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/util/hbck/TableIntegrityErrorHandlerImpl.java PRE-CREATION 
bq.    src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java d9a2a02 
bq.    src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsck.java 937781d 
bq.    src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsckComparator.java 0599da1 
bq.    src/test/java/org/apache/hadoop/hbase/util/hbck/HbckTestingUtil.java dbb97f8 
bq.    src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildBase.java 2b4cac8 
bq.    src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildHole.java ebbeead 
bq.    src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildOverlap.java b175548 
bq.  
bq.  Diff: https://reviews.apache.org/r/4280/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  Unit tests cover many situations and pass.  Most "live" testing has been done on the 0.90.x versions, and many improvements and features were added from that experience.  Not much live testing has been done on the trunk versions.
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  jmhsieh
bq.  
bq.


                
> [uber hbck] Online automated repair of table integrity and region consistency problems
> --------------------------------------------------------------------------------------
>
>                 Key: HBASE-5128
>                 URL: https://issues.apache.org/jira/browse/HBASE-5128
>             Project: HBase
>          Issue Type: New Feature
>          Components: hbck
>    Affects Versions: 0.90.5, 0.92.0, 0.94.0, 0.96.0
>            Reporter: Jonathan Hsieh
>            Assignee: Jonathan Hsieh
>             Fix For: 0.90.7, 0.92.2, 0.94.0, 0.96.0
>
>         Attachments: hbase-5128-0.90-v2.patch, hbase-5128-0.90-v2b.patch, hbase-5128-0.90-v4.patch,
hbase-5128-0.92-v2.patch, hbase-5128-0.92-v4.patch, hbase-5128-0.94-v2.patch, hbase-5128-0.94-v4.patch,
hbase-5128-trunk-v2.patch, hbase-5128-trunk.patch, hbase-5128-v3.patch, hbase-5128-v4.patch
>
>
> The current (0.90.5, 0.92.0rc2) versions of hbck detect most region consistency and table integrity invariant violations.  However, with '-fix' it can only automatically repair region consistency cases having to do with deployment problems.  This updated version should be able to handle all cases (including a new orphan regiondir case).  When complete, it will likely deprecate the OfflineMetaRepair tool and subsume several open META-hole related issues.
> Here's the approach (from the comment at the top of the new version of the file).
> {code}
> /**
>  * HBaseFsck (hbck) is a tool for checking and repairing region consistency and
>  * table integrity.  
>  * 
>  * Region consistency checks verify that META, region deployment on
>  * region servers and the state of data in HDFS (.regioninfo files) all are in
>  * accordance. 
>  * 
>  * Table integrity checks verify that all possible row keys can resolve to
>  * exactly one region of a table.  This means there are no individual degenerate
>  * or backwards regions; no holes between regions; and that there are no overlapping
>  * regions. 
>  * 
>  * The general repair strategy works in these steps.
>  * 1) Repair Table Integrity on HDFS. (merge or fabricate regions)
>  * 2) Repair Region Consistency with META and assignments
>  * 
>  * For table integrity repairs, the tables' region directories are scanned
>  * for .regioninfo files.  Each table's integrity is then verified.  If there 
>  * are any orphan regions (regions with no .regioninfo files), or holes, new 
>  * regions are fabricated.  Backwards regions are sidelined as well as empty
>  * degenerate (endkey==startkey) regions.  If there are any overlapping regions,
>  * a new region is created and all data is merged into the new region.  
>  * 
>  * Table integrity repairs deal solely with HDFS and can be done offline -- the
>  * hbase region servers or master do not need to be running.  This phase can be
>  * used to completely reconstruct the META table in an offline fashion. 
>  * 
>  * Region consistency requires three conditions -- 1) valid .regioninfo file 
>  * present in an hdfs region dir,  2) valid row with .regioninfo data in META,
>  * and 3) a region is deployed only at the regionserver that it was assigned to.
>  * 
>  * Region consistency requires hbck to contact the HBase master and region
>  * servers, so the connect() must first be called successfully.  Much of the
>  * region consistency information is transient and less risky to repair.
>  */
> {code}
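
The table integrity rule described in the comment above (every row key resolves to exactly one region) can be illustrated with a small standalone sketch.  This is not the HBaseFsck code: TableIntegritySketch, Range, and check are hypothetical names, keys are plain Java strings instead of byte arrays, and an empty end key is assumed to mean "end of table".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative only: a simplified table integrity check over key ranges. */
final class TableIntegritySketch {

    /** Hypothetical stand-in for a region's [startKey, endKey) range. */
    static final class Range {
        final String startKey;
        final String endKey; // "" is assumed to mean "end of table"

        Range(String startKey, String endKey) {
            this.startKey = startKey;
            this.endKey = endKey;
        }

        @Override
        public String toString() {
            return "[" + startKey + ", " + endKey + ")";
        }
    }

    /** Returns a description of each integrity problem found in one table. */
    static List<String> check(List<Range> regions) {
        List<String> problems = new ArrayList<>();
        List<Range> valid = new ArrayList<>();

        // Pass 1: set aside individually broken regions.
        for (Range r : regions) {
            boolean openEnded = r.endKey.isEmpty();
            if (!openEnded && r.startKey.equals(r.endKey)) {
                problems.add("degenerate: " + r);
            } else if (!openEnded && r.startKey.compareTo(r.endKey) > 0) {
                problems.add("backwards: " + r);
            } else {
                valid.add(r);
            }
        }

        // Pass 2: walk the remaining regions in start-key order and track
        // how far the key space has been covered so far.
        valid.sort((a, b) -> a.startKey.compareTo(b.startKey));
        String covered = "";          // every key before this is covered
        boolean coveredToEnd = false; // true once an open-ended region is seen
        for (Range r : valid) {
            if (coveredToEnd || r.startKey.compareTo(covered) < 0) {
                problems.add("overlap: " + r);
            } else if (r.startKey.compareTo(covered) > 0) {
                problems.add("hole: [" + covered + ", " + r.startKey + ")");
            }
            if (r.endKey.isEmpty()) {
                coveredToEnd = true;
            } else if (r.endKey.compareTo(covered) > 0) {
                covered = r.endKey;
            }
        }
        if (!coveredToEnd) {
            problems.add("hole: [" + covered + ", end of table)");
        }
        return problems;
    }

    public static void main(String[] args) {
        List<Range> regions = Arrays.asList(
                new Range("", "b"), new Range("c", "f"), new Range("e", ""));
        for (String problem : check(regions)) {
            System.out.println(problem);
        }
    }
}
```

The sketch only reports problems.  The repair strategy in the comment (fabricating regions for holes, sidelining backwards or degenerate regions, merging overlaps into a new region) is left out.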

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
