hbase-issues mailing list archives

From "stack (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-5689) Skipping RecoveredEdits may cause data loss
Date Mon, 02 Apr 2012 18:35:25 GMT

    [ https://issues.apache.org/jira/browse/HBASE-5689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13244418#comment-13244418
] 

stack commented on HBASE-5689:
------------------------------

Good one Chunhui.  I think the patch is good.

Nice reproduction of the problem in a test.  Where in the test do you find that we've lost
the third edit?

So we name the file for its first edit when we write it, then when we move it into place
we rename it for the last edit in the file?  I'd add a comment to that effect, else it could
be confusing.  Hmm... I suppose you have it in the doc for getCompletedRecoveredEditsFilePath.
That's probably good enough, but there's no harm in explaining why we go from naming the file
for its first edit to naming it for its last edit.
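
To make the naming question concrete, here is a toy model of the skip check for the
scenario quoted below.  It is a sketch under stated assumptions, not HBase code: the class
and method names are hypothetical, file names are bare sequence ids, the three puts are
assumed to get sequence ids 1, 2 and 3, and minSeqId is assumed to be 2 because r1 and r2
were flushed when the region moved.  The second method is only a guess at what the check
could look like once a file is named for its last edit; it is not taken from the patch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model only: each map entry is one recovered-edits file, keyed by its
// name (a sequence id) and holding the sequence ids of the edits it contains.
class RecoveredEditsSkipSketch {

    // Mirrors the quoted check: the upper bound for a file's edits is taken
    // from the name of the NEXT file, which only holds if files never overlap.
    static List<Long> replayNamedByFirstEdit(TreeMap<Long, long[]> files, long minSeqId) {
        List<Long> replayed = new ArrayList<>();
        boolean checkSafeToSkip = true;
        for (Map.Entry<Long, long[]> file : files.entrySet()) {
            if (checkSafeToSkip) {
                Long higher = files.higherKey(file.getKey());
                long maxSeqId = (higher != null) ? higher : Long.MAX_VALUE;
                if (maxSeqId <= minSeqId) {
                    continue; // the whole file is skipped
                }
                checkSafeToSkip = false;
            }
            replayEdits(file.getValue(), minSeqId, replayed);
        }
        return replayed;
    }

    // Assumed shape of the check when a file is named for its last edit:
    // the file's own name is the upper bound for the edits inside it.
    static List<Long> replayNamedByLastEdit(TreeMap<Long, long[]> files, long minSeqId) {
        List<Long> replayed = new ArrayList<>();
        for (Map.Entry<Long, long[]> file : files.entrySet()) {
            long maxSeqId = file.getKey();
            if (maxSeqId <= minSeqId) {
                continue; // every edit in this file is already persisted
            }
            replayEdits(file.getValue(), minSeqId, replayed);
        }
        return replayed;
    }

    // Assumption: edits at or below minSeqId are already in store files
    // and are not applied again.
    static void replayEdits(long[] seqIds, long minSeqId, List<Long> replayed) {
        for (long seqId : seqIds) {
            if (seqId > minSeqId) {
                replayed.add(seqId);
            }
        }
    }

    public static void main(String[] args) {
        long minSeqId = 2L; // assumed: r1 and r2 were flushed on region move

        TreeMap<Long, long[]> byFirstEdit = new TreeMap<>();
        byFirstEdit.put(2L, new long[] {2L});     // f1, from server B's hlog
        byFirstEdit.put(1L, new long[] {1L, 3L}); // f2, from server A's hlog

        TreeMap<Long, long[]> byLastEdit = new TreeMap<>();
        byLastEdit.put(2L, new long[] {2L});      // f1
        byLastEdit.put(3L, new long[] {1L, 3L});  // f2, now named for r3

        System.out.println("named by first edit: "
            + replayNamedByFirstEdit(byFirstEdit, minSeqId));
        System.out.println("named by last edit:  "
            + replayNamedByLastEdit(byLastEdit, minSeqId));
    }
}
```

In this model, naming by first edit sorts f2 ahead of f1, so f1's name (2) is used as the
upper bound for f2 and f2 is skipped even though it holds the edit with sequence id 3.
Naming by last edit lets the file's own name bound its contents, so that edit is replayed.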


                
> Skipping RecoveredEdits may cause data loss
> -------------------------------------------
>
>                 Key: HBASE-5689
>                 URL: https://issues.apache.org/jira/browse/HBASE-5689
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 0.94.0
>            Reporter: chunhui shen
>            Assignee: chunhui shen
>            Priority: Critical
>             Fix For: 0.94.0
>
>         Attachments: 5689-simplified.txt, 5689-testcase.patch, HBASE-5689.patch
>
>
> Consider the following scenario:
> 1. The region is on server A.
> 2. Put KV(r1->v1) to the region.
> 3. Move the region from server A to server B.
> 4. Put KV(r2->v2) to the region.
> 5. Move the region from server B to server A.
> 6. Put KV(r3->v3) to the region.
> 7. kill -9 server B and start it.
> 8. kill -9 server A and start it.
> 9. Scan the region: we get only two KVs (r1->v1, r2->v2); the third KV (r3->v3) is lost.
> Let's analyse the scenario above from the code:
> 1. The edit logs of KV(r1->v1) and KV(r3->v3) are both recorded in the same hlog file on server A.
> 2. When we split server B's hlog file in ServerShutdownHandler, we create one RecoveredEdits file f1 for the region.
> 3. When we split server A's hlog file in ServerShutdownHandler, we create another RecoveredEdits file f2 for the region.
> 4. However, RecoveredEdits file f2 will be skipped when initializing the region, in
> HRegion#replayRecoveredEditsIfAny:
> {code}
>  for (Path edits: files) {
>       if (edits == null || !this.fs.exists(edits)) {
>         LOG.warn("Null or non-existent edits file: " + edits);
>         continue;
>       }
>       if (isZeroLengthThenDelete(this.fs, edits)) continue;
>       if (checkSafeToSkip) {
>         Path higher = files.higher(edits);
>         long maxSeqId = Long.MAX_VALUE;
>         if (higher != null) {
>           // Edit file name pattern, HLog.EDITFILES_NAME_PATTERN: "-?[0-9]+"
>           String fileName = higher.getName();
>           maxSeqId = Math.abs(Long.parseLong(fileName));
>         }
>         if (maxSeqId <= minSeqId) {
>           String msg = "Maximum possible sequenceid for this log is " + maxSeqId
>               + ", skipped the whole file, path=" + edits;
>           LOG.debug(msg);
>           continue;
>         } else {
>           checkSafeToSkip = false;
>         }
>       }
> {code}
>  


        
