hbase-issues mailing list archives

From "chunhui shen (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-6065) Log for flush would append a non-sequential edit in the hlog, may cause data loss
Date Tue, 22 May 2012 05:09:41 GMT

    [ https://issues.apache.org/jira/browse/HBASE-6065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13280731#comment-13280731 ]

chunhui shen commented on HBASE-6065:
-------------------------------------

Suppose region A is on regionserver B.
The issue can be reproduced with the following steps:

1. Put one piece of data into region A (appends seq 1 to the hlog).
2. Put one piece of data into region A (appends seq 2 to the hlog).
3. Region A starts a flush and calls HLog#startCacheFlush (the current seq num in the hlog is 3).
4. Put one piece of data into region A (appends seq 4 to the hlog).
5. Region A completes the flush and calls HLog#completeCacheFlush (appends seq 3 to the hlog).
6. Kill regionserver B.

So the hlog file has four edits, in this order:
seq 1
seq 2
seq 4
seq 3

When splitting this hlog file, we generate the recovered.edits file for region A, which is
named 3 (for the naming, see HLogSplitter#splitLogFileToTemp).

Now, when replaying the recovered.edits file for region A, we skip this file because its name (3)
is taken as its max seq id, so the edit with seq 4 is never replayed and the data is lost.
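
The steps above can be modelled with a small toy program. This is only a sketch, not HBase code: the class and method names are hypothetical, and the value minSeqId = 3 is an assumption (the region's minimum seq id after the flush that reserved seq 3). It shows that naming the file after the last edit gives 3, which passes the `maxSeqId <= minSeqId` skip check even though the file still holds the edit with seq 4.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Toy model of the reproduction steps; all names here are hypothetical. */
class FlushReplaySketch {

    /** Replays steps 1-5 and returns seq ids in the order they are appended to the log. */
    static List<Long> simulateLog() {
        List<Long> log = new ArrayList<>();
        long nextSeq = 1;
        log.add(nextSeq++);          // step 1: put, appends seq 1
        log.add(nextSeq++);          // step 2: put, appends seq 2
        long flushSeq = nextSeq++;   // step 3: flush starts and reserves seq 3
        log.add(nextSeq++);          // step 4: put, appends seq 4
        log.add(flushSeq);           // step 5: flush completes, appends seq 3
        return log;
    }

    /** The split file is named after the seq id of the last edit written to it. */
    static long nameFromLastEdit(List<Long> log) {
        return log.get(log.size() - 1);
    }

    /** The largest seq id actually present in the file. */
    static long trueMax(List<Long> log) {
        return Collections.max(log);
    }

    /** Same comparison as the skip check quoted in the issue description. */
    static boolean isSkipped(long maxSeqIdFromName, long minSeqId) {
        return maxSeqIdFromName <= minSeqId;
    }

    public static void main(String[] args) {
        List<Long> log = simulateLog();
        long minSeqId = 3; // assumption: the flush persisted everything up to seq 3
        System.out.println("log order:            " + log);
        System.out.println("file name (last seq): " + nameFromLastEdit(log));
        System.out.println("true max seq:         " + trueMax(log));
        System.out.println("skipped by name:      " + isSkipped(nameFromLastEdit(log), minSeqId));
        System.out.println("skipped by true max:  " + isSkipped(trueMax(log), minSeqId));
    }
}
```

In this model the file would not be skipped if the comparison used the true maximum seq id, which is why the out-of-order flush edit at the end of the file matters.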




                
> Log for flush would append a non-sequential edit in the hlog, may cause data loss
> ---------------------------------------------------------------------------------
>
>                 Key: HBASE-6065
>                 URL: https://issues.apache.org/jira/browse/HBASE-6065
>             Project: HBase
>          Issue Type: Bug
>          Components: wal
>            Reporter: chunhui shen
>            Assignee: chunhui shen
>         Attachments: HBASE-6065.patch
>
>
> After a region flush completes, we append a log edit to the hlog file through HLog#completeCacheFlush.
> {code}
> public void completeCacheFlush(final byte [] encodedRegionName,
>       final byte [] tableName, final long logSeqId, final boolean isMetaRegion)
> {
> ...
> HLogKey key = makeKey(encodedRegionName, tableName, logSeqId,
>             System.currentTimeMillis(), HConstants.DEFAULT_CLUSTER_ID);
> ...
> }
> {code}
> When we make the hlog key, we use the seqId from the parameter, which is generated by HLog#startCacheFlush.
> As a result, we may append an edit with a lower seq id than the last edit in the hlog file.
> If it is the last edit in the file, it may cause data loss, because
> {code}
> HRegion#replayRecoveredEditsIfAny{
> ...
> maxSeqId = Math.abs(Long.parseLong(fileName));
>       if (maxSeqId <= minSeqId) {
>         String msg = "Maximum sequenceid for this log is " + maxSeqId
>             + " and minimum sequenceid for the region is " + minSeqId
>             + ", skipped the whole file, path=" + edits;
>         LOG.debug(msg);
>         continue;
>       }
> ...
> }
> {code}
> We may skip the split log file, because we use the last edit's seq id as its file name
> and treat this seqId as the max seq id in the log file.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

       
