hbase-issues mailing list archives

From Ryan Rawson <ryano...@gmail.com>
Subject Re: [jira] Commented: (HBASE-3481) max seq id in flushed file can be larger than its correct value causing data loss during recovery
Date Wed, 26 Jan 2011 09:25:13 GMT
In HRegion.internalFlushCache we have this logic:

    final long currentMemStoreSize = this.memstoreSize.get();
    List<StoreFlusher> storeFlushers = new ArrayList<StoreFlusher>(stores.size());
    try {
      sequenceId = (wal == null) ? myseqid : wal.startCacheFlush();
      completeSequenceId = this.getCompleteCacheFlushSequenceId(sequenceId);

      for (Store s : stores.values()) {
        storeFlushers.add(s.getStoreFlusher(completeSequenceId));
      }

      // prepare flush (take a snapshot)
      for (StoreFlusher flusher : storeFlushers) {
        flusher.prepare();
      }
    } finally {
      this.updatesLock.writeLock().unlock();
    }

We take the region write lock, so no more puts/deletes/whatever can be applied to this HRegion.

We then grab a seqid (wal.startCacheFlush()). We now snapshot everything.

We then release the update lock, and mutations can happen to the region again.

The flush sequence id should therefore lie exactly between the edits in the snapshot and the edits that subsequently land in the memstore.

Given this code, I'm not sure how to explain what you are seeing...
This logic seems spot on and correct to me.
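The ordering argued above can be replayed as a minimal single-threaded sketch (hypothetical class and field names, not the actual HBase code): if the snapshot is taken entirely under the update lock, the flush seq id fences the snapshotted edits from every later edit.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of the flush ordering described above: startCacheFlush()
// runs under the region update lock, so its seq id lies strictly between the
// snapshot's edits and any edit applied after the lock is released.
class FlushSketch {
    final AtomicLong logSeqNum = new AtomicLong(0);
    final List<Long> memstore = new ArrayList<>();
    List<Long> snapshot;
    long flushSeqId;

    FlushSketch() {
        for (int i = 0; i < 10; i++) put();     // edits with seq ids 1..10
    }

    long obtainSeqNum() { return logSeqNum.incrementAndGet(); }

    void put() { memstore.add(obtainSeqNum()); }

    // called with the region update lock held, per the excerpt above
    void startCacheFlush() {
        flushSeqId = obtainSeqNum();            // fences the snapshot
        snapshot = new ArrayList<>(memstore);   // take the snapshot
        memstore.clear();
    }

    public static void main(String[] args) {
        FlushSketch r = new FlushSketch();
        r.startCacheFlush();   // flushSeqId == 11, snapshot holds 1..10
        r.put();               // post-flush edit gets seq id 12
        System.out.println(r.snapshot.get(9) < r.flushSeqId
                && r.flushSeqId < r.memstore.get(0));  // prints: true
    }
}
```

This is the invariant the code intends; the bug report below is about puts that obtain a seq id before the flush but reach the memstore after it.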

On Wed, Jan 26, 2011 at 1:14 AM, Kannan Muthukkaruppan (JIRA)
<jira@apache.org> wrote:
>    [ https://issues.apache.org/jira/browse/HBASE-3481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12986911#action_12986911
> Kannan Muthukkaruppan commented on HBASE-3481:
> ----------------------------------------------
> Is the last seq id readily available from the memstore KVs, or is it already stashed away somewhere? I agree that that would be the cleanest/best fix.
> (Happy to accept a patch if you want to post one up. Else, I'll look further on this tomorrow morning.)
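A sketch of the fix Kannan suggests (hypothetical names, not the actual patch): stamp the store file with the largest seq id actually present in the flushed snapshot, rather than with the next region-wide sequence number.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch of the suggested fix: derive MAX_SEQ_ID from the KVs
// being flushed instead of from obtainSeqNum()'s region-wide counter.
class MaxSeqIdFromSnapshot {
    // seq ids of the KVs captured in the memstore snapshot
    static long maxSeqIdOf(List<Long> snapshotKvSeqIds) {
        return Collections.max(snapshotKvSeqIds);
    }

    public static void main(String[] args) {
        // memstore held edits 1..10 even though the region counter raced ahead
        List<Long> snapshot =
            Arrays.asList(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L);
        System.out.println(maxSeqIdOf(snapshot)); // prints: 10
    }
}
```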
>> max seq id in flushed file can be larger than its correct value causing data loss during recovery
>> -------------------------------------------------------------------------------------------------
>>                 Key: HBASE-3481
>>                 URL: https://issues.apache.org/jira/browse/HBASE-3481
>>             Project: HBase
>>          Issue Type: Bug
>>            Reporter: Kannan Muthukkaruppan
>>            Priority: Critical
>> [While doing some cluster kill tests, I noticed some missing data after log recovery. Upon investigating further, and pretty-printing the contents of HFiles and recovered logs, this is my analysis of the situation/bug. Please confirm the theory and pitch in with suggestions.]
>> When memstores are flushed, the max sequence id recorded in the HFile should be the max sequence id of all KVs in the memstore. However, we seem to simply obtain the current sequence id from the HRegion and stamp the HFile's MAX_SEQ_ID with it.
>> From HRegion.java:
>> {code}
>>     sequenceId = (wal == null)? myseqid: wal.startCacheFlush();
>> {code}
>> where, startCacheFlush() is:
>> {code}
>> public long startCacheFlush() {
>>   this.cacheFlushLock.lock();
>>   return obtainSeqNum();
>> }
>> {code}
>> where, obtainSeqNum() is simply:
>> {code}
>> private long obtainSeqNum() {
>>   return this.logSeqNum.incrementAndGet();
>> }
>> {code}
>> So let's say a memstore contains edits with sequence number 1..10.
>> Meanwhile, say more Puts come along and go through this flow (in pseudo-code):
>> {code}
>> 1. HLog.append()
>>    1.1 obtainSeqNum()
>>    1.2 writeToWAL()
>> 2. updateMemStore()
>> {code}
>> So it is possible that the sequence number has already been incremented to, say, 15 if there are 5 more outstanding puts. Say the writeToWAL() is still in progress for these puts. In this case, none of these edits (11..15) would have been written to the memstore yet.
>> At this point, if a cache flush of the memstore happens, then we'll record its MAX_SEQ_ID as 16 in the store file instead of 10 (because that's what obtainSeqNum() would return as the next sequence number to use, right?).
>> Assume that the edits 11..15 eventually complete, and so the HLogs do contain the data for edits 11..15.
>> Now, at this point, if the region server were to crash and we run log recovery, the splits all go through correctly, and a correct recovered.edits file is generated with the edits 11..15.
>> Next, when the region is opened, the HRegion notes that one of the store files says MAX_SEQ_ID is 16. So, when it replays the recovered.edits file, it skips replaying edits 11..15. In other words, data loss.
>> ----
> --
> This message is automatically generated by JIRA.
> -
> You can reply to this email to add a comment to the issue online.
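The interleaving described in the issue can be replayed as a single-threaded simulation (hypothetical names; the real code paths are HLog.append() and HRegion.internalFlushCache()): five puts obtain seq ids 11..15 but have not yet reached the memstore when the flush stamps the store file, so recovery skips them entirely.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

// Single-threaded replay of the race: the flush's obtainSeqNum() call returns
// 16, so the HFile's MAX_SEQ_ID exceeds the in-flight edits 11..15, and log
// recovery (which replays only edits with seq id > MAX_SEQ_ID) drops them.
class SeqIdRace {
    // returns how many of the in-flight edits 11..15 recovery would replay
    static long replayedAfterRace() {
        AtomicLong logSeqNum = new AtomicLong(10);     // edits 1..10 in memstore

        List<Long> inFlightPuts = new ArrayList<>();
        for (int i = 0; i < 5; i++) {
            inFlightPuts.add(logSeqNum.incrementAndGet());  // HLog.append: 11..15
        }
        // writeToWAL()/updateMemStore() for 11..15 are still pending when the
        // cache flush runs and stamps the HFile via obtainSeqNum():
        long maxSeqIdInHFile = logSeqNum.incrementAndGet(); // 16

        // recovery replays an edit only if its seq id exceeds MAX_SEQ_ID
        return inFlightPuts.stream()
                .filter(seq -> seq > maxSeqIdInHFile).count();
    }

    public static void main(String[] args) {
        System.out.println(replayedAfterRace()); // prints: 0 -> edits 11..15 lost
    }
}
```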
