hbase-issues mailing list archives

From "ramkrishna.s.vasudevan (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-3845) data loss because lastSeqWritten can miss memstore edits
Date Fri, 22 Jul 2011 18:10:58 GMT

    [ https://issues.apache.org/jira/browse/HBASE-3845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13069657#comment-13069657 ]

ramkrishna.s.vasudevan commented on HBASE-3845:
-----------------------------------------------

Thanks, Ted, for your comments.
{noformat}
Can we fold wal.setFlushInProgress() into wal.startCacheFlush() and wal.abortCacheFlush() to make the code cleaner?
{noformat}
I think we need to reset the AtomicBoolean even if an exception is thrown, for example in completeCacheFlush() or anywhere before it.
That is why I did it in a try/finally block, as per Stack's comments.
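Here is a minimal sketch of the try/finally pattern described above. It assumes a hypothetical FlushFlagSketch class whose flushInProgress AtomicBoolean stands in for the flag that the patch sets through wal.setFlushInProgress(). The class, field and method names are illustrative and are not HBase's actual API. The failFlush parameter simulates an exception thrown from completeCacheFlush().

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch: FlushFlagSketch, flushInProgress and doFlush are
// illustrative names, not the real HLog/HRegion API.
class FlushFlagSketch {
    // True while a (simulated) cache flush is running.
    final AtomicBoolean flushInProgress = new AtomicBoolean(false);

    /** Runs a simulated flush; failFlush mimics an exception in completeCacheFlush(). */
    void doFlush(boolean failFlush) {
        flushInProgress.set(true);
        try {
            // startCacheFlush(), the memstore snapshot and the flush itself would go here.
            if (failFlush) {
                throw new IllegalStateException("simulated failure in completeCacheFlush");
            }
            // completeCacheFlush() would run here on the success path.
        } finally {
            // Reset on every exit path, including exceptions, so the flag
            // never stays stuck at true after a failed flush.
            flushInProgress.set(false);
        }
    }

    public static void main(String[] args) {
        FlushFlagSketch wal = new FlushFlagSketch();
        wal.doFlush(false);
        System.out.println("after success: " + wal.flushInProgress.get());
        try {
            wal.doFlush(true);
        } catch (IllegalStateException e) {
            System.out.println("caught: " + e.getMessage());
        }
        System.out.println("after failure: " + wal.flushInProgress.get());
    }
}
```

Folding the set/reset into startCacheFlush()/abortCacheFlush() would still need a finally-style guarantee on the exception path, which is the concern raised above.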

{noformat}
Actually we can check whether the current thread owns cacheFlushLock
{noformat}
I checked the link. The ReentrantLock.getOwner() API is protected, so to check whether cacheFlushLock is held by the current thread we would have to make cacheFlushLock an instance of a class that extends ReentrantLock.
If we do that, we can avoid the AtomicBoolean.
Correct me if I am wrong.
Please let me know if any changes are needed.
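The sketch below shows two ways to ask whether the current thread holds cacheFlushLock. OwnerAwareLock is a hypothetical subclass that widens the protected getOwner() to public, as proposed above. For comparison, java.util.concurrent.locks.ReentrantLock also provides a public isHeldByCurrentThread() method, which answers the same question without subclassing. Both approaches assume that cacheFlushLock is, or can be declared as, a ReentrantLock rather than a plain Lock.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical subclass: getOwner() is protected in ReentrantLock,
// so overriding it with public access exposes the owning thread.
class OwnerAwareLock extends ReentrantLock {
    @Override
    public Thread getOwner() {
        return super.getOwner();
    }
}

class LockOwnershipSketch {
    // Ownership check via the subclass route.
    static boolean ownedViaSubclass(OwnerAwareLock lock) {
        return lock.getOwner() == Thread.currentThread();
    }

    // Ownership check via ReentrantLock's existing public method.
    static boolean ownedViaPublicApi(ReentrantLock lock) {
        return lock.isHeldByCurrentThread();
    }

    public static void main(String[] args) {
        OwnerAwareLock cacheFlushLock = new OwnerAwareLock();
        System.out.println(ownedViaSubclass(cacheFlushLock) + " " + ownedViaPublicApi(cacheFlushLock));
        cacheFlushLock.lock();
        try {
            System.out.println(ownedViaSubclass(cacheFlushLock) + " " + ownedViaPublicApi(cacheFlushLock));
        } finally {
            cacheFlushLock.unlock();
        }
    }
}
```

Both checks answer only for the calling thread. If the flag is read from other threads (for example, appenders), the corresponding public check is ReentrantLock.isLocked().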


> data loss because lastSeqWritten can miss memstore edits
> --------------------------------------------------------
>
>                 Key: HBASE-3845
>                 URL: https://issues.apache.org/jira/browse/HBASE-3845
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.3
>            Reporter: Prakash Khemani
>            Assignee: ramkrishna.s.vasudevan
>            Priority: Critical
>             Fix For: 0.90.4
>
>         Attachments: HBASE-3845_1.patch, HBASE-3845_2.patch, HBASE-3845_4.patch, HBASE-3845__trunk.patch
>
>
> (I don't have a test case to prove this yet, but I have run it by Dhruba and Kannan internally and wanted to put this up for some feedback.)
> In this discussion let us assume that the region has only one column family. That way I can use region/memstore interchangeably.
> After a memstore flush it is possible for lastSeqWritten to have a log-sequence-id for a region that is not the earliest log-sequence-id for that region's memstore.
> HLog.append() does a putIfAbsent into lastSequenceWritten. This is to ensure that we only keep track of the earliest log-sequence-number that is present in the memstore.
> Every time the memstore is flushed we remove the region's entry in lastSequenceWritten and wait for the next append to populate this entry again. This is where the problem happens.
> step 1:
> flusher.prepare() snapshots the memstore under HRegion.updatesLock.writeLock().
> step 2:
> as soon as the updatesLock.writeLock() is released, new entries will be added into the memstore.
> step 3:
> wal.completeCacheFlush() is called. This method removes the region's entry from lastSeqWritten.
> step 4:
> the next append will create a new entry for the region in lastSeqWritten, but that entry will hold the log seq id of the current append, so the earlier seq ids of the edits added in step 2 are no longer tracked.
> ==
> as a temporary measure, instead of removing the region's entry in step 3 I will replace it with the log-seq-id of the region-flush-event.
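The four steps in the description can be replayed deterministically in a single-threaded model. This is a hypothetical sketch, not HLog code. LastSeqWrittenSketch, replay(), obtainSeqNum() and the String region key are illustrative, and the interleaving that concurrent writers would produce is forced by calling the steps in order. replay(false) removes the entry at step 3, which is the behaviour described as the bug. replay(true) applies the proposed workaround of replacing the entry with the flush's seq id.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical single-threaded model of HBASE-3845; names are illustrative.
class LastSeqWrittenSketch {
    // region -> earliest seq id believed to be unflushed in the memstore
    private final ConcurrentMap<String, Long> lastSeqWritten = new ConcurrentHashMap<>();
    private long nextSeq = 1;

    long obtainSeqNum() {
        return nextSeq++;
    }

    // Models append(): putIfAbsent keeps only the first (earliest) seq id.
    long append(String region) {
        long seq = obtainSeqNum();
        lastSeqWritten.putIfAbsent(region, seq);
        return seq;
    }

    // Models completeCacheFlush(): remove the entry, or replace it with the flush seq id.
    void completeCacheFlush(String region, long flushSeq, boolean replaceWithFlushSeq) {
        if (replaceWithFlushSeq) {
            lastSeqWritten.put(region, flushSeq);
        } else {
            lastSeqWritten.remove(region);
        }
    }

    /** Replays steps 1-4; returns {seq id of the step-2 edit, seq id tracked afterwards}. */
    static long[] replay(boolean replaceWithFlushSeq) {
        LastSeqWrittenSketch wal = new LastSeqWrittenSketch();
        String region = "r1";
        wal.append(region);                    // seq 1: edit that ends up in the snapshot
        long flushSeq = wal.obtainSeqNum();    // step 1: seq 2, taken with the snapshot
        long racingEdit = wal.append(region);  // step 2: seq 3; putIfAbsent is a no-op
        wal.completeCacheFlush(region, flushSeq, replaceWithFlushSeq); // step 3
        wal.append(region);                    // step 4: seq 4
        return new long[] {racingEdit, wal.lastSeqWritten.get(region)};
    }

    public static void main(String[] args) {
        long[] removed = replay(false);
        long[] replaced = replay(true);
        System.out.println("remove:  edit=" + removed[0] + " tracked=" + removed[1]);
        System.out.println("replace: edit=" + replaced[0] + " tracked=" + replaced[1]);
    }
}
```

With removal, the step-2 edit has seq 3, but the region ends up tracked at 4. A log holding seq 3 could therefore be treated as no longer needed while that edit is still only in the memstore. With the replacement, the tracked value is 2. That is at or below every edit still unflushed in this model.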

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
