hbase-issues mailing list archives

From "Duo Zhang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-17407) Correct update of maxFlushedSeqId in HRegion
Date Wed, 04 Jan 2017 02:31:58 GMT

    [ https://issues.apache.org/jira/browse/HBASE-17407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15796938#comment-15796938 ]

Duo Zhang commented on HBASE-17407:

It starts with startCacheFlush(), ends with completeCacheFlush() and it can recover from crashes
(via abortCacheFlush()) that happen between its starting and ending point.
finalizeFlush() is invoked between start and complete. During this period the WAL state can
be inconsistent.

I do not think so. See the comment on abortCacheFlush: it is only useful when restarting
a regionserver. Usually we just abort the regionserver if a flush fails.

And the dangerous thing is that we maintain an incorrect value in the WAL, which may cause data
loss! And in finalizeFlush we set the flushed sequence id to a smaller value, which is really
confusing (has the flushed data come back to the memstore again?)...

Removing finalizeFlush means that we need to change the current (common) path for updating
the wal. This is something we tried to avoid in the original design.

If I remember correctly, in the original issue, [~anoopsamjohn] and I suggested that you
implement all the logic inside the memstore and not change the behavior of flush. But you
refused and wanted to make the feature more powerful. So I think you need to take the responsibility
to also change the logic of flush to better fit your new design.

For this specific issue, I think the problem is that now we may not flush all the contents
of the memstore, so the sequence id recorded in the WAL may not be the flushedSeqId. So I think a better
way is to get the flushedSeqId from the memstore instead of the WAL. The startCacheFlush method
would then take a Map as its parameter which contains the familyName->flushedSeqId mapping. I
think it should be possible to get this information from the memstore? We can record the smallest sequence
id for each segment in the memstore. Of course there may be some corner cases when the memstore
is empty or we decide to flush all contents of the memstore, but I think the basic idea is simple.
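
A rough sketch of the idea, not actual HBase code. Every name here (FlushedSeqIdSketch, Segment,
flushedSeqIds, fallbackSeqId) is hypothetical. It assumes each segment records its smallest
sequence id, and that the flushedSeqId of a family is one less than the smallest sequence id
still retained in the memstore. For the corner case where nothing is retained, it assumes the
caller supplies a fallback value; how that value should be chosen is left open here.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model for illustration; none of these types are HBase classes.
final class FlushedSeqIdSketch {

    /** A memstore segment that remembers the smallest sequence id it holds. */
    static final class Segment {
        final long minSeqId;

        Segment(long minSeqId) {
            this.minSeqId = minSeqId;
        }
    }

    /**
     * Builds the familyName -> flushedSeqId map from the segments that will
     * REMAIN in the memstore after the flush.
     *
     * Assumption: everything below the smallest retained sequence id has been
     * flushed, so flushedSeqId = (smallest retained sequence id) - 1.
     * Corner case: if a family retains no segment, use fallbackSeqId.
     */
    static Map<String, Long> flushedSeqIds(
            Map<String, List<Segment>> retainedByFamily, long fallbackSeqId) {
        Map<String, Long> result = new HashMap<>();
        for (Map.Entry<String, List<Segment>> entry : retainedByFamily.entrySet()) {
            List<Segment> retained = entry.getValue();
            if (retained.isEmpty()) {
                result.put(entry.getKey(), fallbackSeqId);
                continue;
            }
            long smallest = Long.MAX_VALUE;
            for (Segment segment : retained) {
                smallest = Math.min(smallest, segment.minSeqId);
            }
            result.put(entry.getKey(), smallest - 1);
        }
        return result;
    }
}
```

The resulting map is what would be passed to startCacheFlush in place of the value the WAL
derives on its own.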


> Correct update of maxFlushedSeqId in HRegion
> --------------------------------------------
>                 Key: HBASE-17407
>                 URL: https://issues.apache.org/jira/browse/HBASE-17407
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Eshcar Hillel
> The attribute maxFlushedSeqId in HRegion is used to track the max sequence id in the
> store files and is reported to HMaster. When flushing only part of the memstore content this
> value might be incorrect and may cause data loss.

This message was sent by Atlassian JIRA
