hbase-issues mailing list archives

From "Duo Zhang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-17712) Remove/Simplify the logic of RegionScannerImpl.handleFileNotFound
Date Wed, 01 Mar 2017 10:27:45 GMT

    [ https://issues.apache.org/jira/browse/HBASE-17712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15889918#comment-15889918 ]

Duo Zhang commented on HBASE-17712:

Want to give an illustration of what in particular is driving you crazy Duo Zhang?
In HBASE-17633, I want to update the lowestUnflushedSequenceId in internalFlushCacheAndCommit
using the memstore's minSequenceId. Then I found that we may modify the memstore content
in refreshStoreFiles, which is not part of the flush processing. After reading the code related
to region replicas, I found this case is easy to handle: the secondary replica does not handle
writes and the replay is single threaded, so there is no race condition. But finally I found that
we even call dropMemstoreContents in doDelta! This is a total mess. I cannot find a safe way to
update the lowestUnflushedSequenceId if the minSequenceId changes because we drop some contents
from the memstore. What happens if there is a flush ongoing at the same time?

Do we have tests that prove the latter assertion?
I could try to add one.


> Remove/Simplify the logic of RegionScannerImpl.handleFileNotFound
> -----------------------------------------------------------------
>                 Key: HBASE-17712
>                 URL: https://issues.apache.org/jira/browse/HBASE-17712
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 2.0.0, 1.4.0
>            Reporter: Duo Zhang
>             Fix For: 2.0.0, 1.4.0
> It was introduced in HBASE-13651, and the logic became much more complicated after HBASE-16304
> due to a deadlock issue. It is really tough because sequence ids are involved, and the method
> we call was originally written to serve secondary replicas, which do not handle writes.
> In fact, in the 1.x releases, the problem described in HBASE-13651 is gone. Now we write
> a compaction marker to the WAL before deleting the compacted files. We can only consider an RS
> as dead after its WAL files are all closed, so if the region has already been reassigned, the
> compaction will fail because we cannot write out the compaction marker.
> So theoretically, if we still hit a FileNotFound exception, it should be a critical bug,
> which means we may lose data. I do not think it is a good idea to just eat the exception and
> refresh store files. Even if we want to do this, we can just refresh store files without
> dropping memstore contents. This will also simplify the logic a lot.
> Suggestions are welcome.
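
The second option in the description (refresh store files but leave the memstore alone) could look roughly like the sketch below. This is a toy model under stated assumptions: ToyStore and its fields are hypothetical, and this handleFileNotFound is an illustration, not the existing RegionScannerImpl method or a proposed patch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Toy model only: ToyStore is hypothetical and is not HBase's HStore.
public class RefreshWithoutDropSketch {

    static class ToyStore {
        final Set<String> storeFiles = new TreeSet<>();
        final List<String> memstore = new ArrayList<>();

        /** Replace the known file list with what is on disk; the memstore is not touched. */
        void refreshStoreFiles(Set<String> filesOnDisk) {
            storeFiles.clear();
            storeFiles.addAll(filesOnDisk);
        }
    }

    /** Hypothetical handler for a scanner that hit a missing file. */
    static void handleFileNotFound(ToyStore store, Set<String> filesOnDisk) {
        store.refreshStoreFiles(filesOnDisk);
        // Deliberately no memstore drop here, so no sequence id bookkeeping changes.
    }

    public static void main(String[] args) {
        ToyStore store = new ToyStore();
        store.storeFiles.addAll(Arrays.asList("f1", "f2"));
        store.memstore.add("unflushed-cell");

        // Assume f1 and f2 were compacted into f3 behind the scanner's back.
        handleFileNotFound(store, new TreeSet<>(Arrays.asList("f3")));

        System.out.println("files=" + store.storeFiles + " memstore=" + store.memstore);
    }
}
```

Because the handler never touches the memstore in this sketch, the minimum sequence id cannot change underneath a concurrent flush.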

This message was sent by Atlassian JIRA