hbase-issues mailing list archives

From "Duo Zhang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-17712) Remove/Simplify the logic of RegionScannerImpl.handleFileNotFound
Date Thu, 09 Mar 2017 05:25:38 GMT

    [ https://issues.apache.org/jira/browse/HBASE-17712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15902518#comment-15902518 ]

Duo Zhang commented on HBASE-17712:

Does the FNFE have the file name in it?
I believe so.

The AsyncFSWAL.java changes are related?

Yes, it is related. RS.abort waits for all regions to be closed, but AsyncFSWAL retries
forever, so there is a deadlock. I think it is a bit strange that we still need to confirm
region closing when aborting an RS, but the check in AsyncFSWAL does no harm, so I included
it in the patch. We can discuss later whether we need to wait in RS.abort.
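
To illustrate the kind of check meant here, a minimal sketch (not the actual AsyncFSWAL code) of a retry loop that stops once the server is aborting. The class name, the WriterFactory interface, and the aborting flag are hypothetical names used only for this example.

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

public class WalRetrySketch {
    /** Hypothetical stand-in for whatever operation the WAL keeps retrying. */
    interface WriterFactory {
        void create() throws IOException;
    }

    /**
     * Retries the operation until it succeeds, but checks the abort flag before
     * every attempt. Without that check the loop never exits while the operation
     * keeps failing, and an abort that waits for this thread would hang.
     *
     * @return true if the operation succeeded, false if we gave up due to abort
     */
    static boolean createWithRetry(WriterFactory factory, AtomicBoolean aborting,
                                   AtomicInteger attempts) {
        while (true) {
            if (aborting.get()) {
                return false; // server is going down, stop retrying
            }
            attempts.incrementAndGet();
            try {
                factory.create();
                return true;
            } catch (IOException e) {
                // Swallow and retry; a real implementation would log and back off.
            }
        }
    }

    public static void main(String[] args) {
        AtomicBoolean aborting = new AtomicBoolean(false);
        AtomicInteger attempts = new AtomicInteger();
        // Simulated failure: every attempt fails, and the abort is requested
        // during the third attempt.
        WriterFactory alwaysFails = () -> {
            if (attempts.get() >= 3) {
                aborting.set(true);
            }
            throw new IOException("simulated failure");
        };
        boolean created = createWithRetry(alwaysFails, aborting, attempts);
        System.out.println("created=" + created + " attempts=" + attempts.get());
    }
}
```

With the flag check removed, the loop in this sketch would spin forever, which is the shape of the deadlock described above.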


> Remove/Simplify the logic of RegionScannerImpl.handleFileNotFound
> -----------------------------------------------------------------
>                 Key: HBASE-17712
>                 URL: https://issues.apache.org/jira/browse/HBASE-17712
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 2.0.0, 1.4.0
>            Reporter: Duo Zhang
>            Assignee: Duo Zhang
>             Fix For: 2.0.0, 1.4.0
>         Attachments: HBASE-17712-branch-1.patch, HBASE-17712.patch, HBASE-17712-ut.patch,
HBASE-17712-v1.patch, HBASE-17712-v2.patch, HBASE-17712-v3.patch
> It was introduced in HBASE-13651, and the logic became much more complicated after HBASE-16304
because of a deadlock issue. It is really tough because sequence ids are involved and the method
we call was originally written to serve secondary replicas, which do not handle writes.
> In fact, in the 1.x releases, the problem described in HBASE-13651 is gone. We now write
a compaction marker to the WAL before deleting the compacted files. An RS can only be considered
dead after all of its WAL files are closed, so if the region has already been reassigned, the
compaction will fail because we cannot write out the compaction marker.
> So theoretically, if we still hit a FileNotFound exception, it should be a critical bug,
which means we may lose data. I do not think it is a good idea to just eat the exception and
refresh the store files. Even if we want to do this, we can just refresh the store files without
dropping the memstore contents. This would also simplify the logic a lot.
> Suggestions are welcome.
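
The ordering argument in the description (write the compaction marker to the WAL first, delete the compacted files only afterwards) can be sketched as follows. This is a toy model under stated assumptions, not HBase code: the Wal class, completeCompaction, and the string file names are all hypothetical.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CompactionMarkerSketch {
    /** Hypothetical WAL model: appends fail once the WAL has been closed. */
    static class Wal {
        private boolean closed;
        final List<String> entries = new ArrayList<>();

        void append(String entry) throws IOException {
            if (closed) {
                throw new IOException("WAL is closed");
            }
            entries.add(entry);
        }

        void close() {
            closed = true;
        }
    }

    /**
     * Completes a compaction: the marker is appended first, and the compacted
     * files are removed only if that append succeeded. If the WAL is already
     * closed (the server has been declared dead and the region reassigned),
     * the append throws and no file is removed.
     */
    static void completeCompaction(Wal wal, List<String> storeFiles,
                                   List<String> compacted, String output)
            throws IOException {
        wal.append("COMPACTION " + compacted + " -> " + output);
        storeFiles.removeAll(compacted);
        storeFiles.add(output);
    }

    public static void main(String[] args) throws IOException {
        Wal wal = new Wal();
        List<String> storeFiles = new ArrayList<>(Arrays.asList("a", "b"));

        // Normal case: marker is written, then the old files are replaced.
        completeCompaction(wal, storeFiles, Arrays.asList("a", "b"), "c");
        System.out.println("after first compaction: " + storeFiles);

        // WAL closed: the marker cannot be written, so nothing is deleted.
        wal.close();
        try {
            completeCompaction(wal, storeFiles, Arrays.asList("c"), "d");
        } catch (IOException e) {
            System.out.println("compaction failed: " + e.getMessage());
        }
        System.out.println("after failed compaction: " + storeFiles);
    }
}
```

In this model a server whose WAL is closed can never remove a file that another server might still read, which is why a FileNotFound exception would point to a real bug rather than an expected race.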

This message was sent by Atlassian JIRA
