hbase-issues mailing list archives

From "Duo Zhang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-17712) Remove/Simplify the logic of RegionScannerImpl.handleFileNotFound
Date Sun, 05 Mar 2017 13:08:32 GMT

    [ https://issues.apache.org/jira/browse/HBASE-17712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15896259#comment-15896259 ]

Duo Zhang commented on HBASE-17712:

Yeah, there could still be a hole where the RS pauses after writing the compaction marker, although
it is much rarer than before. Simply adding new checks cannot solve the problem: a long GC pause can
always occur after your check and before the actual deletion.
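The race described above is a classic check-then-act problem. A minimal, single-threaded Java sketch makes it deterministic by injecting the "GC pause" as a callback between the ownership check and the deletion. Every name here (`RegionOwnership`, `compactAndDelete`) is hypothetical and not an HBase class:

```java
import java.util.ArrayList;
import java.util.List;

// Deterministic illustration of a check-then-act race.
// All names are hypothetical; they do not correspond to real HBase classes.
class CheckThenDeleteRace {

    // Hypothetical stand-in for "which server currently holds the region".
    static final class RegionOwnership {
        private String owner;
        RegionOwnership(String owner) { this.owner = owner; }
        synchronized String owner() { return owner; }
        synchronized void reassignTo(String newOwner) { owner = newOwner; }
    }

    // The server checks ownership, then (possibly much later) deletes the
    // compacted files. 'pause' models a long GC pause between the two steps.
    static boolean compactAndDelete(String self, RegionOwnership region,
                                    List<String> compactedFiles, Runnable pause) {
        if (!self.equals(region.owner())) {
            return false;            // the check: we no longer hold the region
        }
        pause.run();                 // nothing prevents ownership from changing here
        compactedFiles.clear();      // the act: deletion proceeds regardless
        return true;
    }

    public static void main(String[] args) {
        RegionOwnership region = new RegionOwnership("rs1");
        List<String> files = new ArrayList<>(List.of("hfile-a", "hfile-b"));

        // During the "pause", the region is reassigned to rs2.
        boolean deleted = compactAndDelete("rs1", region, files,
                () -> region.reassignTo("rs2"));

        System.out.println("deleted=" + deleted + ", owner=" + region.owner()
                + ", remaining=" + files);
    }
}
```

Running it prints `deleted=true, owner=rs2, remaining=[]`: the check passed, the region moved during the pause, and the files were deleted anyway. Another check before `clear()` would only move the window, not close it.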

Since we cannot remove the storefiles immediately after compaction (they may still be read by
someone), it is not possible to solve this with atomic operations on HDFS. One possible way is to
store the storefile list in the meta table and do a checkAndPut when updating it, to confirm that
the region is still held by us. This could be done in the future, but it is not easy work, as we
need to deal with region split/merge, flush, etc. So I do not think now is the right time to do
this, as the problem we want to address happens very rarely. Maybe we could bring this up when we
want to put storefiles on a FileSystem that does not support listing?
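The checkAndPut idea can be sketched as a compare-and-set on a per-region row that records both the holding server and the storefile list. This is a toy model under stated assumptions: an in-memory `ConcurrentHashMap` stands in for the meta table, its atomic per-key `compute` stands in for the row-level atomicity of checkAndPut, and `MetaRow`/`updateStoreFiles` are hypothetical names rather than HBase APIs:

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicBoolean;

// Toy model of "storefile list in meta, updated with checkAndPut".
// The map stands in for the meta table; all names are hypothetical.
class MetaStoreFileListSketch {

    // One "row" per region: the holding server and its current storefiles.
    static final class MetaRow {
        final String owner;
        final List<String> storeFiles;
        MetaRow(String owner, List<String> storeFiles) {
            this.owner = owner;
            this.storeFiles = List.copyOf(storeFiles);
        }
    }

    private final Map<String, MetaRow> meta = new ConcurrentHashMap<>();

    void assign(String region, String owner, List<String> files) {
        meta.put(region, new MetaRow(owner, files));
    }

    // Atomically: if the row's owner is still 'expectedOwner', replace the
    // storefile list; otherwise leave the row untouched and report failure.
    boolean updateStoreFiles(String region, String expectedOwner, List<String> newFiles) {
        AtomicBoolean applied = new AtomicBoolean(false);
        meta.compute(region, (key, row) -> {
            if (row == null || !row.owner.equals(expectedOwner)) {
                return row;          // check failed: no put
            }
            applied.set(true);
            return new MetaRow(row.owner, newFiles);
        });
        return applied.get();
    }

    List<String> storeFiles(String region) {
        MetaRow row = meta.get(region);
        return row == null ? List.of() : row.storeFiles;
    }

    public static void main(String[] args) {
        MetaStoreFileListSketch sketch = new MetaStoreFileListSketch();
        sketch.assign("region-1", "rs1", List.of("a", "b", "c"));

        // rs1 commits a compaction while it still holds the region: accepted.
        System.out.println(sketch.updateStoreFiles("region-1", "rs1", List.of("abc")));

        // The region moves to rs2; a late commit from rs1 is rejected.
        sketch.assign("region-1", "rs2", sketch.storeFiles("region-1"));
        System.out.println(sketch.updateStoreFiles("region-1", "rs1", List.of("stale")));
        System.out.println(sketch.storeFiles("region-1"));
    }
}
```

`main` prints `true`, `false`, then `[abc]`. The late commit from `rs1` is rejected because the owner check and the write happen as one atomic step. The sketch deliberately leaves out the hard parts (split/merge, flush).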

Let me try to solve it another way, as described in HBASE-13651: reassigning the region. It is
a little costly and slow, but given how rarely this happens, I think it is acceptable.


> Remove/Simplify the logic of RegionScannerImpl.handleFileNotFound
> -----------------------------------------------------------------
>                 Key: HBASE-17712
>                 URL: https://issues.apache.org/jira/browse/HBASE-17712
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 2.0.0, 1.4.0
>            Reporter: Duo Zhang
>            Assignee: Duo Zhang
>             Fix For: 2.0.0, 1.4.0
>         Attachments: HBASE-17712.patch, HBASE-17712-ut.patch, HBASE-17712-v1.patch
> It was introduced in HBASE-13651, and the logic became much more complicated after HBASE-16304
due to a deadlock issue. It is really tough, as the sequence id is involved and the method we
call was originally used to serve secondary replicas, which do not handle writes.
> In fact, in the 1.x releases, the problem described in HBASE-13651 is gone. Now we write a
compaction marker to the WAL before deleting the compacted files. We can only consider an RS dead
after all its WAL files are closed, so if the region has already been reassigned, the compaction
will fail because we cannot write out the compaction marker.
> So, theoretically, if we still hit a FileNotFoundException, it indicates a critical bug, which
means we may lose data. I do not think it is a good idea to just swallow the exception and refresh
the store files. Or, even if we want to do this, we can just refresh the store files without
dropping the memstore contents. This would also simplify the logic a lot.
> Suggestions are welcome.
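The ordering argued in the description (write the compaction marker to the WAL first, delete the inputs only if that succeeds) and the simpler recovery path can be modeled together. This is a hypothetical Java sketch: `Wal`, `Store`, `completeCompaction`, and `refreshStoreFiles` are illustrative names, the filesystem is an in-memory set, and the real fencing (closing an RS's WAL files before treating it as dead) is reduced to a `closed` flag:

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Toy model of the described ordering; every name here is hypothetical
// and only mirrors the argument, not HBase's real code.
class CompactionMarkerSketch {

    // Once the RS's WAL is closed (it is only treated as dead after that),
    // appends fail, so a paused RS cannot record a compaction.
    static final class Wal {
        private boolean closed;
        void close() { closed = true; }
        void append(String entry) throws IOException {
            if (closed) throw new IOException("WAL closed, cannot append: " + entry);
        }
    }

    static final class Store {
        final Set<String> fileSystem = new LinkedHashSet<>(); // files on "HDFS"
        final List<String> storeFiles = new ArrayList<>();    // files the store reads
        final List<String> memstore = new ArrayList<>();      // unflushed edits
        final Wal wal;
        Store(Wal wal) { this.wal = wal; }

        // Write the compaction marker first; delete inputs only if that succeeds.
        void completeCompaction(List<String> inputs, String output) throws IOException {
            wal.append("COMPACTION " + inputs + " -> " + output);
            fileSystem.add(output);
            storeFiles.removeAll(inputs);
            storeFiles.add(output);
            inputs.forEach(fileSystem::remove);
        }

        // Simulated read: fails if a listed file has vanished from the filesystem.
        void read() throws FileNotFoundException {
            for (String f : storeFiles) {
                if (!fileSystem.contains(f)) throw new FileNotFoundException(f);
            }
        }

        // The simpler recovery suggested: re-list store files, keep the memstore.
        void refreshStoreFiles() {
            storeFiles.clear();
            storeFiles.addAll(fileSystem);
        }
    }

    public static void main(String[] args) throws IOException {
        Wal wal = new Wal();
        Store store = new Store(wal);
        store.fileSystem.addAll(List.of("f1", "f2"));
        store.storeFiles.addAll(List.of("f1", "f2"));
        store.memstore.add("put row1");

        wal.close(); // the RS's WAL was closed, e.g. before its region was reassigned
        try {
            store.completeCompaction(List.of("f1", "f2"), "f3");
        } catch (IOException e) {
            System.out.println("compaction aborted, files kept: " + store.fileSystem);
        }

        // If a file still goes missing (a critical bug per the description),
        // refresh the store files without discarding memstore contents.
        store.fileSystem.remove("f2");
        try {
            store.read();
        } catch (FileNotFoundException e) {
            store.refreshStoreFiles();
        }
        System.out.println("storeFiles=" + store.storeFiles + ", memstore=" + store.memstore);
    }
}
```

It prints `compaction aborted, files kept: [f1, f2]` and then `storeFiles=[f1], memstore=[put row1]`. The first line shows why a fenced-off RS cannot delete files the new holder may need. The second shows a refresh that leaves unflushed edits in the memstore.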

This message was sent by Atlassian JIRA
