hbase-issues mailing list archives

From "Andrew Purtell (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-18786) FileNotFoundException should not be silently handled for primary region replicas
Date Tue, 10 Oct 2017 21:28:00 GMT

    [ https://issues.apache.org/jira/browse/HBASE-18786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16199403#comment-16199403 ]

Andrew Purtell commented on HBASE-18786:

What this patch removed was a silent rescan of storefiles if there is some kind of glitch
leading to a FNFE during scanning.  If there was a temporary store file accounting failure
then after one server aborts and another picks it up, the new server will not see another

If there is a permanent condition, like loss of files or directories in HDFS, leading to FNFEs
and a cascading failure situation, then I don't see how rescanning would help, and anyway
we should handle it differently. Previously we would have silently opened the region with
missing files (?). That would be bad. Aborting would be bad too in that case. Rather than
aborting we should fail the region open. This should be handled with a new JIRA.
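
To illustrate the "fail the region open" option, here is a minimal sketch in plain Java. It is not HBase code: every name in it (RegionOpenSketch, findMissing, openRegion) is hypothetical, and it assumes the list of expected store files is already known and checks the local filesystem through java.nio rather than HDFS.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch only: none of these names are HBase APIs.
final class RegionOpenSketch {

    /** Returns the expected store files that are not present on the filesystem. */
    static List<Path> findMissing(List<Path> expectedStoreFiles) {
        List<Path> missing = new ArrayList<>();
        for (Path p : expectedStoreFiles) {
            if (!Files.exists(p)) {
                missing.add(p);
            }
        }
        return missing;
    }

    /**
     * Fails the open of this one region if any expected store file is missing,
     * instead of opening with a partial file set or aborting the whole server.
     */
    static void openRegion(String regionName, List<Path> expectedStoreFiles)
            throws IOException {
        List<Path> missing = findMissing(expectedStoreFiles);
        if (!missing.isEmpty()) {
            throw new FileNotFoundException(
                "Refusing to open region " + regionName
                    + ": missing store files " + missing);
        }
        // A real implementation would go on to open the store files here.
    }
}
```

The point of the sketch is the scope of the failure: the exception is raised for the one region whose files are gone, so the caller can report that region as failed to open without taking the server down and without serving from an incomplete set of files.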

> FileNotFoundException should not be silently handled for primary region replicas
> --------------------------------------------------------------------------------
>                 Key: HBASE-18786
>                 URL: https://issues.apache.org/jira/browse/HBASE-18786
>             Project: HBase
>          Issue Type: Sub-task
>          Components: regionserver, Scanners
>            Reporter: Ashu Pachauri
>            Assignee: Andrew Purtell
>             Fix For: 2.0.0, 3.0.0, 1.4.0, 1.5.0
>         Attachments: HBASE-18786-branch-1.3.patch, HBASE-18786-branch-1.patch, HBASE-18786-branch-1.patch,
>                      HBASE-18786.patch, HBASE-18786.patch
> This is a follow up for HBASE-18186.
> FileNotFoundException while scanning from a primary region replica can be indicative
> of a more severe problem. Handling them silently can cause many underlying issues to go
> undetected. We should either
> 1. Hard fail the regionserver if there is a FNFE on a primary region replica, OR
> 2. Report these exceptions as some region / server level metric so that these can be
> proactively investigated.

This message was sent by Atlassian JIRA
