hbase-issues mailing list archives

From "Gary Helmling (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-18786) FileNotFoundException should not be silently handled for primary region replicas
Date Tue, 10 Oct 2017 22:30:01 GMT

    [ https://issues.apache.org/jira/browse/HBASE-18786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16199483#comment-16199483 ]

Gary Helmling commented on HBASE-18786:
---------------------------------------

Seems fine to remove from 1.3.  

handleFileNotFound() was introduced by HBASE-13651 to handle the following situation: regionserver
A is hosting a region and starts a compaction, then enters a GC pause; the region is reassigned;
regionserver A then emerges from the pause and archives the compacted files before aborting.  If we
really want to handle this situation, we need to introduce fencing at the HDFS level during
failed server processing.  The current behavior of handleFileNotFound() seems worse than
the original problem, since it can hide other problems.
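
To make the fencing idea concrete, here is a minimal sketch of the general fencing-token
technique. It is not HDFS's API and not anything proposed in the patches; FencedArchive,
reassign() and archiveCompactedFiles() are hypothetical names, and the in-memory map only
stands in for state that would have to live in shared storage.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for shared storage that tracks the owner epoch of each region. */
class FencedArchive {
    private final Map<String, Long> currentEpoch = new HashMap<>();

    /** Called when a region is (re)assigned: bumps the epoch, fencing the previous owner. */
    synchronized long reassign(String region) {
        long next = currentEpoch.getOrDefault(region, 0L) + 1;
        currentEpoch.put(region, next);
        return next;
    }

    /**
     * Permits archiving only if the caller still holds the current epoch.
     * A server that paused (e.g. in GC) across a reassignment presents a stale
     * epoch and is rejected instead of removing files the new owner relies on.
     */
    synchronized boolean archiveCompactedFiles(String region, long callerEpoch) {
        Long current = currentEpoch.get(region);
        return current != null && current == callerEpoch;
    }
}
```

In this sketch, regionserver A would keep the epoch it received at assignment time; once the
region is reassigned, A's later archive attempt returns false rather than taking effect.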

> FileNotFoundException should not be silently handled for primary region replicas
> --------------------------------------------------------------------------------
>
>                 Key: HBASE-18786
>                 URL: https://issues.apache.org/jira/browse/HBASE-18786
>             Project: HBase
>          Issue Type: Sub-task
>          Components: regionserver, Scanners
>            Reporter: Ashu Pachauri
>            Assignee: Andrew Purtell
>             Fix For: 2.0.0, 3.0.0, 1.4.0, 1.5.0
>
>         Attachments: HBASE-18786-branch-1.3.patch, HBASE-18786-branch-1.patch, HBASE-18786-branch-1.patch,
>                      HBASE-18786.patch, HBASE-18786.patch
>
>
> This is a follow up for HBASE-18186.
> A FileNotFoundException while scanning from a primary region replica can be indicative
> of a more severe problem. Handling these exceptions silently can cause many underlying
> issues to go undetected. We should either
> 1. Hard fail the regionserver if there is a FNFE on a primary region replica, OR
> 2. Report these exceptions as some region / server level metric so that they can be
> proactively investigated.
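
The two options in the description can be sketched as a small decision policy. This is an
illustration under stated assumptions, not the code in the attached patches: FnfePolicy,
Action and onFileNotFound() are hypothetical names, replica id 0 is assumed to denote the
primary replica, and refreshing store files on non-primary replicas is assumed purely for
contrast. The sketch combines both options (count, then abort), although the description
offers them as alternatives.

```java
import java.io.FileNotFoundException;
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical policy object; it does not reference any real HBase class. */
class FnfePolicy {
    enum Action { ABORT_SERVER, REFRESH_STORE_FILES }

    /** Assumption: replica id 0 identifies the primary region replica. */
    static final int PRIMARY_REPLICA_ID = 0;

    /** Option 2: a counter that could back a region / server level metric. */
    private final LongAdder primaryFnfeCount = new LongAdder();

    /** Decides how to react to a FileNotFoundException seen while scanning. */
    Action onFileNotFound(int replicaId, FileNotFoundException e) {
        if (replicaId == PRIMARY_REPLICA_ID) {
            primaryFnfeCount.increment();   // make the event visible
            return Action.ABORT_SERVER;     // option 1: hard fail, never hide it
        }
        // Assumption: a non-primary replica may simply lag behind the primary,
        // so re-reading the store file list is treated as sufficient here.
        return Action.REFRESH_STORE_FILES;
    }

    long primaryFnfeCount() {
        return primaryFnfeCount.sum();
    }
}
```

A caller would invoke onFileNotFound() from the scanner's exception handler and act on the
returned Action; the counter lets operators notice primary-replica FNFEs even when the
abort path is disabled.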



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
