hbase-issues mailing list archives

From "Thiruvel Thirumoolan (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-19468) FNFE during scans and flushes
Date Wed, 13 Dec 2017 03:29:00 GMT

    [ https://issues.apache.org/jira/browse/HBASE-19468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16288664#comment-16288664
] 

Thiruvel Thirumoolan commented on HBASE-19468:
----------------------------------------------

I didn't like the ref count approach to start with, but needed something simple to show the
problem and demonstrate a fix. I wanted to rework it. I prefer Ram's approach, which doesn't
touch counters directly; it looks like we both uploaded patches at more or less the same time
and I missed his. Gimme a couple of days, if that's OK, just to cross-check whether anything
else needs consideration.

> FNFE during scans and flushes
> -----------------------------
>
>                 Key: HBASE-19468
>                 URL: https://issues.apache.org/jira/browse/HBASE-19468
>             Project: HBase
>          Issue Type: Sub-task
>          Components: regionserver, Scanners
>    Affects Versions: 1.3.1
>            Reporter: Thiruvel Thirumoolan
>            Priority: Critical
>             Fix For: 2.0.0, 1.4.1, 1.5.0, 1.3.3
>
>         Attachments: HBASE-19468-poc.patch, HBASE-19468_1.4.patch
>
>
> We see FileNotFoundExceptions (FNFE) on our 1.3 clusters when scans and flushes happen at the same
time. The regionserver then throws an UnknownScannerException and the client retries.
> This happens during the following sequence:
> 1. A scanner is open; the client has fetched some rows from the regionserver and is working on them.
> 2. A flush happens and the StoreScanner is updated with the flushed files (StoreScanner.updateReaders()).
> 3. A compaction happens on the region while the scanner is still open.
> 4. The compaction discharger runs and cleans up the newly flushed file, as we don't have new
scanners on it yet.
> 5. The client issues scan.next and, during StoreScanner.resetScannerStack(), we get a FNFE.
The RegionServer throws an UnknownScannerException. The client retries in 1.3. With branch-1.4, the scan fails
with a DoNotRetryIOException.
> [~ram_krish], my proposal is to increment the reader count during updateReaders() and
decrement it during resetScannerStack(), so the discharger doesn't clean the file up. Scan lease
expiries also have to be taken care of. Am I missing anything? Is there a better approach?
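
For readers following the thread, the ref count idea from the description can be modelled roughly as below. This is only a toy sketch, not HBase code and not either of the attached patches: FlushedFileTracker and all of its methods are hypothetical names made up for illustration. retain() stands in for the increment proposed in updateReaders(), release() for the decrement in resetScannerStack() (it would also have to be called when a scan lease expires), and discharge() for a discharger pass.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Toy model only: hypothetical names, not HBase classes. */
class FlushedFileTracker {
    // file name -> number of open scanners that still need the file
    private final Map<String, Integer> refCounts = new HashMap<>();
    // files whose contents were rewritten by a compaction
    private final Set<String> compactedAway = new HashSet<>();
    // files that currently exist in this model's "file system"
    private final Set<String> onDisk = new HashSet<>();

    synchronized void addFlushedFile(String file) {
        onDisk.add(file);
        refCounts.putIfAbsent(file, 0);
    }

    // Analogue of the proposed increment during updateReaders().
    synchronized void retain(String file) {
        refCounts.merge(file, 1, Integer::sum);
    }

    // Analogue of the proposed decrement during resetScannerStack();
    // a lease-expiry path would need to call this as well.
    synchronized void release(String file) {
        refCounts.computeIfPresent(file, (k, v) -> Math.max(0, v - 1));
    }

    synchronized void markCompacted(String file) {
        compactedAway.add(file);
    }

    // Analogue of a discharger pass: delete only compacted files
    // that no open scanner still references.
    synchronized Set<String> discharge() {
        Set<String> deleted = new HashSet<>();
        for (String file : compactedAway) {
            if (refCounts.getOrDefault(file, 0) == 0 && onDisk.remove(file)) {
                deleted.add(file);
            }
        }
        compactedAway.removeAll(deleted);
        return deleted;
    }

    synchronized boolean exists(String file) {
        return onDisk.contains(file);
    }
}
```

In this model, a file that was retained in step 2 survives the discharger pass in step 4 and is only removed by a later pass once release() has been called, which is the ordering the proposal is trying to guarantee.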



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
