hadoop-hdfs-issues mailing list archives

From "Colin Patrick McCabe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-6783) Fix HDFS CacheReplicationMonitor rescan logic
Date Fri, 01 Aug 2014 19:05:39 GMT

    [ https://issues.apache.org/jira/browse/HDFS-6783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14082767#comment-14082767 ]

Colin Patrick McCabe commented on HDFS-6783:

Thanks for looking at this, Yi.

The behavior you're describing is intentional here.  If we call {{setNeedsRescan}} while a
rescan is going on, we don't want that scan to count as the rescan.
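
One way to picture that intent is with scan counters instead of a single boolean.  This is only a sketch, not the CacheReplicationMonitor code: the class {{RescanTracker}} and all of its fields and methods are made-up names for illustration, and it assumes at most one scan runs at a time.  A request made mid-scan asks for a scan that *starts* after the request, so finishing the in-flight scan does not satisfy it.

```java
/** Hypothetical illustration only; not the actual CacheReplicationMonitor. */
class RescanTracker {
  private long startedScans = 0;    // number of scans that have begun
  private long completedScans = 0;  // number of scans that have fully finished
  private long neededScans = 0;     // scan number that must complete to satisfy requests

  /** Request a rescan; only a scan that starts after this call can satisfy it. */
  public synchronized void setNeedsRescan() {
    // If a scan is in flight, startedScans is already one ahead of
    // completedScans, so the in-flight scan is skipped over.
    neededScans = startedScans + 1;
  }

  public synchronized void beginScan() {
    startedScans++;
  }

  public synchronized void endScan() {
    // Assumes scans never overlap, so the scan that just ended is the latest one.
    completedScans = startedScans;
  }

  /** True while some request has not yet been covered by a full scan. */
  public synchronized boolean needsRescan() {
    return completedScans < neededScans;
  }
}
```

With this shape, calling {{setNeedsRescan}} between {{beginScan}} and {{endScan}} leaves {{needsRescan}} returning true until a second, later scan completes.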

So the question then becomes: CAN we call {{setNeedsRescan}} while a rescan is going on?
Right now the answer is no, since rescan holds the FSN lock, and everything that calls {{setNeedsRescan}}
also holds that lock.  But this is accidental, not something we should rely on.  The v4
patch you posted is correct, but it assumes that we will always hold the FSN
lock for the entire duration of the scan.  While this is true currently, it wasn't always
true, and won't be true in the future (we *need* to reduce the length of time we hold these
locks to avoid latency spikes).

Maybe a good compromise here would be to set the loop variables while holding both the FSN
lock and the CRM lock.  This fixes the corner case you identified, and also would continue
to work properly if we released the FSN lock during the rescan.
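
A rough sketch of that compromise, under loudly stated assumptions: {{TwoLockRescan}}, its fields, and the lock order (FSN first, then CRM) are all hypothetical stand-ins, not the HDFS classes or the patch itself.  The loop variable is updated while both locks are held, and the scan body then runs with the FSN lock already released, to show that the bookkeeping no longer depends on holding it for the whole scan.

```java
import java.util.concurrent.locks.ReentrantLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical sketch; names and structure are assumptions, not HDFS code. */
class TwoLockRescan {
  // Stand-ins for the FSN (namesystem) lock and the CRM lock.
  private final ReentrantReadWriteLock fsnLock = new ReentrantReadWriteLock();
  private final ReentrantLock crmLock = new ReentrantLock();

  // All three counters are guarded by crmLock.
  private long startedScans = 0;
  private long completedScans = 0;
  private long neededScans = 0;

  /** Safe to call at any time, including while a scan is running. */
  public void setNeedsRescan() {
    crmLock.lock();
    try {
      neededScans = startedScans + 1; // only a scan that starts later counts
    } finally {
      crmLock.unlock();
    }
  }

  public boolean needsRescan() {
    crmLock.lock();
    try {
      return completedScans < neededScans;
    } finally {
      crmLock.unlock();
    }
  }

  /** Runs one scan; scanBody stands in for the real scan work. */
  public void rescan(Runnable scanBody) {
    fsnLock.writeLock().lock();
    try {
      crmLock.lock();
      try {
        startedScans++; // loop variable set while holding BOTH locks
      } finally {
        crmLock.unlock();
      }
    } finally {
      // Released early here to model a future where the scan does not
      // hold the FSN lock for its whole duration.
      fsnLock.writeLock().unlock();
    }

    scanBody.run();

    crmLock.lock();
    try {
      completedScans = startedScans;
    } finally {
      crmLock.unlock();
    }
  }
}
```

If {{setNeedsRescan}} is called from inside {{scanBody}}, {{needsRescan}} still returns true after {{rescan}} returns, and only the next {{rescan}} clears it.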

> Fix HDFS CacheReplicationMonitor rescan logic
> ---------------------------------------------
>                 Key: HDFS-6783
>                 URL: https://issues.apache.org/jira/browse/HDFS-6783
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: caching
>    Affects Versions: 3.0.0
>            Reporter: Yi Liu
>            Assignee: Yi Liu
>         Attachments: HDFS-6783.001.patch, HDFS-6783.002.patch, HDFS-6783.003.patch, HDFS-6783.004.patch
> In the monitor thread, needsRescan is set to false before the real scan starts, so
> {{waitForRescanIfNeeded}} will return at the first condition:
> {code}
> if (!needsRescan) {
>   return;
> }
> {code}

This message was sent by Atlassian JIRA
