Mailing-List: contact hdfs-issues-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Reply-To: hdfs-issues@hadoop.apache.org
Date: Fri, 1 Aug 2014 19:05:39 +0000 (UTC)
From: "Colin Patrick McCabe (JIRA)" <jira@apache.org>
To: hdfs-issues@hadoop.apache.org
Message-ID: <JIRA.12730748.1406715025378.97369.1406919939083@arcas>
In-Reply-To: <JIRA.12730748.1406715025378@arcas>
References: <JIRA.12730748.1406715025378@arcas>
Subject: [jira] [Commented] (HDFS-6783) Fix HDFS CacheReplicationMonitor
 rescan logic
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit


    [ https://issues.apache.org/jira/browse/HDFS-6783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14082767#comment-14082767 ] 

Colin Patrick McCabe commented on HDFS-6783:
--------------------------------------------

Thanks for looking at this, Yi.

The behavior you're describing is intentional here.  If we call {{setNeedsRescan}} while a rescan is going on, we don't want that scan to count as the rescan.

So the question then becomes: CAN we call {{setNeedsRescan}} while a rescan is going on?  Right now the answer is no, since the rescan holds the FSN lock, and everything that calls {{setNeedsRescan}} also holds that lock.  But this is accidental, not something we should rely on.  The v4 patch you posted is correct, but it assumes that we will always hold the FSN lock for the entire duration of the scan.  While this is true currently, it wasn't always true, and it won't be true in the future (we *need* to reduce the length of time we hold these locks to avoid latency spikes).

Maybe a good compromise here would be to set the loop variables while holding both the FSN lock and the CRM lock.  This fixes the corner case you identified, and also would continue to work properly if we released the FSN lock during the rescan.
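
As a rough illustration of that compromise, here is a minimal sketch in plain Java.  It is not the actual CacheReplicationMonitor code: the class {{RescanSketch}}, the counter fields, and the helpers {{beginScan}}, {{endScan}} and {{rescanPending}} are all hypothetical names, and two java.util.concurrent locks stand in for the FSN lock and the CRM lock.  The point it tries to show is that the scan-state transition happens under both locks, so a {{setNeedsRescan}} call that arrives mid-scan is only satisfied by the following scan, even if the FSN lock is dropped while the scan body runs.

```java
import java.util.concurrent.locks.ReentrantLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical sketch only; names and structure are assumptions. */
class RescanSketch {
    // Stand-ins for the FSN lock and the CRM lock.
    final ReentrantReadWriteLock fsnLock = new ReentrantReadWriteLock();
    final ReentrantLock crmLock = new ReentrantLock();

    // Scan bookkeeping, guarded by crmLock.
    long completedScanCount = 0; // number of the last scan that finished
    long curScanCount = -1;      // number of the scan in progress, or -1
    long neededScanCount = 0;    // scan that must finish to satisfy requests

    /** Request a rescan; a scan already in progress does not count. */
    void setNeedsRescan() {
        crmLock.lock();
        try {
            long base = (curScanCount >= 0) ? curScanCount : completedScanCount;
            neededScanCount = base + 1;
        } finally {
            crmLock.unlock();
        }
    }

    /** True while some requested rescan has not completed yet. */
    boolean rescanPending() {
        crmLock.lock();
        try {
            return completedScanCount < neededScanCount;
        } finally {
            crmLock.unlock();
        }
    }

    /** Set the loop variables while holding both locks (FSN first, then CRM). */
    void beginScan() {
        fsnLock.writeLock().lock();
        try {
            crmLock.lock();
            try {
                curScanCount = completedScanCount + 1;
            } finally {
                crmLock.unlock();
            }
        } finally {
            // Only the transition needs both locks; the scan body may run
            // without the FSN lock in this sketch.
            fsnLock.writeLock().unlock();
        }
    }

    /** Record that the current scan finished. */
    void endScan() {
        crmLock.lock();
        try {
            completedScanCount = curScanCount;
            curScanCount = -1;
        } finally {
            crmLock.unlock();
        }
    }
}
```

In this sketch a request made between {{beginScan}} and {{endScan}} leaves {{rescanPending}} true until a second scan completes, which is the behavior described above as intentional.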

> Fix HDFS CacheReplicationMonitor rescan logic
> ---------------------------------------------
>
>                 Key: HDFS-6783
>                 URL: https://issues.apache.org/jira/browse/HDFS-6783
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: caching
>    Affects Versions: 3.0.0
>            Reporter: Yi Liu
>            Assignee: Yi Liu
>         Attachments: HDFS-6783.001.patch, HDFS-6783.002.patch, HDFS-6783.003.patch, HDFS-6783.004.patch
>
>
> In the monitor thread, needsRescan is set to false before the real scan starts, so
> {{waitForRescanIfNeeded}} returns at the first condition:
> {code}
> if (!needsRescan) {
>   return;
> }
> {code}

