Date: Fri, 1 Aug 2014 19:05:39 +0000 (UTC)
From: "Colin Patrick McCabe (JIRA)"
To: hdfs-issues@hadoop.apache.org
Reply-To: hdfs-issues@hadoop.apache.org
Subject: [jira] [Commented] (HDFS-6783) Fix HDFS CacheReplicationMonitor rescan logic

    [ https://issues.apache.org/jira/browse/HDFS-6783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14082767#comment-14082767 ]

Colin Patrick McCabe commented on HDFS-6783:
--------------------------------------------

Thanks for looking at this, Yi. The behavior you're describing is intentional here. If we call {{setNeedsRescan}} while a rescan is going on, we don't want that scan to count as the rescan. So the question then becomes: CAN we call {{setNeedsRescan}} while a rescan is going on?
Right now the answer is no, since rescan holds the FSN lock, and everything that calls {{setNeedsRescan}} also holds that lock. But this is accidental, not something we should rely on.

The v4 patch you posted is correct, but it assumes that we will always hold the FSN lock for the entire duration of the scan. While this is true currently, it wasn't always true, and won't be true in the future (we *need* to reduce the length of time we hold these locks to avoid latency spikes).

Maybe a good compromise here would be to set the loop variables while holding both the FSN lock and the CRM lock. This fixes the corner case you identified, and would also continue to work properly if we released the FSN lock during the rescan.

> Fix HDFS CacheReplicationMonitor rescan logic
> ---------------------------------------------
>
>                 Key: HDFS-6783
>                 URL: https://issues.apache.org/jira/browse/HDFS-6783
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: caching
>    Affects Versions: 3.0.0
>            Reporter: Yi Liu
>            Assignee: Yi Liu
>         Attachments: HDFS-6783.001.patch, HDFS-6783.002.patch, HDFS-6783.003.patch, HDFS-6783.004.patch
>
>
> In the monitor thread, needsRescan is set to false before the real scan starts, so {{waitForRescanIfNeeded}} will return at the first condition:
> {code}
> if (!needsRescan) {
>   return;
> }
> {code}

--
This message was sent by Atlassian JIRA
(v6.2#6252)
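
For illustration only, here is a rough sketch of the compromise suggested in the comment above: clear the loop variables while holding both the FSN lock and the CRM lock, so that a {{setNeedsRescan}} call arriving during the scan is not swallowed by it. This is not the actual CacheReplicationMonitor code or any of the attached patches. The class RescanSketch, its fields, the rescanOnce method, and the FSN-before-CRM lock order are all assumptions made for the example, and the scan body deliberately runs without the FSN lock to model the future case the comment describes.

```java
import java.util.concurrent.locks.ReentrantLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical stand-in for the monitor; not the real CacheReplicationMonitor. */
class RescanSketch {
    // Stand-ins for the FSNamesystem lock and the monitor's own (CRM) lock.
    private final ReentrantReadWriteLock fsnLock = new ReentrantReadWriteLock();
    private final ReentrantLock crmLock = new ReentrantLock();

    // "Loop variables", guarded by crmLock.
    private boolean needsRescan = false;
    private boolean isScanning = false;
    private long completedScans = 0;

    /** Request a rescan. Only the CRM lock is taken here. */
    void setNeedsRescan() {
        crmLock.lock();
        try {
            needsRescan = true;
        } finally {
            crmLock.unlock();
        }
    }

    boolean needsRescan() {
        crmLock.lock();
        try {
            return needsRescan;
        } finally {
            crmLock.unlock();
        }
    }

    long completedScans() {
        crmLock.lock();
        try {
            return completedScans;
        } finally {
            crmLock.unlock();
        }
    }

    /** One iteration of the (assumed) monitor loop. */
    void rescanOnce(Runnable scanBody) {
        // Set the loop variables while holding BOTH locks, FSN first.
        fsnLock.writeLock().lock();
        try {
            crmLock.lock();
            try {
                needsRescan = false;
                isScanning = true;
            } finally {
                crmLock.unlock();
            }
        } finally {
            fsnLock.writeLock().unlock();
        }

        // The scan itself. In this sketch the FSN lock is NOT held here,
        // so callers of setNeedsRescan() may run concurrently with the scan.
        scanBody.run();

        // Publish completion. needsRescan is left untouched, so a request
        // made during the scan still triggers another scan afterwards.
        crmLock.lock();
        try {
            isScanning = false;
            completedScans++;
        } finally {
            crmLock.unlock();
        }
    }
}
```

In this sketch, a request made before the scan starts is satisfied by that scan, while a request made while the scan body is running leaves needsRescan set, so the in-progress scan does not count as the rescan.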