Mailing-List: contact yarn-issues-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Reply-To: yarn-issues@hadoop.apache.org
Date: Wed, 7 Oct 2015 21:36:28 +0000 (UTC)
From: "Jason Lowe (JIRA)" <jira@apache.org>
To: yarn-issues@hadoop.apache.org
Message-ID: <JIRA.12846363.1437426681000.50270.1444253788104@Atlassian.JIRA>
In-Reply-To: <JIRA.12846363.1437426681000@Atlassian.JIRA>
References: <JIRA.12846363.1437426681000@Atlassian.JIRA>
 <JIRA.12846363.1437426681396@arcas>
Subject: [jira] [Commented] (YARN-3943) Use separate threshold
 configurations for disk-full detection and disk-not-full detection.
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit


    [ https://issues.apache.org/jira/browse/YARN-3943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14947641#comment-14947641 ] 

Jason Lowe commented on YARN-3943:
----------------------------------

Thanks for the patch, [~zxu]!

I think this patch causes backwards compatibility problems for users who have increased the max-disk-utilization-per-disk-percentage from the default.  For example, if someone changed it to 100 and only wants disks to be treated full if they actually fill up completely then this patch will silently make those disks unusable once they fill until they free up at least 10% of their space.  That's not going to be good if the cluster runs the disks over 90% most of the time, as it will effectively take the disk out of rotation for a long, long time.  A more compatible change would be to _not_ set a value in yarn-default for this property.  If the property is set then we use it, but if we get a null for it then we know it isn't specified and we shouldn't just use the same threshold for low as for high.

Shouldn't we be capping the max of diskUtilizationPercentageCutoffLow based on diskUtilizationPercentageCutoffHigh rather than 100.0?

Nit: I think using Math.max / Math.min could make the range capping more readable (certainly less redundant with the identifiers and constants), but it's not must-fix.

Nit: "Use" should be "Using" in the following log message.  "Use" implies we are giving a directive requiring the user to do it, while "Using" states the code is doing it for them.
{code}
+      if (lowUsableSpacePercentagePerDisk > highUsableSpacePercentagePerDisk) {
+        LOG.warn("Use " + YarnConfiguration.
+            NM_MAX_PER_DISK_UTILIZATION_PERCENTAGE + " as " +
+            YarnConfiguration.NM_WM_LOW_PER_DISK_UTILIZATION_PERCENTAGE +
+            ", because " + YarnConfiguration.
+            NM_WM_LOW_PER_DISK_UTILIZATION_PERCENTAGE +
+            " is not configured properly.");
{code}


> Use separate threshold configurations for disk-full detection and disk-not-full detection.
> ------------------------------------------------------------------------------------------
>
>                 Key: YARN-3943
>                 URL: https://issues.apache.org/jira/browse/YARN-3943
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: nodemanager
>            Reporter: zhihai xu
>            Assignee: zhihai xu
>            Priority: Critical
>         Attachments: YARN-3943.000.patch, YARN-3943.001.patch
>
>
> Use separate threshold configurations to check when disks become full and when disks become good. Currently the configuration "yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage" and "yarn.nodemanager.disk-health-checker.min-free-space-per-disk-mb" are used to check both when disks become full and when disks become good. It will be better to use two configurations: one is used when disks become full from not-full and the other one is used when disks become not-full from full. So we can avoid oscillating frequently.
> For example: we can set the one for disk-full detection higher than the one for disk-not-full detection.


--
This message was sent by Atlassian JIRA
(v6.3.4#6332)