hadoop-hdfs-issues mailing list archives

From "Arpit Agarwal (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (HDFS-11907) NameNodeResourceChecker should avoid calling df.getAvailable too frequently
Date Thu, 08 Jun 2017 20:11:18 GMT

    [ https://issues.apache.org/jira/browse/HDFS-11907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16043335#comment-16043335 ]

Arpit Agarwal edited comment on HDFS-11907 at 6/8/17 8:10 PM:
--------------------------------------------------------------

Hi [~andrew.wang], you are right that it's expected to be a cheap call, but calling it once
per second per volume seems excessive. Do you see any benefit to querying {{df}} once per
second? We can make the caching interval configurable and leave the default at 1 second if
you prefer.

This is not the same as changing the health check interval as Chen mentioned. Keeping the
health check interval at 1 second lets us detect process failure faster and we don't want
to change that.

Also, the v4 patch has a couple of issues I missed earlier. [~vagarychen], can you please take
a look at these?
# {{availableSpace}} and {{availableSpaceTimeStamp}} should be members of {{CheckedVolume}}.
# The test case failure in {{TestNameNodeResourceChecker}} needs to be addressed. An easy fix
is to check all volumes instead of trying to query a specific one.
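
To illustrate point 1 together with the configurable caching interval, here is a minimal sketch. It is not the actual patch: the class name {{CachedVolumeSpace}}, the injected clock, and the {{dfQuery}} supplier (a stand-in for {{df.getAvailable()}}) are assumptions made for illustration only.

```java
import java.util.function.LongSupplier;

/** Hypothetical per-volume holder; NOT the real HDFS CheckedVolume class. */
class CachedVolumeSpace {
  private final LongSupplier dfQuery;      // stands in for df.getAvailable()
  private final LongSupplier clockMillis;  // injectable clock (assumption, eases testing)
  private final long cacheIntervalMillis;  // assumed configurable caching interval

  // Cache state is kept per volume, so each volume's value ages independently.
  private long availableSpace;
  private long availableSpaceTimeStamp;
  private boolean hasCachedValue = false;

  CachedVolumeSpace(LongSupplier dfQuery, LongSupplier clockMillis,
      long cacheIntervalMillis) {
    this.dfQuery = dfQuery;
    this.clockMillis = clockMillis;
    this.cacheIntervalMillis = cacheIntervalMillis;
  }

  /** Returns the cached value, refreshing it only when it is too old. */
  synchronized long getAvailable() {
    long now = clockMillis.getAsLong();
    if (!hasCachedValue
        || now - availableSpaceTimeStamp >= cacheIntervalMillis) {
      availableSpace = dfQuery.getAsLong();
      availableSpaceTimeStamp = now;
      hasCachedValue = true;
    }
    return availableSpace;
  }

  /** The health check can still call this every second. */
  boolean isResourceAvailable(long reservedBytes) {
    return getAvailable() >= reservedBytes;
  }
}
```

With this shape the health check keeps its 1 second cadence, and only the underlying {{df}} query is throttled by the caching interval.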



> NameNodeResourceChecker should avoid calling df.getAvailable too frequently
> ---------------------------------------------------------------------------
>
>                 Key: HDFS-11907
>                 URL: https://issues.apache.org/jira/browse/HDFS-11907
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Chen Liang
>            Assignee: Chen Liang
>         Attachments: HDFS-11907.001.patch, HDFS-11907.002.patch, HDFS-11907.003.patch, HDFS-11907.004.patch
>
>
> Currently, {{HealthMonitor#doHealthChecks}} invokes {{NameNode#monitorHealth}}, which
> ends up invoking {{NameNodeResourceChecker#isResourceAvailable}} once per second by
> default. {{NameNodeResourceChecker#isResourceAvailable}} in turn invokes {{df.getAvailable()}}
> every time it is called.
> Since available space rarely changes dramatically within a second, a cached value should
> be sufficient: fetch an updated value only when the cached value is too old, and otherwise
> simply return the cached value. This way {{df.getAvailable()}} is invoked less often.
> Thanks [~arpitagarwal] for the offline discussion.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-help@hadoop.apache.org

