hadoop-hdfs-issues mailing list archives

From "Rakesh R (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HDFS-13166) [SPS]: Implement caching mechanism to keep LIVE datanodes to minimize costly getLiveDatanodeStorageReport() calls
Date Sun, 18 Feb 2018 18:55:00 GMT

     [ https://issues.apache.org/jira/browse/HDFS-13166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rakesh R updated HDFS-13166:
----------------------------
    Description: 
Presently, {{#getLiveDatanodeStorageReport()}} is invoked for every file to perform the computation.
This Jira sub-task is to discuss and implement a caching mechanism that reduces the number of
such calls. We could also define a configurable refresh interval and periodically refresh the
DN cache by fetching the latest {{#getLiveDatanodeStorageReport}} result at that interval.
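The cache described above could be sketched roughly as follows. This is a minimal illustration, not the actual SPS patch; the class and field names are hypothetical, and the real {{getLiveDatanodeStorageReport()}} RPC is stood in for by a plain {{Supplier}}:

```java
import java.util.Collections;
import java.util.List;
import java.util.function.Supplier;

/**
 * Hypothetical sketch: cache the live-datanode view and refresh it only after
 * a configurable interval, instead of fetching the storage report per file.
 */
public class CachedDatanodeView<R> {
  private final Supplier<List<R>> reportFetcher; // stands in for getLiveDatanodeStorageReport()
  private final long refreshIntervalMs;          // the configurable refresh interval
  private volatile List<R> cachedReports = Collections.emptyList();
  private long lastRefreshMs = 0;

  public CachedDatanodeView(Supplier<List<R>> reportFetcher, long refreshIntervalMs) {
    this.reportFetcher = reportFetcher;
    this.refreshIntervalMs = refreshIntervalMs;
  }

  /** Returns the cached reports, refreshing only when the interval has elapsed. */
  public synchronized List<R> getReports() {
    long now = System.currentTimeMillis();
    if (now - lastRefreshMs >= refreshIntervalMs) {
      cachedReports = reportFetcher.get(); // one costly fetch per interval
      lastRefreshMs = now;
    }
    return cachedReports;
  }
}
```

With this shape, processing N files within one interval costs a single report fetch rather than N.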

 The following comments are taken from HDFS-10285, [here|https://issues.apache.org/jira/browse/HDFS-10285?focusedCommentId=16347472&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16347472]
 Comment-7)
{quote}Adding getDatanodeStorageReport is concerning. getDatanodeListForReport is already
a very bad method that should be avoided for anything but jmx – even then it’s a concern.
I eliminated calls to it years ago. All it takes is a nscd/dns hiccup and you’re left holding
the fsn lock for an excessive length of time. Beyond that, the response is going to be pretty
large and tagging all the storage reports is not going to be cheap.

verifyTargetDatanodeHasSpaceForScheduling does it really need the namesystem lock? Can’t
DatanodeDescriptor#chooseStorage4Block synchronize on its storageMap?

Appears to be calling getLiveDatanodeStorageReport for every file. As mentioned earlier, this
is NOT cheap. The SPS should be able to operate on a fuzzy/cached state of the world. Then
it gets another datanode report to determine the number of live nodes to decide if it should
sleep before processing the next path. The number of nodes from the prior cached view of the
world should suffice.
{quote}
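The locking suggestion in the quoted comment (guarding the per-datanode storageMap rather than holding the namesystem lock for the space check) might look like the following. All names here are illustrative, not the real {{DatanodeDescriptor}} API:

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Illustrative sketch: space-for-scheduling check guarded by the datanode's
 * own storageMap lock, so no global namesystem lock is required.
 */
public class DatanodeSketch {
  // Hypothetical stand-in for DatanodeDescriptor#storageMap: storageId -> remaining bytes.
  private final Map<String, Long> storageMap = new HashMap<>();

  public void updateRemaining(String storageId, long remainingBytes) {
    synchronized (storageMap) {
      storageMap.put(storageId, remainingBytes);
    }
  }

  /** True if some storage can hold the block; the fine-grained lock is held briefly. */
  public boolean hasSpaceForBlock(long blockSize) {
    synchronized (storageMap) {
      for (long remaining : storageMap.values()) {
        if (remaining >= blockSize) {
          return true;
        }
      }
      return false;
    }
  }
}
```

The point of the sketch is the lock scope: contention is limited to one datanode's storage map instead of serializing behind the fsn lock.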

  was:
Presently, {{#getLiveDatanodeStorageReport()}} is invoked for every file to perform the computation.
This Jira sub-task is to discuss and implement a caching mechanism that reduces the number of
such calls. We could also define a configurable refresh interval and periodically refresh the
DN cache by fetching the latest {{#getLiveDatanodeStorageReport}} result at that interval.

 The following comments are taken from HDFS-10285, here
 Comment-7)
{quote}Adding getDatanodeStorageReport is concerning. getDatanodeListForReport is already
a very bad method that should be avoided for anything but jmx – even then it’s a concern.
I eliminated calls to it years ago. All it takes is a nscd/dns hiccup and you’re left holding
the fsn lock for an excessive length of time. Beyond that, the response is going to be pretty
large and tagging all the storage reports is not going to be cheap.

verifyTargetDatanodeHasSpaceForScheduling does it really need the namesystem lock? Can’t
DatanodeDescriptor#chooseStorage4Block synchronize on its storageMap?

Appears to be calling getLiveDatanodeStorageReport for every file. As mentioned earlier, this
is NOT cheap. The SPS should be able to operate on a fuzzy/cached state of the world. Then
it gets another datanode report to determine the number of live nodes to decide if it should
sleep before processing the next path. The number of nodes from the prior cached view of the
world should suffice.
{quote}


> [SPS]: Implement caching mechanism to keep LIVE datanodes to minimize costly getLiveDatanodeStorageReport() calls
> -----------------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-13166
>                 URL: https://issues.apache.org/jira/browse/HDFS-13166
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Rakesh R
>            Assignee: Rakesh R
>            Priority: Major
>
> Presently, {{#getLiveDatanodeStorageReport()}} is invoked for every file to perform the
computation. This Jira sub-task is to discuss and implement a caching mechanism that reduces
the number of such calls. We could also define a configurable refresh interval and periodically
refresh the DN cache by fetching the latest {{#getLiveDatanodeStorageReport}} result at that
interval.
>  The following comments are taken from HDFS-10285, [here|https://issues.apache.org/jira/browse/HDFS-10285?focusedCommentId=16347472&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16347472]
>  Comment-7)
> {quote}Adding getDatanodeStorageReport is concerning. getDatanodeListForReport is already
a very bad method that should be avoided for anything but jmx – even then it’s a concern.
I eliminated calls to it years ago. All it takes is a nscd/dns hiccup and you’re left holding
the fsn lock for an excessive length of time. Beyond that, the response is going to be pretty
large and tagging all the storage reports is not going to be cheap.
> verifyTargetDatanodeHasSpaceForScheduling does it really need the namesystem lock? Can’t
DatanodeDescriptor#chooseStorage4Block synchronize on its storageMap?
> Appears to be calling getLiveDatanodeStorageReport for every file. As mentioned earlier,
this is NOT cheap. The SPS should be able to operate on a fuzzy/cached state of the world.
Then it gets another datanode report to determine the number of live nodes to decide if it
should sleep before processing the next path. The number of nodes from the prior cached view
of the world should suffice.
> {quote}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-help@hadoop.apache.org

