hadoop-yarn-issues mailing list archives

From "Sunil G (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-7244) ShuffleHandler is not aware of disks that are added
Date Wed, 27 Sep 2017 15:18:00 GMT

    [ https://issues.apache.org/jira/browse/YARN-7244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16182720#comment-16182720 ]

Sunil G commented on YARN-7244:
-------------------------------

Thanks [~jlowe] for adding more clarity on this.

A 'pull' model may be better and could work for all such cases. As Jason suggested, if apps could
get the latest dirs from {{getLocalDirsForRead/Write}}, the shuffle handler would always have a list
of valid dirs. The only potential issue I see is that once a set of dirs is pulled
from {{LocalDirAllocator#ctx.localDirs}}, those dirs are validated again only when another
getLocalPathForWrite/Read is invoked. So there could be a window in which we get stale
dirs. If the new API {{LocalDirAllocator#getLocalDirsForRead}} could call {{confChanged}}, then
I think it would be the source of truth for localDirs at a given point in time.
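
To make the idea concrete, here is a minimal, self-contained Java sketch of a pull that refreshes before it answers. It does not use Hadoop's classes: {{PullingDirSource}}, its constructor arguments and the {{isUsable}} check are hypothetical stand-ins, and the method name {{getLocalDirsForRead}} is borrowed from the proposed API only for readability. The sketch assumes the configured dirs arrive as a comma-separated string and, for simplicity, re-reads and re-validates on every call, which is roughly the role {{confChanged}} would play in the real allocator.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;
import java.util.function.Supplier;

/** Hypothetical stand-in for an allocator that supports a "pull" of local dirs. */
final class PullingDirSource {
    // Supplies the current comma-separated dir list (assumed config format).
    private final Supplier<String> configuredDirs;
    // Stand-in for a disk health check; the real check is not shown here.
    private final Predicate<String> isUsable;

    PullingDirSource(Supplier<String> configuredDirs, Predicate<String> isUsable) {
        this.configuredDirs = configuredDirs;
        this.isUsable = isUsable;
    }

    /**
     * Re-reads the configured dirs and re-validates each one on every call,
     * so the returned list is a snapshot taken at call time rather than a
     * list cached at startup.
     */
    List<String> getLocalDirsForRead() {
        List<String> valid = new ArrayList<>();
        for (String dir : configuredDirs.get().split(",")) {
            String trimmed = dir.trim();
            if (!trimmed.isEmpty() && isUsable.test(trimmed)) {
                valid.add(trimmed);
            }
        }
        return Collections.unmodifiableList(valid);
    }

    public static void main(String[] args) {
        Set<String> healthy = new HashSet<>(Arrays.asList("/data/1"));
        String[] conf = {"/data/1"};
        PullingDirSource source = new PullingDirSource(() -> conf[0], healthy::contains);

        System.out.println(source.getLocalDirsForRead());

        // A disk is added after startup; the next pull picks it up.
        conf[0] = "/data/1,/data/2";
        healthy.add("/data/2");
        System.out.println(source.getLocalDirsForRead());

        // A disk goes bad; the next pull drops it without any other call in between.
        healthy.remove("/data/1");
        System.out.println(source.getLocalDirsForRead());
    }
}
```

A caller that keeps the returned list around can still go stale between pulls, so the consumer would need to pull per request (or at some agreed interval) rather than hold on to one snapshot.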

bq.Do you think, we can improve this to skip as default behavior itself
Currently in this patch, you are trying to avoid the disk validation check when shouldFilter is
false. To add more context, maybe we could skip this check here, provided we have
valid dirs on the ShuffleHandler end based on the earlier API.

> ShuffleHandler is not aware of disks that are added
> ---------------------------------------------------
>
>                 Key: YARN-7244
>                 URL: https://issues.apache.org/jira/browse/YARN-7244
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Kuhu Shukla
>            Assignee: Kuhu Shukla
>         Attachments: YARN-7244.001.patch, YARN-7244.002.patch
>
>
> The ShuffleHandler permanently remembers the list of "good" disks on NM startup. If disks
> are later added to the node, map tasks will start using them, but the ShuffleHandler will
> not be aware of them. The end result is that the data cannot be shuffled from the node, leading
> to fetch failures and re-runs of the map tasks.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: yarn-issues-help@hadoop.apache.org

