hadoop-common-issues mailing list archives

From "Andras Bokor (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (HADOOP-2829) JT should consider the disk each task is on before scheduling jobs...
Date Fri, 23 Feb 2018 13:19:00 GMT

     [ https://issues.apache.org/jira/browse/HADOOP-2829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel

Andras Bokor resolved HADOOP-2829.
    Resolution: Invalid

It seems obsolete.

> JT should consider the disk each task is on before scheduling jobs...
> ---------------------------------------------------------------------
>                 Key: HADOOP-2829
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2829
>             Project: Hadoop Common
>          Issue Type: Improvement
>            Reporter: eric baldeschwieler
>            Priority: Major
> The DataNode can support a JBOD config, where blocks exist on explicit disks.  But this
> information is not exported or considered by the JT when assigning tasks.  This leads to non-optimal
> disk use.  If 4 slots are used, 2 running tasks will likely be on the same disk, and we observe
> them running more slowly than other tasks on the same machine.
> We could follow a number of strategies to address this.
> For example: the data nodes could support a "what disk is this block on?" call.  Then the
> JT could discover the info and assign jobs accordingly.
> Of course, the TT itself uses disks for merge and temp space, and the datanodes on the
> same machine can be used by off-node sources, so it is not clear that optimizing all of this is
> simple enough to be worth it.
> This issue deserves study.
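
The strategy quoted above (ask which disk each block is on, then spread tasks across disks) can be illustrated with a minimal sketch. This is not Hadoop code and was never part of the issue: `assign_tasks`, `block_to_disk`, and the block and disk names are hypothetical, and the block-to-disk mapping stands in for the "what disk is this block on?" call that the description proposes. It also ignores the TT's own merge and temp-space I/O and off-node readers, which the description notes would complicate any real scheduler.

```python
from collections import Counter


def assign_tasks(pending_blocks, block_to_disk, free_slots):
    """Greedily pick blocks so that running tasks are spread across disks.

    pending_blocks: block ids with local data on this machine (hypothetical).
    block_to_disk:  dict mapping block id -> disk id; assumed to come from a
                    hypothetical "what disk is this block on?" query.
    free_slots:     number of task slots available on this machine.
    """
    disk_load = Counter()            # tasks assigned so far, per disk
    remaining = list(pending_blocks)
    assigned = []
    for _ in range(min(free_slots, len(remaining))):
        # Choose the block whose disk currently has the fewest tasks;
        # ties go to the earliest block in the pending order.
        block = min(remaining, key=lambda b: disk_load[block_to_disk[b]])
        remaining.remove(block)
        disk_load[block_to_disk[block]] += 1
        assigned.append(block)
    return assigned


# Illustrative data only: two blocks share disk0, the others are on their own disks.
example_disks = {"blk_a": "disk0", "blk_b": "disk0", "blk_c": "disk1", "blk_d": "disk2"}
example = assign_tasks(["blk_a", "blk_b", "blk_c", "blk_d"], example_disks, free_slots=3)
print(example)  # ['blk_a', 'blk_c', 'blk_d']
```

With three slots, the sketch skips `blk_b` because `disk0` already has a task, which is the collision the description reports when two tasks land on the same disk.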

This message was sent by Atlassian JIRA

To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org
