hadoop-yarn-issues mailing list archives

From "Huangkaixuan (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (YARN-6289) yarn got little data locality
Date Mon, 06 Mar 2017 08:20:32 GMT

     [ https://issues.apache.org/jira/browse/YARN-6289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Huangkaixuan updated YARN-6289:
-------------------------------
    Description: When I ran this experiment with both Spark and MapReduce wordcount on YARN
against the file, I noticed that the job did not achieve data locality on every run. Task
placement was seemingly random, even though no other job was running on the cluster. I expected
the tasks to always be placed on the single machine holding the data block, but that did not
happen.  (was: When I ran this experiment with both Spark and MapReduce wordcount against the
file, I noticed that the job did not achieve data locality on every run. Task placement was
seemingly random, even though no other job was running on the cluster. I expected the tasks
to always be placed on the single machine holding the data block, but that did not happen.)

> yarn got little data locality
> -----------------------------
>
>                 Key: YARN-6289
>                 URL: https://issues.apache.org/jira/browse/YARN-6289
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: capacity scheduler
>         Environment: Hardware configuration
> CPU: 2 x Intel(R) Xeon(R) E5-2620 v2 @ 2.10GHz, 15M cache, 6 cores / 12 threads
> Memory: 128GB (16x8GB), 1600MHz
> Disk: 2 x 600GB 3.5-inch in RAID-1
> Network bandwidth: 968Mb/s
> Software configuration
> Spark-1.6.2, Hadoop-2.7.1
>            Reporter: Huangkaixuan
>            Priority: Minor
>         Attachments: YARN-6289.01.docx
>
>
> When I ran this experiment with both Spark and MapReduce wordcount on YARN against the file,
I noticed that the job did not achieve data locality on every run. Task placement was seemingly
random, even though no other job was running on the cluster. I expected the tasks to always
be placed on the single machine holding the data block, but that did not happen.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: yarn-issues-help@hadoop.apache.org

