hadoop-yarn-issues mailing list archives

From "Huangkaixuan (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (YARN-6289) Fail to achieve data locality when running MapReduce and Spark on HDFS
Date Tue, 07 Mar 2017 03:41:32 GMT

     [ https://issues.apache.org/jira/browse/YARN-6289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Huangkaixuan updated YARN-6289:
    Issue Type: Bug  (was: Improvement)

> Fail to achieve data locality when running MapReduce and Spark on HDFS
> ----------------------------------------------------------------------
>                 Key: YARN-6289
>                 URL: https://issues.apache.org/jira/browse/YARN-6289
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacity scheduler
>         Environment: Hardware configuration
> CPU: 2 x Intel(R) Xeon(R) E5-2620 v2 @ 2.10GHz /15M Cache 6-Core 12-Thread 
> Memory: 128GB Memory (16x8GB) 1600MHz
> Disk: 600GBx2 3.5-inch with RAID-1
> Network bandwidth: 968Mb/s
> Software configuration
> Spark-1.6.2	Hadoop-2.7.1 
>            Reporter: Huangkaixuan
>         Attachments: YARN-DataLocality.docx
> When I ran experiments with both Spark and MapReduce wordcount on YARN, I noticed that
> the tasks failed to achieve data locality, even though no other job was running on the cluster.
> I used a 7-node (1 master, 6 data nodes/node managers) cluster and set 2x replication
> for HDFS. In the experiments, I ran Spark/MapReduce wordcount on YARN 10 times with a
> single data block. The results show that only 30% of tasks achieved data locality; task
> placement appears random. The experiment details are in the attachment, so you can
> reproduce the experiments.
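As a sanity check on the reported ~30% figure: if the scheduler placed tasks uniformly at random across the data nodes, then with 2 replicas of the single block spread over 6 nodes the expected locality rate would be 2/6, or about 33%, which is close to what was observed. A minimal sketch of that back-of-the-envelope estimate (numbers taken from the report above; the uniform-random-placement model is an assumption, not a claim about the scheduler's actual behavior):

```python
# Estimate the chance a task lands on a node holding a replica of its
# block, under the (assumed) model of uniform random task placement.
replicas = 2    # HDFS replication factor used in the experiments
datanodes = 6   # data nodes / node managers in the cluster

p_local = replicas / datanodes  # fraction of nodes that hold the block
print(f"expected locality rate: {p_local:.0%}")  # prints "expected locality rate: 33%"
```

Under this model, the reported 30% is consistent with the scheduler ignoring block locations entirely, which is what the issue suggests.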

This message was sent by Atlassian JIRA

