hadoop-yarn-issues mailing list archives

From "Huangkaixuan (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (YARN-6289) Fail to achieve data locality when runing MapReduce and Spark on HDFS
Date Tue, 07 Mar 2017 03:33:32 GMT

    [ https://issues.apache.org/jira/browse/YARN-6289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15898619#comment-15898619 ]

Huangkaixuan edited comment on YARN-6289 at 3/7/17 3:33 AM:

The detailed environment and results of the experiments are shown in the attachment.

was (Author: huangkx6810):
The detail results of the experiments are shown in the patch

> Fail to achieve data locality when runing MapReduce and Spark on HDFS
> ---------------------------------------------------------------------
>                 Key: YARN-6289
>                 URL: https://issues.apache.org/jira/browse/YARN-6289
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: capacity scheduler
>         Environment: Hardware configuration
> CPU: 2 x Intel(R) Xeon(R) E5-2620 v2 @ 2.10GHz /15M Cache 6-Core 12-Thread 
> Memory: 128GB Memory (16x8GB) 1600MHz
> Disk: 600GBx2 3.5-inch with RAID-1
> Network bandwidth: 968Mb/s
> Software configuration
> Spark-1.6.2, Hadoop-2.7.1
>            Reporter: Huangkaixuan
>            Priority: Minor
>         Attachments: YARN-6289.01.docx
>      When I ran wordcount experiments with both Spark and MapReduce on YARN against the
file, I noticed that the job did not achieve data locality every time. Task placement was
seemingly random, even though no other job was running on the cluster. I expected the task
to always be placed on a machine holding the data block, but that did not happen.
>      I ran the experiments on a 7-node cluster with 2x replication (1 master, 6 data
nodes/node managers). The experiment details are in the attachment so that you can reproduce the results.
>      In the experiments, I ran Spark/MapReduce wordcount on YARN 10 times against a
single block, and the results show that only 30% of tasks achieved data locality. Task
placement appears to be random.
>      Next, I will run two more experiments (a 7-node cluster with 2x replication, using 2 blocks
and 4 blocks) to verify the results, and I plan to do some optimization work (optimizing the
scheduling policy) to improve data locality.
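
As a rough sanity check on the "appears random" observation in the description: the sketch below computes the node-local rate one would expect if a single task were placed uniformly at random among the 6 node managers while 2 of them hold a replica of the block. Uniform random placement is an assumption made purely for illustration, not a description of how the capacity scheduler actually chooses nodes, and the function names are hypothetical. Under that assumption the expected rate is 2/6, about 33%, which is close to the reported 30% over 10 runs.

```python
from fractions import Fraction
import random


def expected_local_rate(num_nodes: int, replication: int) -> Fraction:
    """Probability that a uniformly random node holds a replica of the block.

    Assumes replicas sit on distinct nodes and every node is equally likely
    to receive the task (an illustrative assumption only).
    """
    return Fraction(replication, num_nodes)


def simulate_local_rate(num_nodes: int, replication: int,
                        trials: int, seed: int = 0) -> float:
    """Monte Carlo estimate of the same probability."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Pick which nodes hold the block replicas for this trial.
        replica_nodes = set(rng.sample(range(num_nodes), replication))
        # Pick the node that receives the task, uniformly at random.
        chosen = rng.randrange(num_nodes)
        if chosen in replica_nodes:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    # 6 node managers, 2x replication, as in the reported cluster.
    print(expected_local_rate(6, 2))
    print(simulate_local_rate(6, 2, trials=20000))
```

With only 10 runs, a measured 30% cannot be distinguished from this 33% baseline, so the planned 2-block and 4-block experiments would help confirm whether placement really ignores locality.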

