hadoop-mapreduce-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Michael Segel <michael_se...@hotmail.com>
Subject Re: Insight on why distcp becomes slower when adding nodemanager
Date Mon, 29 Oct 2012 19:04:34 GMT
how many times did you test it?

need to rule out aberrations.

On Oct 29, 2012, at 11:30 AM, Harsh J <harsh@cloudera.com> wrote:

> On your second low-memory NM instance, did you ensure to lower the
> yarn.nodemanager.resource.memory-mb property specifically to avoid
> swapping due to excessive resource grants? The default offered is 8 GB
> (>> 1.7 GB you have).
> 
> On Mon, Oct 29, 2012 at 8:42 PM, Alexandre Fouche
> <alexandre.fouche@cleverscale.com> wrote:
>> Hi,
>> 
>> Can someone give some insight on why a "distcp" of 600 files of a few
>> hundred bytes from s3n:// to local hdfs is taking 46s when using a
>> yarn-nodemanager EC2 instance with 16GB memory (which by the way i think is
>> jokingly long), and taking 3mn30s when adding a second yarn-nodemanager (a
>> small instance with 1.7GB memory) ?
>> I would have expected it to be a bit faster, not 5xlonger !
>> 
>> I have the same issue when i stop the small instance nodemanager and restart
>> it to join the processing after the big nodemanager instance was already
>> submitted the job.
>> 
>> I am using Cloudera latest Yarn+HDFS on Amazon (rebranded Centos 6)
>> 
>>    #Staging 14:58:04 root@datanode2:hadoop-yarn: rpm -qa |grep hadoop
>>    hadoop-hdfs-datanode-2.0.0+545-1.cdh4.1.1.p0.5.el6.x86_64
>>    hadoop-mapreduce-2.0.0+545-1.cdh4.1.1.p0.5.el6.x86_64
>>    hadoop-0.20-mapreduce-0.20.2+1261-1.cdh4.1.1.p0.4.el6.x86_64
>>    hadoop-yarn-nodemanager-2.0.0+545-1.cdh4.1.1.p0.5.el6.x86_64
>>    hadoop-mapreduce-historyserver-2.0.0+545-1.cdh4.1.1.p0.5.el6.x86_64
>>    hadoop-hdfs-2.0.0+545-1.cdh4.1.1.p0.5.el6.x86_64
>>    hadoop-client-2.0.0+545-1.cdh4.1.1.p0.5.el6.x86_64
>>    hadoop-2.0.0+545-1.cdh4.1.1.p0.5.el6.x86_64
>>    hadoop-yarn-2.0.0+545-1.cdh4.1.1.p0.5.el6.x86_64
>> 
>> 
>>    #Staging 14:39:51 root@resourcemanager:hadoop-yarn:
>> HADOOP_MAPRED_HOME=/usr/lib/hadoop-mapreduce time hadoop distcp -overwrite
>> s3n://xxx:xxx@s3n.hadoop.cwsdev/* hdfs:///tmp/something/a
>> 
>>    12/10/29 14:40:12 INFO tools.DistCp: Input Options:
>> DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false,
>> ignoreFailures=false, maxMaps=20, sslConfigurationFile='null',
>> copyStrategy='uniformsize', sourceFileListing=null,
>> sourcePaths=[s3n://xxx:xxx@s3n.hadoop.cwsdev/*],
>> targetPath=hdfs:/tmp/something/a}
>>    12/10/29 14:40:18 WARN conf.Configuration: io.sort.mb is deprecated.
>> Instead, use mapreduce.task.io.sort.mb
>>    12/10/29 14:40:18 WARN conf.Configuration: io.sort.factor is deprecated.
>> Instead, use mapreduce.task.io.sort.factor
>>    12/10/29 14:40:19 INFO mapreduce.JobSubmitter: number of splits:15
>>    12/10/29 14:40:19 WARN conf.Configuration: mapred.jar is deprecated.
>> Instead, use mapreduce.job.jar
>>    12/10/29 14:40:19 WARN conf.Configuration:
>> mapred.map.tasks.speculative.execution is deprecated. Instead, use
>> mapreduce.map.speculative
>>    12/10/29 14:40:19 WARN conf.Configuration: mapred.reduce.tasks is
>> deprecated. Instead, use mapreduce.job.reduces
>>    12/10/29 14:40:19 WARN conf.Configuration: mapred.mapoutput.value.class
>> is deprecated. Instead, use mapreduce.map.output.value.class
>>    12/10/29 14:40:19 WARN conf.Configuration: mapreduce.map.class is
>> deprecated. Instead, use mapreduce.job.map.class
>>    12/10/29 14:40:19 WARN conf.Configuration: mapred.job.name is
>> deprecated. Instead, use mapreduce.job.name
>>    12/10/29 14:40:19 WARN conf.Configuration: mapreduce.inputformat.class
>> is deprecated. Instead, use mapreduce.job.inputformat.class
>>    12/10/29 14:40:19 WARN conf.Configuration: mapred.output.dir is
>> deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir
>>    12/10/29 14:40:19 WARN conf.Configuration: mapreduce.outputformat.class
>> is deprecated. Instead, use mapreduce.job.outputformat.class
>>    12/10/29 14:40:19 WARN conf.Configuration: mapred.map.tasks is
>> deprecated. Instead, use mapreduce.job.maps
>>    12/10/29 14:40:19 WARN conf.Configuration: mapred.mapoutput.key.class is
>> deprecated. Instead, use mapreduce.map.output.key.class
>>    12/10/29 14:40:19 WARN conf.Configuration: mapred.working.dir is
>> deprecated. Instead, use mapreduce.job.working.dir
>>    12/10/29 14:40:20 INFO mapred.ResourceMgrDelegate: Submitted application
>> application_1351504801306_0015 to ResourceManager at
>> resourcemanager.cwsdev.cleverscale.com/10.60.106.130:8032
>>    12/10/29 14:40:20 INFO mapreduce.Job: The url to track the job:
>> http://ip-10-60-106-130.ec2.internal:8088/proxy/application_1351504801306_0015/
>>    12/10/29 14:40:20 INFO tools.DistCp: DistCp job-id:
>> job_1351504801306_0015
>>    12/10/29 14:40:20 INFO mapreduce.Job: Running job:
>> job_1351504801306_0015
>>    12/10/29 14:40:27 INFO mapreduce.Job: Job job_1351504801306_0015 running
>> in uber mode : false
>>    12/10/29 14:40:27 INFO mapreduce.Job:  map 0% reduce 0%
>>    12/10/29 14:40:42 INFO mapreduce.Job:  map 6% reduce 0%
>>    12/10/29 14:40:43 INFO mapreduce.Job:  map 33% reduce 0%
>>    12/10/29 14:40:44 INFO mapreduce.Job:  map 40% reduce 0%
>>    12/10/29 14:40:48 INFO mapreduce.Job:  map 46% reduce 0%
>>    12/10/29 14:43:04 INFO mapreduce.Job:  map 56% reduce 0%
>>    12/10/29 14:43:05 INFO mapreduce.Job:  map 58% reduce 0%
>>    12/10/29 14:43:08 INFO mapreduce.Job:  map 62% reduce 0%
>>    12/10/29 14:43:09 INFO mapreduce.Job:  map 68% reduce 0%
>>    12/10/29 14:43:15 INFO mapreduce.Job:  map 75% reduce 0%
>>    12/10/29 14:43:16 INFO mapreduce.Job:  map 82% reduce 0%
>>    12/10/29 14:43:25 INFO mapreduce.Job:  map 85% reduce 0%
>>    12/10/29 14:43:26 INFO mapreduce.Job:  map 87% reduce 0%
>>    12/10/29 14:43:29 INFO mapreduce.Job:  map 90% reduce 0%
>>    12/10/29 14:43:35 INFO mapreduce.Job:  map 93% reduce 0%
>>    12/10/29 14:43:37 INFO mapreduce.Job:  map 96% reduce 0%
>>    12/10/29 14:43:40 INFO mapreduce.Job:  map 100% reduce 0%
>>    12/10/29 14:43:40 INFO mapreduce.Job: Job job_1351504801306_0015
>> completed successfully
>>    12/10/29 14:43:40 INFO mapreduce.Job: Counters: 35
>>        File System Counters
>>            FILE: Number of bytes read=1800
>>            FILE: Number of bytes written=1050895
>>            FILE: Number of read operations=0
>>            FILE: Number of large read operations=0
>>            FILE: Number of write operations=0
>>            HDFS: Number of bytes read=22157
>>            HDFS: Number of bytes written=101379
>>            HDFS: Number of read operations=519
>>            HDFS: Number of large read operations=0
>>            HDFS: Number of write operations=201
>>            S3N: Number of bytes read=101379
>>            S3N: Number of bytes written=0
>>            S3N: Number of read operations=0
>>            S3N: Number of large read operations=0
>>            S3N: Number of write operations=0
>>        Job Counters
>>            Launched map tasks=15
>>            Other local map tasks=15
>>            Total time spent by all maps in occupied slots (ms)=12531208
>>            Total time spent by all reduces in occupied slots (ms)=0
>>        Map-Reduce Framework
>>            Map input records=57
>>            Map output records=0
>>            Input split bytes=2010
>>            Spilled Records=0
>>            Failed Shuffles=0
>>            Merged Map outputs=0
>>            GC time elapsed (ms)=42324
>>            CPU time spent (ms)=54890
>>            Physical memory (bytes) snapshot=2923872256
>>            Virtual memory (bytes) snapshot=12526301184
>>            Total committed heap usage (bytes)=1618280448
>>        File Input Format Counters
>>            Bytes Read=20147
>>        File Output Format Counters
>>            Bytes Written=0
>>        org.apache.hadoop.tools.mapred.CopyMapper$Counter
>>            BYTESCOPIED=101379
>>            BYTESEXPECTED=101379
>>            COPY=57
>> 
>>    6.90user 0.59system 3:29.17elapsed 3%CPU (0avgtext+0avgdata
>> 819392maxresident)k
>>    0inputs+344outputs (0major+62847minor)pagefaults 0swaps
>> 
>> 
>> 
>> --
>> Alexandre Fouche
> 
> 
> 
> -- 
> Harsh J
> 


Mime
View raw message