Hi,
Can someone give some insight on why a "distcp" of 600 files of a few hundred bytes from s3n://
to local hdfs is taking 46s when using a yarn-nodemanager EC2 instance with 16GB memory (which
by the way i think is jokingly long), and taking 3mn30s when adding a second yarn-nodemanager
(a small instance with 1.7GB memory) ?
I would have expected it to be a bit faster, not 5xlonger !
I have the same issue when i stop the small instance nodemanager and restart it to join the
processing after the big nodemanager instance was already submitted the job.
I am using Cloudera latest Yarn+HDFS on Amazon (rebranded Centos 6)
#Staging 14:58:04 root@datanode2:hadoop-yarn: rpm -qa |grep hadoop
hadoop-hdfs-datanode-2.0.0+545-1.cdh4.1.1.p0.5.el6.x86_64
hadoop-mapreduce-2.0.0+545-1.cdh4.1.1.p0.5.el6.x86_64
hadoop-0.20-mapreduce-0.20.2+1261-1.cdh4.1.1.p0.4.el6.x86_64
hadoop-yarn-nodemanager-2.0.0+545-1.cdh4.1.1.p0.5.el6.x86_64
hadoop-mapreduce-historyserver-2.0.0+545-1.cdh4.1.1.p0.5.el6.x86_64
hadoop-hdfs-2.0.0+545-1.cdh4.1.1.p0.5.el6.x86_64
hadoop-client-2.0.0+545-1.cdh4.1.1.p0.5.el6.x86_64
hadoop-2.0.0+545-1.cdh4.1.1.p0.5.el6.x86_64
hadoop-yarn-2.0.0+545-1.cdh4.1.1.p0.5.el6.x86_64
#Staging 14:39:51 root@resourcemanager:hadoop-yarn: HADOOP_MAPRED_HOME=/usr/lib/hadoop-mapreduce
time hadoop distcp -overwrite s3n://xxx:xxx@s3n.hadoop.cwsdev/* hdfs:///tmp/something/a
12/10/29 14:40:12 INFO tools.DistCp: Input Options: DistCpOptions{atomicCommit=false,
syncFolder=false, deleteMissing=false, ignoreFailures=false, maxMaps=20, sslConfigurationFile='null',
copyStrategy='uniformsize', sourceFileListing=null, sourcePaths=[s3n://xxx:xxx@s3n.hadoop.cwsdev/*],
targetPath=hdfs:/tmp/something/a}
12/10/29 14:40:18 WARN conf.Configuration: io.sort.mb is deprecated. Instead, use mapreduce.task.io.sort.mb
12/10/29 14:40:18 WARN conf.Configuration: io.sort.factor is deprecated. Instead, use
mapreduce.task.io.sort.factor
12/10/29 14:40:19 INFO mapreduce.JobSubmitter: number of splits:15
12/10/29 14:40:19 WARN conf.Configuration: mapred.jar is deprecated. Instead, use mapreduce.job.jar
12/10/29 14:40:19 WARN conf.Configuration: mapred.map.tasks.speculative.execution is deprecated.
Instead, use mapreduce.map.speculative
12/10/29 14:40:19 WARN conf.Configuration: mapred.reduce.tasks is deprecated. Instead,
use mapreduce.job.reduces
12/10/29 14:40:19 WARN conf.Configuration: mapred.mapoutput.value.class is deprecated.
Instead, use mapreduce.map.output.value.class
12/10/29 14:40:19 WARN conf.Configuration: mapreduce.map.class is deprecated. Instead,
use mapreduce.job.map.class
12/10/29 14:40:19 WARN conf.Configuration: mapred.job.name is deprecated. Instead, use
mapreduce.job.name
12/10/29 14:40:19 WARN conf.Configuration: mapreduce.inputformat.class is deprecated.
Instead, use mapreduce.job.inputformat.class
12/10/29 14:40:19 WARN conf.Configuration: mapred.output.dir is deprecated. Instead, use
mapreduce.output.fileoutputformat.outputdir
12/10/29 14:40:19 WARN conf.Configuration: mapreduce.outputformat.class is deprecated.
Instead, use mapreduce.job.outputformat.class
12/10/29 14:40:19 WARN conf.Configuration: mapred.map.tasks is deprecated. Instead, use
mapreduce.job.maps
12/10/29 14:40:19 WARN conf.Configuration: mapred.mapoutput.key.class is deprecated. Instead,
use mapreduce.map.output.key.class
12/10/29 14:40:19 WARN conf.Configuration: mapred.working.dir is deprecated. Instead,
use mapreduce.job.working.dir
12/10/29 14:40:20 INFO mapred.ResourceMgrDelegate: Submitted application application_1351504801306_0015
to ResourceManager at resourcemanager.cwsdev.cleverscale.com/10.60.106.130:8032
12/10/29 14:40:20 INFO mapreduce.Job: The url to track the job: http://ip-10-60-106-130.ec2.internal:8088/proxy/application_1351504801306_0015/
12/10/29 14:40:20 INFO tools.DistCp: DistCp job-id: job_1351504801306_0015
12/10/29 14:40:20 INFO mapreduce.Job: Running job: job_1351504801306_0015
12/10/29 14:40:27 INFO mapreduce.Job: Job job_1351504801306_0015 running in uber mode
: false
12/10/29 14:40:27 INFO mapreduce.Job: map 0% reduce 0%
12/10/29 14:40:42 INFO mapreduce.Job: map 6% reduce 0%
12/10/29 14:40:43 INFO mapreduce.Job: map 33% reduce 0%
12/10/29 14:40:44 INFO mapreduce.Job: map 40% reduce 0%
12/10/29 14:40:48 INFO mapreduce.Job: map 46% reduce 0%
12/10/29 14:43:04 INFO mapreduce.Job: map 56% reduce 0%
12/10/29 14:43:05 INFO mapreduce.Job: map 58% reduce 0%
12/10/29 14:43:08 INFO mapreduce.Job: map 62% reduce 0%
12/10/29 14:43:09 INFO mapreduce.Job: map 68% reduce 0%
12/10/29 14:43:15 INFO mapreduce.Job: map 75% reduce 0%
12/10/29 14:43:16 INFO mapreduce.Job: map 82% reduce 0%
12/10/29 14:43:25 INFO mapreduce.Job: map 85% reduce 0%
12/10/29 14:43:26 INFO mapreduce.Job: map 87% reduce 0%
12/10/29 14:43:29 INFO mapreduce.Job: map 90% reduce 0%
12/10/29 14:43:35 INFO mapreduce.Job: map 93% reduce 0%
12/10/29 14:43:37 INFO mapreduce.Job: map 96% reduce 0%
12/10/29 14:43:40 INFO mapreduce.Job: map 100% reduce 0%
12/10/29 14:43:40 INFO mapreduce.Job: Job job_1351504801306_0015 completed successfully
12/10/29 14:43:40 INFO mapreduce.Job: Counters: 35
File System Counters
FILE: Number of bytes read=1800
FILE: Number of bytes written=1050895
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=22157
HDFS: Number of bytes written=101379
HDFS: Number of read operations=519
HDFS: Number of large read operations=0
HDFS: Number of write operations=201
S3N: Number of bytes read=101379
S3N: Number of bytes written=0
S3N: Number of read operations=0
S3N: Number of large read operations=0
S3N: Number of write operations=0
Job Counters
Launched map tasks=15
Other local map tasks=15
Total time spent by all maps in occupied slots (ms)=12531208
Total time spent by all reduces in occupied slots (ms)=0
Map-Reduce Framework
Map input records=57
Map output records=0
Input split bytes=2010
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=42324
CPU time spent (ms)=54890
Physical memory (bytes) snapshot=2923872256
Virtual memory (bytes) snapshot=12526301184
Total committed heap usage (bytes)=1618280448
File Input Format Counters
Bytes Read=20147
File Output Format Counters
Bytes Written=0
org.apache.hadoop.tools.mapred.CopyMapper$Counter
BYTESCOPIED=101379
BYTESEXPECTED=101379
COPY=57
6.90user 0.59system 3:29.17elapsed 3%CPU (0avgtext+0avgdata 819392maxresident)k
0inputs+344outputs (0major+62847minor)pagefaults 0swaps
--
Alexandre Fouche
|