hadoop-mapreduce-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From felix gao <gre1...@gmail.com>
Subject How to speed up my slaves
Date Wed, 02 Mar 2011 18:19:32 GMT
Hello experts,

I am recently testing a set of logs that I converted to avro format in
hadoop.  I am notice really really slow performance when compare to raw
logs.  The map logs showing below seems to indicate setting up JVM took the
longest time. I am wondering if there is anything I can tweak in my

2011-03-02 08:41:21,887 INFO org.apache.hadoop.metrics.jvm.JvmMetrics:
Initializing JVM Metrics with processName=MAP, sessionId=
2011-03-02 08:43:53,684 INFO org.apache.hadoop.mapred.MapTask: io.sort.mb = 100
2011-03-02 08:43:53,737 INFO org.apache.hadoop.mapred.MapTask: data
buffer = 79691776/99614720
2011-03-02 08:43:53,737 INFO org.apache.hadoop.mapred.MapTask: record
buffer = 262144/327680
2011-03-02 08:44:05,057 INFO org.apache.hadoop.mapred.MapTask:
Spilling map output: record full = true
2011-03-02 08:44:05,057 INFO org.apache.hadoop.mapred.MapTask:
bufstart = 0; bufend = 11272192; bufvoid = 99614720
2011-03-02 08:44:05,057 INFO org.apache.hadoop.mapred.MapTask: kvstart
= 0; kvend = 262144; length = 327680
2011-03-02 08:44:05,703 INFO org.apache.hadoop.util.NativeCodeLoader:
Loaded the native-hadoop library
2011-03-02 08:44:05,704 INFO
org.apache.hadoop.io.compress.zlib.ZlibFactory: Successfully loaded &
initialized native-zlib library
2011-03-02 08:44:05,706 INFO org.apache.hadoop.io.compress.CodecPool:
Got brand-new compressor
2011-03-02 08:44:08,103 INFO org.apache.hadoop.mapred.MapTask: Finished spill 0
2011-03-02 08:44:16,281 INFO org.apache.hadoop.mapred.MapTask:
Spilling map output: record full = true
2011-03-02 08:44:16,281 INFO org.apache.hadoop.mapred.MapTask:
bufstart = 11272192; bufend = 22544341; bufvoid = 99614720
2011-03-02 08:44:16,281 INFO org.apache.hadoop.mapred.MapTask: kvstart
= 262144; kvend = 196607; length = 327680
2011-03-02 08:44:18,921 INFO org.apache.hadoop.mapred.MapTask: Finished spill 1
2011-03-02 08:44:27,029 INFO org.apache.hadoop.mapred.MapTask:
Spilling map output: record full = true
2011-03-02 08:44:27,029 INFO org.apache.hadoop.mapred.MapTask:
bufstart = 22544341; bufend = 33816490; bufvoid = 99614720
2011-03-02 08:44:27,030 INFO org.apache.hadoop.mapred.MapTask: kvstart
= 196607; kvend = 131070; length = 327680
2011-03-02 08:44:29,454 INFO org.apache.hadoop.mapred.MapTask: Finished spill 2
2011-03-02 08:44:41,450 INFO org.apache.hadoop.mapred.MapTask:
Spilling map output: record full = true
2011-03-02 08:44:41,450 INFO org.apache.hadoop.mapred.MapTask:
bufstart = 33816490; bufend = 45088639; bufvoid = 99614720
2011-03-02 08:44:41,450 INFO org.apache.hadoop.mapred.MapTask: kvstart
= 131070; kvend = 65533; length = 327680
2011-03-02 08:44:43,740 INFO org.apache.hadoop.mapred.MapTask: Finished spill 3
2011-03-02 08:44:51,010 INFO org.apache.hadoop.mapred.MapTask:
Spilling map output: record full = true
2011-03-02 08:44:51,010 INFO org.apache.hadoop.mapred.MapTask:
bufstart = 45088639; bufend = 56360831; bufvoid = 99614720
2011-03-02 08:44:51,010 INFO org.apache.hadoop.mapred.MapTask: kvstart
= 65533; kvend = 327677; length = 327680
2011-03-02 08:44:53,066 INFO org.apache.hadoop.mapred.MapTask: Finished spill 4
2011-03-02 08:44:53,419 INFO org.apache.hadoop.mapred.MapTask:
Starting flush of map output
2011-03-02 08:44:54,079 INFO org.apache.hadoop.mapred.MapTask: Finished spill 5
2011-03-02 08:44:54,085 INFO org.apache.hadoop.mapred.Merger: Merging
6 sorted segments
2011-03-02 08:44:54,091 INFO org.apache.hadoop.io.compress.CodecPool:
Got brand-new decompressor
2011-03-02 08:44:54,093 INFO org.apache.hadoop.io.compress.CodecPool:
Got brand-new decompressor
2011-03-02 08:44:54,093 INFO org.apache.hadoop.io.compress.CodecPool:
Got brand-new decompressor
2011-03-02 08:44:54,094 INFO org.apache.hadoop.io.compress.CodecPool:
Got brand-new decompressor
2011-03-02 08:44:54,095 INFO org.apache.hadoop.io.compress.CodecPool:
Got brand-new decompressor
2011-03-02 08:44:54,096 INFO org.apache.hadoop.io.compress.CodecPool:
Got brand-new decompressor
2011-03-02 08:44:54,096 INFO org.apache.hadoop.mapred.Merger: Down to
the last merge-pass, with 6 segments left of total size: 476686 bytes
2011-03-02 08:44:54,519 INFO org.apache.hadoop.mapred.Merger: Merging
6 sorted segments
2011-03-02 08:44:54,523 INFO org.apache.hadoop.mapred.Merger: Down to
the last merge-pass, with 6 segments left of total size: 477643 bytes
2011-03-02 08:44:54,882 INFO org.apache.hadoop.mapred.Merger: Merging
6 sorted segments
2011-03-02 08:44:54,886 INFO org.apache.hadoop.mapred.Merger: Down to
the last merge-pass, with 6 segments left of total size: 474730 bytes
2011-03-02 08:44:55,241 INFO org.apache.hadoop.mapred.Merger: Merging
6 sorted segments
2011-03-02 08:44:55,245 INFO org.apache.hadoop.mapred.Merger: Down to
the last merge-pass, with 6 segments left of total size: 478581 bytes
2011-03-02 08:44:55,647 INFO org.apache.hadoop.mapred.Merger: Merging
6 sorted segments
2011-03-02 08:44:55,651 INFO org.apache.hadoop.mapred.Merger: Down to
the last merge-pass, with 6 segments left of total size: 479353 bytes
2011-03-02 08:44:56,011 INFO org.apache.hadoop.mapred.Merger: Merging
6 sorted segments
2011-03-02 08:44:56,015 INFO org.apache.hadoop.mapred.Merger: Down to
the last merge-pass, with 6 segments left of total size: 474061 bytes
2011-03-02 08:44:56,367 INFO org.apache.hadoop.mapred.Merger: Merging
6 sorted segments
2011-03-02 08:44:56,371 INFO org.apache.hadoop.mapred.Merger: Down to
the last merge-pass, with 6 segments left of total size: 476289 bytes
2011-03-02 08:44:56,725 INFO org.apache.hadoop.mapred.Merger: Merging
6 sorted segments
2011-03-02 08:44:56,729 INFO org.apache.hadoop.mapred.Merger: Down to
the last merge-pass, with 6 segments left of total size: 477047 bytes
2011-03-02 08:44:57,082 INFO org.apache.hadoop.mapred.Merger: Merging
6 sorted segments
2011-03-02 08:44:57,086 INFO org.apache.hadoop.mapred.Merger: Down to
the last merge-pass, with 6 segments left of total size: 474144 bytes
2011-03-02 08:44:57,437 INFO org.apache.hadoop.mapred.Merger: Merging
6 sorted segments
2011-03-02 08:44:57,441 INFO org.apache.hadoop.mapred.Merger: Down to
the last merge-pass, with 6 segments left of total size: 479785 bytes
2011-03-02 08:44:57,819 INFO org.apache.hadoop.mapred.Merger: Merging
6 sorted segments
2011-03-02 08:44:57,823 INFO org.apache.hadoop.mapred.Merger: Down to
the last merge-pass, with 6 segments left of total size: 478787 bytes
2011-03-02 08:44:58,177 INFO org.apache.hadoop.mapred.Merger: Merging
6 sorted segments
2011-03-02 08:44:58,182 INFO org.apache.hadoop.mapred.Merger: Down to
the last merge-pass, with 6 segments left of total size: 474947 bytes
2011-03-02 08:44:58,531 INFO org.apache.hadoop.mapred.Merger: Merging
6 sorted segments
2011-03-02 08:44:58,535 INFO org.apache.hadoop.mapred.Merger: Down to
the last merge-pass, with 6 segments left of total size: 476642 bytes
2011-03-02 08:44:58,890 INFO org.apache.hadoop.mapred.Merger: Merging
6 sorted segments
2011-03-02 08:44:58,894 INFO org.apache.hadoop.mapred.Merger: Down to
the last merge-pass, with 6 segments left of total size: 474380 bytes
2011-03-02 08:44:59,248 INFO org.apache.hadoop.mapred.Merger: Merging
6 sorted segments
2011-03-02 08:44:59,252 INFO org.apache.hadoop.mapred.Merger: Down to
the last merge-pass, with 6 segments left of total size: 474939 bytes
2011-03-02 08:44:59,610 INFO org.apache.hadoop.mapred.Merger: Merging
6 sorted segments
2011-03-02 08:44:59,614 INFO org.apache.hadoop.mapred.Merger: Down to
the last merge-pass, with 6 segments left of total size: 476284 bytes
2011-03-02 08:44:59,964 INFO org.apache.hadoop.mapred.Merger: Merging
6 sorted segments
2011-03-02 08:44:59,968 INFO org.apache.hadoop.mapred.Merger: Down to
the last merge-pass, with 6 segments left of total size: 477268 bytes
2011-03-02 08:45:00,321 INFO org.apache.hadoop.mapred.Merger: Merging
6 sorted segments
2011-03-02 08:45:00,325 INFO org.apache.hadoop.mapred.Merger: Down to
the last merge-pass, with 6 segments left of total size: 477738 bytes
2011-03-02 08:45:00,680 INFO org.apache.hadoop.mapred.Merger: Merging
6 sorted segments
2011-03-02 08:45:00,684 INFO org.apache.hadoop.mapred.Merger: Down to
the last merge-pass, with 6 segments left of total size: 477485 bytes
2011-03-02 08:45:01,038 INFO org.apache.hadoop.mapred.Merger: Merging
6 sorted segments
2011-03-02 08:45:01,042 INFO org.apache.hadoop.mapred.Merger: Down to
the last merge-pass, with 6 segments left of total size: 477476 bytes
2011-03-02 08:45:01,395 INFO org.apache.hadoop.mapred.Merger: Merging
6 sorted segments
2011-03-02 08:45:01,399 INFO org.apache.hadoop.mapred.Merger: Down to
the last merge-pass, with 6 segments left of total size: 475864 bytes
2011-03-02 08:45:01,749 INFO org.apache.hadoop.mapred.Merger: Merging
6 sorted segments
2011-03-02 08:45:01,753 INFO org.apache.hadoop.mapred.Merger: Down to
the last merge-pass, with 6 segments left of total size: 476185 bytes
2011-03-02 08:45:02,105 INFO org.apache.hadoop.mapred.Merger: Merging
6 sorted segments
2011-03-02 08:45:02,109 INFO org.apache.hadoop.mapred.Merger: Down to
the last merge-pass, with 6 segments left of total size: 477883 bytes
2011-03-02 08:45:02,463 INFO org.apache.hadoop.mapred.Merger: Merging
6 sorted segments
2011-03-02 08:45:02,467 INFO org.apache.hadoop.mapred.Merger: Down to
the last merge-pass, with 6 segments left of total size: 478753 bytes
2011-03-02 08:45:02,830 INFO org.apache.hadoop.mapred.TaskRunner:
Task:attempt_201102071809_6255_m_000008_0 is done. And is in the
process of commiting
2011-03-02 08:45:02,833 INFO org.apache.hadoop.mapred.TaskRunner: Task
'attempt_201102071809_6255_m_000008_0' done.



When I ran the job with raw logs I see

2011-03-02 10:40:56,113 INFO org.apache.hadoop.metrics.jvm.JvmMetrics:
Initializing JVM Metrics with processName=MAP, sessionId=
2011-03-02 10:41:01,514 INFO org.apache.hadoop.mapred.MapTask: io.sort.mb = 100
2011-03-02 10:41:01,579 INFO org.apache.hadoop.mapred.MapTask: data
buffer = 79691776/99614720
2011-03-02 10:41:01,579 INFO org.apache.hadoop.mapred.MapTask: record
buffer = 262144/327680
2011-03-02 10:42:01,446 INFO org.apache.hadoop.mapred.MapTask:
Spilling map output: record full = true
2011-03-02 10:42:01,446 INFO org.apache.hadoop.mapred.MapTask:
bufstart = 0; bufend = 11272192; bufvoid = 99614720


notice the io.sort.mb in avro format takes more than 2 minutes while
the raw version is almost instant. So what is going on there?

Mime
View raw message