hadoop-mapreduce-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Friso van Vollenhoven <fvanvollenho...@xebia.com>
Subject Re: Help tuning a cluster - COPY slow
Date Wed, 17 Nov 2010 09:20:23 GMT
Hi Tim,

Getting 28K of map outputs to reducers should not take minutes. Reducers on a properly setup
(1Gb) network should be copying at multiple MB/s. I think you need to get some more info.

Apart from top, you'll probably also want to look at iostat and vmstat. The first will tell
you something about disk utilization and the latter can tell you whether the machines are
using swap or not. This is very important. If you are over utilizing physical memory on the
machines, thing will be slow.

It's even better if you put something in place that allows you to get an overall view of the
resource usage across the cluster. Look at Ganglia (http://ganglia.sourceforge.net/) or Cacti
(http://www.cacti.net/) or something similar.

Basically a job is either CPU bound, IO bound or network bound. You need to be able to look
at all three to see what the bottleneck is. Also, you can run into churn when you saturate
resources and processes are competing for them (e.g. when you have two disks and 50 processes
/ threads reading from them, things will be slow because the OS needs to switch between them
a lot and overall throughput will be less than what the disks can do; you can see this when
there is a lot of time in iowait, but overall throughput is low so there's a lot of seeks
going on).




On 17 nov 2010, at 09:43, Tim Robertson wrote:

Hi all,

We have setup a small cluster (13 nodes) using CDH3

We have been tuning it using TeraSort and Hive queries on our data,
and the copy phase is very slow, so I'd like to ask if anyone can look
over our config.

We have an unbalanced set of machines (all on a single switch):
- 10 of Intel @ 2.83GHz Quad, 8GB, 2x500G 7.2K SATA (3 mappers, 2 reducers)
- 3 of  Intel @ 2.53GHz Dual Quad, 24GB, 6x250GB 5.4K SATA (12
mappers, 12 reducers)

We monitored the load using $top on machines, to settle on the number
of mappers and reducers to stop overloading them, and the map() and
reduce() is working very nicely - all our time

The config:

io.sort.mb=400
io.sort.factor=100
mapred.reduce.parallel.copies=20
tasktracker.http.threads=80
mapred.compress.map.output=true/false (no notible difference)
mapred.map.output.compression.codec=com.hadoop.compression.lzo.LzoCodec
mapred.output.compression.type=BLOCK
mapred.inmem.merge.threshold=0
mapred.job.reduce.input.buffer.percent=0.7
mapred.job.reuse.jvm.num.tasks=50

An example job:
(select basis_of_record,count(1) from occurrence_record group by
basis_of_record)
Map input records 262,573,931 finished in 2mins30 using 833 mappers
Reduce was at 24% at 2mins30 finished map with all 55 running
Map output records: 1,855
Map output bytes: 28,724
REDUCE COPY PHASE finished after 7mins01 secs
Reduce finished after 7mins17secs

I am correct that 28,724 bytes emitted from a map should not take 4mins30 right?

We're running puppet so can test changes quickly.

Any pointers on how we can debug / improve this are greatly appreciated!
Tim


Mime
View raw message