hadoop-user mailing list archives

From Xuri Nagarin <secs...@gmail.com>
Subject Improving MR job disk IO
Date Thu, 10 Oct 2013 18:05:05 GMT

I have a simple Grep job (from the bundled examples) that I am running on an
11-node cluster. Each node has 2 x 8-core Intel Xeons (shows 32 CPUs with HT
on), 64GB RAM, and 8 x 1TB disks. I have mappers set to 20 per node.

When I run the Grep job, I notice that CPU gets pegged at 100% on multiple
cores, but disk throughput remains a dismal 1-2 Mbytes/sec on a single disk
on each node. So I guess the cluster is performing poorly in terms of disk
IO. By comparison, running Terasort, I see each disk put out 25-35
Mbytes/sec, with a total cluster throughput above 1.5 Gbytes/sec.
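
As a rough sanity check on the figures above, here is a minimal sketch of the aggregate throughput they imply. It assumes decimal units (1 Gbyte = 1000 Mbytes), that Terasort keeps all 8 disks on all 11 nodes busy at once, and that Grep touches only one disk per node; the message states none of these explicitly, and the function name is made up for illustration.

```python
# Node and disk counts are taken from the message; everything else is an assumption.
NODES = 11
DISKS_PER_NODE = 8


def cluster_throughput_gb(per_disk_mb_s, active_disks_per_node=DISKS_PER_NODE):
    """Aggregate throughput in Gbytes/sec (decimal), assuming every node
    drives `active_disks_per_node` disks at `per_disk_mb_s` Mbytes/sec."""
    return NODES * active_disks_per_node * per_disk_mb_s / 1000.0


# Terasort: 25-35 Mbytes/sec per disk, all disks assumed busy.
terasort_range = (cluster_throughput_gb(25), cluster_throughput_gb(35))

# Grep: 1-2 Mbytes/sec, assumed to be on one disk per node.
grep_range = (cluster_throughput_gb(1, 1), cluster_throughput_gb(2, 1))

print("Terasort: %.2f-%.2f Gbytes/sec" % terasort_range)
print("Grep:     %.3f-%.3f Gbytes/sec" % grep_range)
```

Under those assumptions the Terasort range is consistent with the "above 1.5 Gbytes/sec" observed, while the Grep job reads roughly two orders of magnitude less, which fits the picture of a job limited by CPU rather than by the disks.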

How do I go about reconfiguring or rewriting the job so that it makes full
use of the available disk IO?


