hadoop-mapreduce-user mailing list archives

From Steve Lewis <lordjoe2...@gmail.com>
Subject Problems Mapping multigigabyte file
Date Fri, 14 Oct 2011 15:23:47 GMT
I have an MR task which runs well with a single input file or an input
directory with dozens of 50MB input files.

When the data is in a single input file of 1 GB or more, the mapper never
gets to 0%. There are no errors, but when I look at the cluster, the CPUs
are spending huge amounts of time in a wait state. The job runs when the
input is 800MB and can complete even with a number of 500MB files as input.

The cluster (0.02) has 8 nodes with 8 CPUs per node. Block size is 64MB.
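For reference, here is a rough sketch of how many map tasks each of the inputs above would produce if the files are splittable. It assumes split sizing modeled on Hadoop's new-API `FileInputFormat` (split size = max(minSize, min(maxSize, blockSize)), with a 1.1 "slop" factor on the last split). Exact behavior varies between Hadoop versions and the old `mapred` API, and the function names here are hypothetical, not Hadoop's.

```python
MB = 1024 * 1024
SPLIT_SLOP = 1.1  # assumed: slack factor modeled on FileInputFormat's last-split rule


def split_size(block_size, min_size=1, max_size=2**63 - 1):
    """Hypothetical helper: effective split size, clamped to [min_size, max_size]."""
    return max(min_size, min(max_size, block_size))


def count_splits(file_size, block_size, splittable=True):
    """Hypothetical helper: number of input splits (map tasks) for one non-empty file."""
    if not splittable:
        # A non-splittable file (e.g. gzip-compressed) is read by a single mapper.
        return 1
    size = split_size(block_size)
    remaining = file_size
    splits = 0
    # Carve off full-size splits while more than SPLIT_SLOP splits' worth remains.
    while remaining / size > SPLIT_SLOP:
        splits += 1
        remaining -= size
    # Whatever is left (up to 1.1 * split size) becomes the final split.
    if remaining > 0:
        splits += 1
    return splits


if __name__ == "__main__":
    for label, size in [("50MB", 50 * MB), ("500MB", 500 * MB),
                        ("800MB", 800 * MB), ("1GB", 1024 * MB)]:
        print(label, count_splits(size, 64 * MB))
```

Under these assumptions a 1 GB splittable file becomes 16 splits at a 64MB block size, while each 50MB file is a single split. If the large file were not splittable, the whole 1 GB would go to one mapper.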

Any bright ideas?

Steven M. Lewis PhD
4221 105th Ave NE
Kirkland, WA 98033
206-384-1340 (cell)
Skype lordjoe_com
