hadoop-mapreduce-user mailing list archives

From Justin Woody <justin.wo...@gmail.com>
Subject Re: Problems Mapping multigigabyte file
Date Fri, 14 Oct 2011 15:49:25 GMT
Steve,

Is the input file splittable?

Justin

On Fri, Oct 14, 2011 at 11:23 AM, Steve Lewis <lordjoe2000@gmail.com> wrote:
> I have an MR task which runs well with a single input file or an input
> directory with dozens of 50MB input files.
> When the data is in a single input file of 1 GB or more, the mapper never
> gets to 0%. There are no errors, but when I look at the cluster, the CPUs
> are spending huge amounts of time in a wait state. The job runs when the
> input is 800MB and can complete even with a number of 500MB files as input.
> The cluster (0.20) has 8 nodes with 8 CPUs per node. Block size is 64MB.
> Any bright ideas?
>
> --
> Steven M. Lewis PhD
> 4221 105th Ave NE
> Kirkland, WA 98033
> 206-384-1340 (cell)
> Skype lordjoe_com
>
>
>
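[Editor's note] Justin's question gets at the likely cause: if the 1 GB file is not splittable (e.g., gzip-compressed), Hadoop hands the entire file to a single map task instead of roughly 16 parallel tasks (1 GB over 64 MB blocks), which matches the "stuck at 0%" symptom. The sketch below illustrates that arithmetic with a suffix-based splittability heuristic; the class and method names are illustrative only, not the actual Hadoop API (which decides via `FileInputFormat.isSplitable` and the configured compression codec):

```java
public class SplitEstimate {
    // Heuristic sketch (assumption): gzip/deflate streams cannot be split,
    // so the whole file goes to one mapper; plain text (and bzip2 in later
    // Hadoop versions) can be split on block boundaries.
    static boolean isLikelySplittable(String path) {
        String p = path.toLowerCase();
        return !(p.endsWith(".gz") || p.endsWith(".deflate"));
    }

    // Rough number of map tasks the framework would create for one file.
    static long estimatedMappers(String path, long fileBytes, long blockBytes) {
        if (!isLikelySplittable(path)) return 1;          // one giant map task
        return (fileBytes + blockBytes - 1) / blockBytes; // ceiling division
    }

    public static void main(String[] args) {
        long oneGB = 1024L * 1024 * 1024;
        long block = 64L * 1024 * 1024;
        // Splittable 1 GB text file: ~16 mappers run in parallel.
        System.out.println(estimatedMappers("big.txt", oneGB, block)); // 16
        // Gzipped 1 GB file: a single mapper grinds through it alone.
        System.out.println(estimatedMappers("big.gz", oneGB, block));  // 1
    }
}
```

If the input really is compressed with a non-splittable codec, splitting it into block-sized chunks (or using a splittable format such as SequenceFile or bzip2) restores the parallelism the smaller-file runs were getting for free.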
