hadoop-mapreduce-user mailing list archives

From Tim Robertson <timrobertson...@gmail.com>
Subject Re: splits and maps
Date Wed, 19 Sep 2012 16:33:37 GMT
Thanks for the explanation HJ - I always meant to look into that bit of
code to work out how it did it.

Tim



On Wed, Sep 19, 2012 at 6:24 PM, Harsh J <harsh@cloudera.com> wrote:

> Hi Tim,
>
> Splits don't look at newlines, at least in TextInputFormat. So as
> long as the computed number of splits exceeds the default number of
> maps, I think a perfect file of 10 blocks will spawn only 10 mappers.
> The mapper's record reader is the one that reads until a newline
> (even past the end of its split's byte range).
>
> On Wed, Sep 19, 2012 at 9:16 PM, Tim Robertson
> <timrobertson100@gmail.com> wrote:
> > I think the splitting recognises the end of line, so you might get 11 but
> > otherwise that looks correct.
> >
> >
> >
> > On Wed, Sep 19, 2012 at 5:42 PM, Pedro Sá da Costa <psdc1978@gmail.com>
> > wrote:
> >>
> >>
> >>
> >> If I have an input file of 640 MB and a split size of 64 MB, this
> >> file will be partitioned into 10 splits, and each split will be
> >> processed by a map task, right?
> >>
> >> --
> >> Best regards,
> >>
> >
>
>
>
> --
> Harsh J
>
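
For anyone reading this thread later, here is a rough Python sketch of the
behaviour HJ describes. It is a simplified model and not Hadoop's actual
code: the names compute_splits and read_records are made up for
illustration, and it assumes newline-terminated text with a fixed split
size. Splits are cut purely by byte offset, and each reader then skips the
partial line at its start and reads past its end to finish the last line.

```python
def compute_splits(file_size, split_size):
    """Cut [0, file_size) into byte ranges without looking at the content."""
    return [(start, min(start + split_size, file_size))
            for start in range(0, file_size, split_size)]


def read_records(data, start, end):
    """Return the lines a reader assigned to bytes [start, end) would emit."""
    pos = start
    if start != 0:
        # Not the first split: the line at this offset belongs to the
        # previous split's reader, so skip past the next newline.
        newline = data.find(b"\n", pos)
        pos = len(data) if newline == -1 else newline + 1
    records = []
    # Keep reading while a line starts at or before the split end, even if
    # that line's bytes run past the end of the split.
    while pos <= end and pos < len(data):
        newline = data.find(b"\n", pos)
        stop = len(data) if newline == -1 else newline
        records.append(data[pos:stop])
        pos = stop + 1
    return records


MB = 1024 * 1024
# The case from the thread: 640 MB file, 64 MB splits.
splits_640 = compute_splits(640 * MB, 64 * MB)

# Small demo: 7 bytes per line (newline included) cut into 10-byte splits,
# so most split boundaries fall in the middle of a line.
lines = [b"line%02d" % i for i in range(12)]
data = b"\n".join(lines) + b"\n"
demo_splits = compute_splits(len(data), 10)
per_split = [read_records(data, s, e) for s, e in demo_splits]
all_records = [rec for recs in per_split for rec in recs]
```

In the demo every line comes out exactly once and in order, although the
split boundaries ignore newlines. The number of splits depends only on the
byte sizes, which is why the 640 MB file gives 10 splits rather than 11.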
