hadoop-common-user mailing list archives

From Harsh J <ha...@cloudera.com>
Subject Re: Using own InputSplit
Date Fri, 27 May 2011 17:04:03 GMT

Please do not cross-post a question to multiple lists unless you're
announcing something.

What you describe does not happen. The way splitting is done for text
files is explained in good detail here:

Hope this solves your doubt :)
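
For readers without the linked explanation, here is a minimal sketch of the idea in Python. It is a simplified model of how a line-oriented reader such as Hadoop's LineRecordReader (used by TextInputFormat) is generally understood to treat split boundaries, not the actual Hadoop code. The names read_split and read_all are hypothetical, and the input is assumed to be newline-delimited bytes held in memory. A reader whose split does not start at offset 0 discards everything up to and including the first newline, and every reader keeps reading past the end of its split until it finishes the line it is on.

```python
def read_split(data, start, length):
    """Return the complete lines owned by the byte range [start, start + length).

    Simplified model (an assumption, not Hadoop's real implementation):
    - a split that does not begin the file skips through its first newline,
      because the reader of the previous split emits that line;
    - a reader keeps emitting lines while the line begins at or before the
      split end, even if the line itself extends past the split end.
    """
    end = start + length
    pos = start
    if start != 0:
        newline = data.find(b"\n", pos)
        pos = len(data) if newline == -1 else newline + 1

    records = []
    while pos <= end and pos < len(data):
        newline = data.find(b"\n", pos)
        line_end = len(data) if newline == -1 else newline
        records.append(data[pos:line_end])
        pos = line_end + 1
    return records


def read_all(data, split_size):
    """Cut data into fixed-size byte splits and collect every split's records."""
    records = []
    for start in range(0, len(data), split_size):
        length = min(split_size, len(data) - start)
        records.extend(read_split(data, start, length))
    return records
```

With the input b"abcd\nefgh\n" cut at byte 2, read_split(data, 0, 2) returns [b"abcd"] and read_split(data, 2, 8) returns [b"efgh"]. The line "abcd" is never delivered as "ab" and "cd", even though the byte boundary falls in the middle of it.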

On Fri, May 27, 2011 at 10:25 PM, Mohit Anchlia <mohitanchlia@gmail.com> wrote:
> I am new to Hadoop, and from what I understand, by default Hadoop splits
> the input into blocks. This might result in a single record line being
> split into 2 pieces and spread across 2 maps. For example, the line
> "abcd" might get split into "ab" and "cd". How can one prevent this in
> Hadoop and Pig? I am looking for some examples where I can see how I
> can specify my own split so that it logically splits based on the
> record delimiter and not the block size. For some reason I am not able
> to find the right examples online.

Harsh J
