hadoop-common-dev mailing list archives

From: "Naama Kraus" <naamakr...@gmail.com>
Subject: Re: A problem about splitting a large file into several FileSplits
Date: Tue, 15 Jul 2008 11:32:09 GMT

Hi,

As far as I know:

If a split ends in the middle of a record, the node processing that split
reads the rest of the record from the remote node that hosts it. The node
processing the next split ignores the beginning of its split and starts
reading after the first record separator (a newline in your example).
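
To make the rule concrete, here is a minimal in-memory sketch of it. This is
not Hadoop's actual code: the class SplitReaderSketch and the method readSplit
are hypothetical names made up for illustration, and a byte array stands in
for the file. It assumes newline-terminated records and treats a record as
belonging to the split in which it starts.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not Hadoop's actual implementation.
class SplitReaderSketch {

    // Returns the newline-terminated records owned by the split
    // covering bytes [start, start + length) of data.
    static List<String> readSplit(byte[] data, int start, int length) {
        int end = start + length;
        int pos = start;

        // Every split except the first skips up to and including the first
        // newline: that (possibly partial) record belongs to the previous split.
        if (start != 0) {
            while (pos < data.length && data[pos] != '\n') {
                pos++;
            }
            pos++; // step past the newline itself
        }

        List<String> records = new ArrayList<>();
        // A record is owned by this split if it starts at or before `end`.
        // Reading may run past `end` to finish the last record.
        while (pos <= end && pos < data.length) {
            int lineStart = pos;
            while (pos < data.length && data[pos] != '\n') {
                pos++;
            }
            records.add(new String(data, lineStart, pos - lineStart, StandardCharsets.UTF_8));
            pos++; // step past the newline
        }
        return records;
    }

    public static void main(String[] args) {
        byte[] data = "aaa\nbbb\nccc\n".getBytes(StandardCharsets.UTF_8);
        // Cut the 12-byte "file" into 5-byte splits, ignoring record boundaries.
        for (int start = 0; start < data.length; start += 5) {
            int length = Math.min(5, data.length - start);
            System.out.println("split at " + start + ": " + readSplit(data, start, length));
        }
    }
}
```

With 5-byte splits the first boundary falls inside "bbb". The first split
still returns "bbb" whole by reading past its own end, and the second split
skips the tail of "bbb" and starts at "ccc", so every record is read exactly
once.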

Naama

2008/7/15 caoyuzhong <caoyuzhong@hotmail.com>:

>
> Hi,
>
> A large file is split into several FileSplits in
> FileInputFormat.java#getSplits().
> We know FileInputFormat presents a byte-oriented view of the input file, so
> a whole record (for instance, a line) might be broken while generating
> several FileSplits for a single file. One part of the record will then
> be in one InputSplit and the other part in another InputSplit, and the two
> InputSplits might be processed on different nodes.
>
> I want to know how Hadoop handles this problem.
>
> Yu zhong
> 2008/07/15
>




-- 
oo 00 oo 00 oo 00 oo 00 oo 00 oo 00 oo 00 oo 00 oo 00 oo 00 oo 00 oo 00 oo
00 oo 00 oo
"If you want your children to be intelligent, read them fairy tales. If you
want them to be more intelligent, read them more fairy tales." (Albert
Einstein)