hadoop-common-dev mailing list archives

From caoyuzhong <caoyuzh...@hotmail.com>
Subject A problem about splitting a large file into several FileSplits
Date Tue, 15 Jul 2008 10:17:07 GMT


A large file is split into several FileSplits in FileInputFormat.java#getSplits().
We know FileInputFormat presents a byte-oriented view of the input file, so
a whole record (for instance, a line) might be broken across a split boundary when
several FileSplits are generated for a single file. One part of the record will then
be in one InputSplit and the other part in another InputSplit, and the two InputSplits
might be processed in different tasks.

How does Hadoop handle this problem?
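
To make the scenario concrete, here is a minimal sketch of what I mean. It is not
Hadoop's actual getSplits() logic; byteSplits() and slice() are hypothetical helpers,
and the split size of 8 bytes is chosen only so that the cuts fall in the middle of lines.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class SplitBoundaryDemo {

    // Cut [0, length) into fixed-size byte ranges, ignoring record boundaries.
    // Each element is {start, length}. Hypothetical helper, not Hadoop's getSplits().
    static List<long[]> byteSplits(long length, long splitSize) {
        List<long[]> splits = new ArrayList<>();
        for (long start = 0; start < length; start += splitSize) {
            splits.add(new long[] {start, Math.min(splitSize, length - start)});
        }
        return splits;
    }

    // Return the raw bytes covered by one split as text (hypothetical helper).
    static String slice(byte[] data, long[] split) {
        return new String(data, (int) split[0], (int) split[1], StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        // Three line records, 20 bytes in total.
        byte[] data = "alpha\nbravo\ncharlie\n".getBytes(StandardCharsets.US_ASCII);
        for (long[] split : byteSplits(data.length, 8)) {
            // Print each split as start+length followed by its raw content.
            System.out.println(split[0] + "+" + split[1] + ": "
                + slice(data, split).replace("\n", "\\n"));
        }
    }
}
```

With these inputs the first split ends in the middle of "bravo" and the second ends
in the middle of "charlie", so neither record lies entirely inside one byte range.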

Yu zhong
