hadoop-mapreduce-user mailing list archives

From Harsh J <ha...@cloudera.com>
Subject Re: Does FileSplit respect the record boundary?
Date Fri, 10 Feb 2012 17:02:39 GMT

Please read the map section of
http://wiki.apache.org/hadoop/HadoopMapReduce to understand how Hadoop
ends up respecting record boundaries even though HDFS block boundaries
do not take them into account. I hope it helps clear things up for you.
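
To make the idea concrete, here is a minimal sketch of the rule a
line-oriented reader can follow. It is a plain-Java simulation over a byte
array, not the Hadoop source: the class SplitBoundaryDemo and its methods
are hypothetical names made up for illustration. As far as I recall from
the 1.0.0 code, LineRecordReader opens the whole file with
fs.open(split.getPath()) and seeks to the split start, so the stream is not
limited to one block; please check the source to confirm. The rule has two
parts: a split that does not start at byte 0 discards everything up to the
first line break it sees (scanning from one byte before its start), and
every split keeps reading past its own end to finish the last line it
began.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical simulation of split-aware line reading; not Hadoop code.
class SplitBoundaryDemo {

    // Index just past the next '\n' at or after pos, or data.length at EOF.
    static int endOfLine(byte[] data, int pos) {
        int i = pos;
        while (i < data.length && data[i] != '\n') {
            i++;
        }
        return Math.min(i + 1, data.length);
    }

    // Returns the lines owned by the split [start, start + length).
    static List<String> readSplit(byte[] data, int start, int length) {
        int end = Math.min(start + length, data.length);
        int pos = start;
        if (start != 0) {
            // Back up one byte and discard through the next line break.
            // If the split starts exactly on a line start, only the
            // previous line's '\n' is discarded and nothing is lost.
            pos = endOfLine(data, start - 1);
        }
        List<String> records = new ArrayList<>();
        // A line belongs to this split if it STARTS before the split end;
        // reading it may run past 'end' into the next split's bytes.
        while (pos < end) {
            int next = endOfLine(data, pos);
            int textEnd = (data[next - 1] == '\n') ? next - 1 : next;
            records.add(new String(data, pos, textEnd - pos, StandardCharsets.UTF_8));
            pos = next;
        }
        return records;
    }

    // Reads every fixed-size split in order and concatenates the records.
    static List<String> readAllSplits(byte[] data, int splitSize) {
        List<String> all = new ArrayList<>();
        for (int start = 0; start < data.length; start += splitSize) {
            all.addAll(readSplit(data, start, splitSize));
        }
        return all;
    }

    public static void main(String[] args) {
        byte[] data = "aaa\nbbbb\ncc\n".getBytes(StandardCharsets.UTF_8);
        int splitSize = 5; // deliberately cuts "bbbb" in the middle
        for (int start = 0; start < data.length; start += splitSize) {
            System.out.println("split@" + start + " -> " + readSplit(data, start, splitSize));
        }
    }
}
```

With a split size of 5, the first split yields "aaa" and "bbbb" (it reads
past its end to finish "bbbb"), the second yields only "cc" (it discards the
tail of "bbbb"), and the third yields nothing. Every line is read exactly
once regardless of where the cuts fall.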

On Fri, Feb 10, 2012 at 10:26 PM, GUOJUN Zhu <guojun_zhu@freddiemac.com> wrote:
> Hi,
> I am learning Hadoop.  We have some specially formatted text files for input, so
> we need to write a custom InputFormat, probably based on
> FileInputFormat.  Does the FileInputFormat respect the record boundary
> (every line or maybe every other line)?  I am reading the source code
> (1.0.0).  For example, in LineRecordReader, is the "in" field (InputStream)
> of the LineReader(in,..) the full HDFS file (of many blocks) or just the
> real local file of one block?  All the books I have read give very little detail
> about it.  Can any expert point me to some references, or maybe
> which part of the source code I should concentrate on?  Thanks.
> Zhu, Guojun
> Modeling Sr Graduate
> 571-3824370
> guojun_zhu@freddiemac.com
> Financial Engineering
> Freddie Mac

Harsh J
Customer Ops. Engineer
Cloudera | http://tiny.cloudera.com/about
