hadoop-common-user mailing list archives

From Sean Arietta <sarie...@virginia.edu>
Subject Re: 1 file per record
Date Tue, 17 Mar 2009 03:21:48 GMT

I have a similar issue and would like some clarification if possible. Suppose
each file is meant to be emitted as a single record to a set of map tasks;
that is, each key-value pair will contain data from one file and one file
alone.

I have written custom InputFormats and RecordReaders before, so I am familiar
with the general process. Does it suffice to return an empty array from the
InputFormat.getSplits() method and then take care of the actual record
emission inside the custom RecordReader?

Thanks for your time!


owen.omalley wrote:
> On Oct 2, 2008, at 1:50 AM, chandravadana wrote:
>> If we don't specify numSplits in getSplits(), then what is the default
>> number of splits taken?
> The getSplits() method is either library or user code, so it depends which
> class you are using as your InputFormat. The FileInputFormats
> (TextInputFormat and SequenceFileInputFormat) basically divide input
> files by blocks, unless the requested number of mappers is really high.
> -- Owen
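[Editor's note: the "unless the requested number of mappers is really high" behavior comes from how the old-API FileInputFormat picks a split size, roughly max(minSize, min(totalSize / numSplits, blockSize)). The standalone sketch below mirrors that arithmetic for illustration; the exact variable names and defaults in a given Hadoop release may differ.]

```java
// Standalone illustration of the old-API FileInputFormat split-size
// arithmetic: splitSize = max(minSize, min(goalSize, blockSize)),
// where goalSize = totalSize / requested numSplits.
public class SplitSizeDemo {
  static long computeSplitSize(long goalSize, long minSize, long blockSize) {
    return Math.max(minSize, Math.min(goalSize, blockSize));
  }

  public static void main(String[] args) {
    long blockSize = 64L * 1024 * 1024;   // 64 MB, a typical HDFS block
    long totalSize = 640L * 1024 * 1024;  // ten blocks of input

    // Modest numSplits (10): the block size wins, so input files are
    // divided by blocks, as Owen describes.
    System.out.println(computeSplitSize(totalSize / 10, 1, blockSize));

    // Very high numSplits (1000): goalSize drops below a block, so
    // splits become smaller than a block -- the "really high" case.
    System.out.println(computeSplitSize(totalSize / 1000, 1, blockSize));
  }
}
```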

View this message in context: http://www.nabble.com/1-file-per-record-tp19644985p22551968.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.
