hadoop-common-user mailing list archives

From maha <m...@umail.ucsb.edu>
Subject Re: Using SequenceFile instead of TextFiles
Date Sat, 05 Mar 2011 05:26:52 GMT
Thanks again, Harsh. I actually got the book two days ago but haven't had time to read it yet.

Maha

On Mar 4, 2011, at 7:54 PM, Harsh J wrote:

> Hi,
> 
> On Sat, Mar 5, 2011 at 9:03 AM, maha <maha@umail.ucsb.edu> wrote:
>> Hi,
>> 
>> I have 2 questions:
>> 
>> 1) Is a SequenceFile more efficient than TextFiles for input? I think TextFiles
>> will be processed by TextInputFormat into SequenceFiles inside Hadoop. So will
>> SequenceFiles (i.e. binary input files) be more efficient?
> 
> Depends on what your scenario is.
> 
>> 2) If I decide to use SequenceFiles as the InputFormat, do I need to stick to the header
>> protocol defined in http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/io/SequenceFile.html?
> 
> No. You would use the SequenceFileInputFormat and SequenceFileOutputFormat classes,
> which read and write the header for you.
> 
> May I suggest reading a good Hadoop book that covers the little,
> scattered stuff like this, neatly? I like Tom White's Hadoop: The
> Definitive Guide :)
> 
> -- 
> Harsh J
> www.harshj.com
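
For anyone finding this thread later, here is a rough sketch of what "text input" versus "binary input" means at the record level. It is not Hadoop code and not the real SequenceFile on-disk format: the class RecordFormatSketch and its methods are made up for illustration and use only java.io. The text path has to split each line and parse the number, while the binary path reads a typed key and value directly. Whether that matters for a given job still depends on the scenario, as Harsh says.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for illustration only: NOT Hadoop's SequenceFile format or API.
final class RecordFormatSketch {

    // Text style: one "key<TAB>value" line per record (keys must not contain tabs or newlines).
    static byte[] writeText(Map<String, Integer> records) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Integer> e : records.entrySet()) {
            sb.append(e.getKey()).append('\t').append(e.getValue()).append('\n');
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    // Text style read: every line is split and the value is parsed from its decimal form.
    static Map<String, Integer> readText(byte[] data) {
        Map<String, Integer> out = new LinkedHashMap<>();
        String text = new String(data, StandardCharsets.UTF_8);
        for (String line : text.split("\n")) {
            if (line.isEmpty()) {
                continue;
            }
            String[] parts = line.split("\t", 2);
            out.put(parts[0], Integer.parseInt(parts[1]));
        }
        return out;
    }

    // Binary style: a record count, then a length-prefixed key and a 4-byte int per record.
    static byte[] writeBinary(Map<String, Integer> records) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(records.size());
            for (Map.Entry<String, Integer> e : records.entrySet()) {
                out.writeUTF(e.getKey());
                out.writeInt(e.getValue());
            }
        } catch (IOException e) {
            // In-memory streams do not fail in practice; rethrow unchecked to keep callers simple.
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Binary style read: keys and values come back typed, with no string parsing.
    static Map<String, Integer> readBinary(byte[] data) {
        Map<String, Integer> out = new LinkedHashMap<>();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            int count = in.readInt();
            for (int i = 0; i < count; i++) {
                String key = in.readUTF();
                int value = in.readInt();
                out.put(key, value);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Integer> records = new LinkedHashMap<>();
        records.put("hadoop", 3);
        records.put("sequence", 1200000);

        byte[] text = writeText(records);
        byte[] binary = writeBinary(records);

        System.out.println("text round trip ok: " + readText(text).equals(records));
        System.out.println("binary round trip ok: " + readBinary(binary).equals(records));
        System.out.println("text bytes: " + text.length + ", binary bytes: " + binary.length);
    }
}
```

In a real job none of this is written by hand: the input and output format classes Harsh names take care of the file layout.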

