hadoop-common-user mailing list archives

From "Xavier Stevens" <Xavier.Stev...@fox.com>
Subject RE: What's the best way to get to a single key?
Date Thu, 06 Mar 2008 17:08:32 GMT
Thanks for everything so far.  It has been really helpful.  I have one
more question.  Is there a way to merge the index/data files of several
MapFiles into one?  Assuming there is, what is the best way to do so?
From the Javadoc it looked like this is possible, but it wasn't very
explicit.  I could configure the job to use a single reducer, but with
my data size that would be really slow.



-----Original Message-----
From: Doug Cutting [mailto:cutting@apache.org] 
Sent: Tuesday, March 04, 2008 12:53 PM
To: core-user@hadoop.apache.org
Subject: Re: What's the best way to get to a single key?

Xavier Stevens wrote:
> Is there a way to do this when your input data is using SequenceFile 
> compression?

Yes.  A MapFile is simply a directory containing two SequenceFiles named
"data" and "index".  MapFileOutputFormat uses the same compression
parameters as SequenceFileOutputFormat.  SequenceFileInputFormat
recognizes MapFiles and reads the "data" file.  So you should be able to
just switch from specifying SequenceFileOutputFormat to
MapFileOutputFormat in your jobs and everything should work the same
except you'll have index files that permit random access.
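
To illustrate why a sorted "data" file plus an "index" file permits random access, here is a minimal sketch in plain Java. It is a toy in-memory model, not Hadoop code: the class name ToyMapFile, its methods, and the interval parameter are all hypothetical, and it assumes the index is sparse (one entry per group of records), so a lookup is a binary search over the index followed by a short scan of the data.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy in-memory model (hypothetical, not the Hadoop class): sorted "data" plus a sparse "index". */
final class ToyMapFile {
    private final List<String> keys = new ArrayList<>();      // "data": keys in sorted order
    private final List<String> values = new ArrayList<>();    // "data": value stored with each key
    private final List<String> indexKeys = new ArrayList<>(); // "index": every interval-th key
    private final List<Integer> indexPos = new ArrayList<>(); // "index": position of that key in "data"
    private final int interval;                               // assumed spacing between index entries

    ToyMapFile(int interval) {
        if (interval < 1) {
            throw new IllegalArgumentException("interval must be >= 1");
        }
        this.interval = interval;
    }

    /** Appends a record; keys must arrive in sorted order, as sorted job output would. */
    void append(String key, String value) {
        if (!keys.isEmpty() && keys.get(keys.size() - 1).compareTo(key) > 0) {
            throw new IllegalArgumentException("key out of order: " + key);
        }
        if (keys.size() % interval == 0) {
            // Record an index entry for the first key of each group.
            indexKeys.add(key);
            indexPos.add(keys.size());
        }
        keys.add(key);
        values.add(value);
    }

    /** Returns the value stored for key, or null if the key is absent. */
    String get(String key) {
        // Binary search the index for the last entry whose key is <= the target.
        int lo = 0;
        int hi = indexKeys.size() - 1;
        int slot = -1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (indexKeys.get(mid).compareTo(key) <= 0) {
                slot = mid;
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        if (slot < 0) {
            return null; // target sorts before the first record
        }
        // Scan at most one group of "data" starting at the indexed position.
        int start = indexPos.get(slot);
        int end = Math.min(start + interval, keys.size());
        for (int i = start; i < end; i++) {
            int cmp = keys.get(i).compareTo(key);
            if (cmp == 0) {
                return values.get(i);
            }
            if (cmp > 0) {
                break; // passed the place where the key would be
            }
        }
        return null;
    }
}
```

The sketch only shows the lookup idea.  A real MapFile keeps both parts on disk as SequenceFiles, so the index would hold file offsets rather than list positions.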

