hadoop-common-user mailing list archives

From "John Howland" <john.d.howl...@gmail.com>
Subject Re: Getting started questions
Date Mon, 08 Sep 2008 16:35:17 GMT

Thanks for the detailed response. I need to play with the SequenceFile
format a bit -- I found the documentation for it on the wiki. I think
I could build on top of the format to handle storage of very large
documents. The vast majority of documents will fit in RAM and within a
standard HDFS block (64MB, though I might raise it to 128MB). For very
large documents, I can split each one into consecutive records in the
SequenceFile, overloading the key to be a combination of a "real"
key and a record number. It shouldn't be too hard to extend
SequenceFile to do this.
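
Roughly what I have in mind, as a sketch only: this is plain Java with no Hadoop classes, and ChunkedDocs, ChunkKey, Record, split and join are all names I made up for illustration. It just shows the composite key (real key plus record number) and the splitting of one large document into consecutive records; nothing here touches SequenceFile itself.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: plain Java only, not the Hadoop API.
final class ChunkedDocs {

    // Composite key: the "real" document key plus a record (chunk) number.
    static final class ChunkKey implements Comparable<ChunkKey> {
        final String docKey;
        final int recordNo;

        ChunkKey(String docKey, int recordNo) {
            this.docKey = docKey;
            this.recordNo = recordNo;
        }

        // Order by document key first, then by record number, so the chunks
        // of one document sort together and in their original order.
        @Override
        public int compareTo(ChunkKey other) {
            int c = docKey.compareTo(other.docKey);
            return c != 0 ? c : Integer.compare(recordNo, other.recordNo);
        }

        @Override
        public String toString() {
            return docKey + "#" + recordNo;
        }
    }

    // One key/value pair as it would be appended to the file.
    static final class Record {
        final ChunkKey key;
        final byte[] value;

        Record(ChunkKey key, byte[] value) {
            this.key = key;
            this.value = value;
        }
    }

    // Split a document into consecutive records of at most maxChunkBytes each.
    static List<Record> split(String docKey, byte[] doc, int maxChunkBytes) {
        if (maxChunkBytes <= 0) {
            throw new IllegalArgumentException("maxChunkBytes must be positive");
        }
        List<Record> out = new ArrayList<>();
        int recordNo = 0;
        int off = 0;
        while (off < doc.length) {
            int end = off + Math.min(maxChunkBytes, doc.length - off);
            out.add(new Record(new ChunkKey(docKey, recordNo++),
                               Arrays.copyOfRange(doc, off, end)));
            off = end;
        }
        // An empty document still gets one empty record so its key is not lost.
        if (out.isEmpty()) {
            out.add(new Record(new ChunkKey(docKey, 0), new byte[0]));
        }
        return out;
    }

    // Reassemble records that belong to one document and are already in order.
    static byte[] join(List<Record> records) {
        int total = 0;
        for (Record r : records) {
            total += r.value.length;
        }
        byte[] doc = new byte[total];
        int off = 0;
        for (Record r : records) {
            System.arraycopy(r.value, 0, doc, off, r.value.length);
            off += r.value.length;
        }
        return doc;
    }
}
```

For example, ChunkedDocs.split("doc-1", bytes, 64 * 1024 * 1024) would give one record for a small document and several consecutive ones for a large one. In an actual Hadoop job the key type would also have to be serializable by Hadoop (typically by implementing WritableComparable), which this sketch leaves out.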

Much obliged,

