From jason hadoop <jason.had...@gmail.com>
Subject Re: HDFS - millions of files in one directory?
Date Mon, 26 Jan 2009 21:13:50 GMT
Sequence files rock, and you can use the bin/hadoop dfs -text FILENAME
command line tool to get a toString-level unpacking of the sequence file's
key/value pairs.

If you provide your own key or value classes, you will need to implement a
toString method to get some use out of this. Also, your class path will need
to include the jars with your custom key/value classes.

HADOOP_CLASSPATH="myjar1:myjar2..." bin/hadoop dfs -text FILENAME
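
As a rough sketch of the toString point: the class name PageKey and its two
fields below are made up for illustration and do not come from this thread.
To keep the sketch compilable without the Hadoop jars, the class only uses
java.io; in a real job it would also declare
"implements org.apache.hadoop.io.Writable", whose two methods have the same
shape as write and readFields here.

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

// Hypothetical key class, for illustration only.
// In a Hadoop job, add "implements org.apache.hadoop.io.Writable".
final class PageKey {
    private String url = "";
    private long fetchTime;

    // A no-argument constructor lets the framework create an empty
    // instance and then fill it in via readFields.
    PageKey() {}

    PageKey(String url, long fetchTime) {
        this.url = url;
        this.fetchTime = fetchTime;
    }

    // Serialize the fields in a fixed order.
    public void write(DataOutput out) throws IOException {
        out.writeUTF(url);
        out.writeLong(fetchTime);
    }

    // Deserialize the fields in the same order they were written.
    public void readFields(DataInput in) throws IOException {
        url = in.readUTF();
        fetchTime = in.readLong();
    }

    // Without this override, a text dump of the key would only show the
    // default Object.toString() output (class name and hash code).
    @Override
    public String toString() {
        return url + "@" + fetchTime;
    }
}
```

With this in place, new PageKey("http://example.com/", 42L).toString()
returns "http://example.com/@42", which is the kind of string a
toString-level dump would show for that key.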


On Mon, Jan 26, 2009 at 1:08 PM, Mark Kerzner <markkerzner@gmail.com> wrote:

> Thank you, Doug, then all is clear in my head.
> Mark
>
> On Mon, Jan 26, 2009 at 3:05 PM, Doug Cutting <cutting@apache.org> wrote:
>
> > Mark Kerzner wrote:
> >
> >> Okay, I am convinced. I only noticed that Doug, the originator, was not
> >> happy about it - but in open source one has to give up control
> >> sometimes.
> >>
> >
> > I think perhaps you misunderstood my remarks.  My point was that, if you
> > looked to Nutch's Content class for an example, it is, for historical
> > reasons, somewhat more complicated than it needs to be and is thus a less
> > than perfect example.  But using SequenceFile to store web content is
> > certainly a best practice and I did not mean to imply otherwise.
> >
> > Doug
> >
>
