hadoop-common-user mailing list archives

From "Philip (flip) Kromer" <f...@infochimps.org>
Subject Re: best way to copy all files from a file system to hdfs
Date Mon, 02 Feb 2009 05:44:14 GMT
Could you tar.bz2 them up (setting up the tar so that it makes a few dozen
files), toss them onto HDFS, and use the approach at
http://stuartsierra.com/2008/04/24/a-million-little-files
to convert them into a SequenceFile?

This lets you preserve the originals and do the SequenceFile conversion
across the cluster. It's only really helpful, of course, if you also want to
prepare a .tar.bz2 so you can clear out the sprawl.
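
Roughly, the conversion step looks like the sketch below. This isn't the
exact tool behind that link, just a minimal illustration of reading a
.tar.bz2 and appending each entry to a SequenceFile as a (name, bytes) pair;
the class name, argument order, and the Apache Commons Compress dependency
are assumptions on my part.

// Minimal sketch only: read each entry from a local .tar.bz2 and append it
// to a SequenceFile as an (entry name, file bytes) pair.
import java.io.ByteArrayOutputStream;
import java.io.FileInputStream;
import org.apache.commons.compress.archivers.tar.TarArchiveEntry;
import org.apache.commons.compress.archivers.tar.TarArchiveInputStream;
import org.apache.commons.compress.compressors.bzip2.BZip2CompressorInputStream;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class TarToSequenceFile {
  // args[0] = local .tar.bz2, args[1] = HDFS output path
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    SequenceFile.Writer writer = SequenceFile.createWriter(
        fs, conf, new Path(args[1]), Text.class, BytesWritable.class);
    TarArchiveInputStream tar = new TarArchiveInputStream(
        new BZip2CompressorInputStream(new FileInputStream(args[0])));
    try {
      TarArchiveEntry entry;
      byte[] buf = new byte[64 * 1024];
      while ((entry = tar.getNextTarEntry()) != null) {
        if (!entry.isFile()) continue;
        // Slurp the current entry's bytes, then write one key/value record.
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        int n;
        while ((n = tar.read(buf)) != -1) {
          bytes.write(buf, 0, n);
        }
        writer.append(new Text(entry.getName()),
                      new BytesWritable(bytes.toByteArray()));
      }
    } finally {
      IOUtils.closeStream(writer);
      IOUtils.closeStream(tar);
    }
  }
}

Once the per-archive conversion is a standalone program like that, each
mapper can handle one archive, which is where splitting the data into a few
dozen tars pays off.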

flip

On Sun, Feb 1, 2009 at 11:22 PM, Mark Kerzner <markkerzner@gmail.com> wrote:

> Hi,
>
> I am writing an application to copy all files from a regular PC to a
> SequenceFile. I can surely do this by simply recursing all directories on
> my PC, but I wonder if there is any way to parallelize this, perhaps even
> as a MapReduce task. Tom White's book seems to imply that it will have to
> be a custom application.
>
> Thank you,
> Mark
>
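
For comparison, the single-machine version described above is just a
recursive walk that appends each file to a SequenceFile on HDFS. A rough
sketch; the class name, argument order, and paths are illustrative:

// Sketch of the straightforward, non-parallel approach: recurse a local
// directory tree and append each file as a (path, bytes) pair.
import java.io.File;
import java.io.FileInputStream;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class LocalDirToSequenceFile {
  // args[0] = local root directory, args[1] = HDFS output path
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    SequenceFile.Writer writer = SequenceFile.createWriter(
        fs, conf, new Path(args[1]), Text.class, BytesWritable.class);
    try {
      copyTree(new File(args[0]), writer);
    } finally {
      IOUtils.closeStream(writer);
    }
  }

  private static void copyTree(File f, SequenceFile.Writer writer)
      throws Exception {
    if (f.isDirectory()) {
      File[] children = f.listFiles();
      if (children == null) return;
      for (File child : children) {
        copyTree(child, writer);
      }
    } else {
      // Reads the whole file into memory; fine for lots of small files.
      byte[] bytes = new byte[(int) f.length()];
      FileInputStream in = new FileInputStream(f);
      try {
        IOUtils.readFully(in, bytes, 0, bytes.length);
      } finally {
        in.close();
      }
      writer.append(new Text(f.getPath()), new BytesWritable(bytes));
    }
  }
}

The bottleneck there is that one machine reads and uploads everything, which
is why splitting into a few tars and converting on the cluster parallelizes
better.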



-- 
http://www.infochimps.org
Connected Open Free Data
