hadoop-common-user mailing list archives

From Keith Wiley <kwi...@keithwiley.com>
Subject Re: Upload, then decompress archive on HDFS?
Date Fri, 05 Aug 2011 15:14:25 GMT
I can envision an M/R job for the purpose of manipulating HDFS, such as (de)compressing files
and writing them back to HDFS.  I just didn't think it should be necessary to *write a program*
to do something so seemingly minimal.  This (tarring/compressing/etc.) seems like an obvious
method for moving data back and forth; I would expect the tools to support it.

I'll read up on "-text".  Maybe that really is what I wanted, although I'm dubious since this
has nothing to do with textual data at all.  Anyway, I'll see what I can find on that.
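For what it's worth, the pattern Harsh describes below can be sketched without a cluster. The script demonstrates the decompress-through-a-pipe idea locally with gzip; the `hadoop fs` lines at the end are hypothetical (paths like /user/me/ are made up) and assume a running HDFS cluster:

```shell
set -e
# Local demonstration of the pipe pattern 'hadoop fs -text' relies on:
# gzip a file, then stream the decompressed bytes back out through a pipe.
echo "hello hdfs" > sample.txt
gzip -kf sample.txt                      # produces sample.txt.gz, keeps original
gunzip -c sample.txt.gz > roundtrip.txt  # analogous to 'hadoop fs -text file.gz'
diff sample.txt roundtrip.txt && echo OK

# On a cluster, the same pattern might look like (hypothetical paths,
# requires HDFS; '-put -' reads from stdin):
# hadoop fs -put sample.txt.gz /user/me/sample.txt.gz
# hadoop fs -text /user/me/sample.txt.gz | hadoop fs -put - /user/me/sample.txt
```

The key point is that `-text` writes decompressed bytes to stdout, so it composes with any downstream command via a pipe, no custom program required.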


On Aug 4, 2011, at 9:04 PM, Harsh J wrote:

> Keith,
> The 'hadoop fs -text' tool does decompress a file given to it if
> needed/able, but what you could also do is run a distributed mapreduce
> job that converts from compressed to decompressed, that'd be much
> faster.
> On Fri, Aug 5, 2011 at 4:58 AM, Keith Wiley <kwiley@keithwiley.com> wrote:
>> Instead of "hd fs -put" hundreds of files of X megs, I want to do it once on a gzipped
>> (or zipped) archive, one file, much smaller total megs.  Then I want to decompress the archive
>> on HDFS.  I can't figure out what "hd fs" type command would do such a thing.
>> Thanks.

Keith Wiley     kwiley@keithwiley.com     keithwiley.com    music.keithwiley.com

"It's a fine line between meticulous and obsessive-compulsive and a slippery
rope between obsessive-compulsive and debilitatingly slow."
                                           --  Keith Wiley
