hadoop-common-user mailing list archives

From Eric Sammer <esam...@cloudera.com>
Subject Re: Writing compressed data to HDFS
Date Tue, 01 Jun 2010 15:25:19 GMT
This isn't really a Hadoop issue, but gunzip will refuse to decompress
files that don't have a well-known suffix. Rename the file to have a
.gz suffix and try again, or use the -S option to specify an alternate
suffix.
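
For example, here is a minimal sketch of the same write that simply gives
the file a .gz name up front (the path and payload below are only
placeholders, not part of the original code):

import java.util.zip.GZIPOutputStream;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class GzipToHdfs {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);

    // Name the file with a .gz suffix so gunzip recognizes it after a fs -get.
    Path file = new Path("/tmp/example.gz");  // placeholder path

    FSDataOutputStream out = fs.create(file);
    GZIPOutputStream gzip = new GZIPOutputStream(out);
    try {
      gzip.write("sss".getBytes("UTF-8"));
    } finally {
      // Closing the gzip stream writes the gzip trailer and closes the
      // underlying HDFS stream as well.
      gzip.close();
    }
  }
}

With the suffix in place, 'hadoop fs -get' followed by a plain gunzip
should work without needing the -S workaround.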

On Tue, Jun 1, 2010 at 10:28 AM, Arv Mistry <arv@kindsight.net> wrote:
> Hi,
>
> I have a java process that writes compressed data to HDFS. The way I
> am doing that is wrapping the FSDataOutputStream with a GZIPOutputStream
> and calling the write() method, i.e. something like:
>
> FSDataOutputStream out = fs.create(file);
> GZIPOutputStream gzip = new GZIPOutputStream(out);
> gzip.write("sss".getBytes("UTF-8"));
>
> The file seems to get written ok.
>
> However, when I get the file out of HDFS and try to unzip it, it
> complains;
>
> gunzip: cs_1_20100601_120000_1275396891183.cgz: unknown suffix --
> ignored
>
> When I do 'file' it is recognized as 'gzip compressed data, from FAT
> filesystem (MS-DOS, OS/2, NT)'
>
> Any ideas? Appreciate any help.
>
> Cheers Arv
>



-- 
Eric Sammer
phone: +1-917-287-2675
twitter: esammer
data: www.cloudera.com
