hadoop-mapreduce-user mailing list archives

From Piyush Kansal <piyush.kan...@gmail.com>
Subject Re: v0.20.203: How to compress files in Reducer
Date Fri, 13 Apr 2012 03:01:26 GMT
Thanks for your quick response Harsh.

I tried using following:
1 OutputStream out = ipFs.create( new Path( opDir + "/" + fileName ) );
2 CompressionCodec codec = new GzipCodec();
3 OutputStream cs = codec.createOutputStream( out );
4 BufferedWriter cout = new BufferedWriter( new OutputStreamWriter( cs ) );
5      cout.write( ... )

But I got a NullPointerException at line 3. Am I doing anything wrong?
java.lang.NullPointerException
at
org.apache.hadoop.io.compress.zlib.ZlibFactory.isNativeZlibLoaded(ZlibFactory.java:63)
at
org.apache.hadoop.io.compress.GzipCodec.createOutputStream(GzipCodec.java:92)
at myFile$myReduce.reduce(myFile.java:354)

I also found the following JIRA for the same issue:
http://mail-archives.apache.org/mod_mbox/hbase-issues/201202.mbox/%3C1886894051.6677.1329950151727.JavaMail.tomcat@hel.zones.apache.org%3E
Can you please suggest how this can be handled?
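(A minimal sketch of the usual fix, for reference. The NPE inside ZlibFactory.isNativeZlibLoaded typically means the codec was built without a Configuration: GzipCodec is Configurable, and plain `new GzipCodec()` leaves its conf null. Instantiating the codec through ReflectionUtils injects the conf; the directory and file name below are placeholders, not values from the original code.)

```java
import java.io.BufferedWriter;
import java.io.OutputStream;
import java.io.OutputStreamWriter;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.GzipCodec;
import org.apache.hadoop.util.ReflectionUtils;

public class GzipWriteSketch {
    public static void main(String[] args) throws Exception {
        // Placeholders standing in for the reducer's values; in a real
        // reducer, conf would come from context.getConfiguration().
        Configuration conf = new Configuration();
        FileSystem ipFs = FileSystem.get(conf);
        String opDir = "/tmp/out";        // hypothetical output dir
        String fileName = "part.gz";      // hypothetical file name

        // ReflectionUtils.newInstance() calls setConf() on Configurable
        // codecs; plain "new GzipCodec()" leaves the conf null, which is
        // what trips ZlibFactory.isNativeZlibLoaded() with an NPE.
        CompressionCodec codec =
            ReflectionUtils.newInstance(GzipCodec.class, conf);

        OutputStream out = ipFs.create(new Path(opDir + "/" + fileName));
        OutputStream cs = codec.createOutputStream(out);
        BufferedWriter cout = new BufferedWriter(new OutputStreamWriter(cs));
        cout.write("some record\n");
        cout.close();   // closing flushes the gzip trailer
    }
}
```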

On Thu, Apr 12, 2012 at 10:31 PM, Harsh J <harsh@cloudera.com> wrote:

> If you're using the APIs directly, instead of the framework's offered
> APIs like MultipleOutputs and the like, you need to follow this:
>
> OutputStream os = fs.create(…);
> CompressionCodec codec = new GzipCodec(); // Or other codec. See also,
> CompressionCodecFactory class for some helpers.
> OutputStream cs = codec.createOutputStream(os);
> // Now use cs as your output stream object for writes.
>
> On Fri, Apr 13, 2012 at 6:14 AM, Piyush Kansal <piyush.kansal@gmail.com>
> wrote:
> > Hi,
> >
> > I am creating o/p files in the reducer using my own file name convention,
> > dumping data into them via the FileSystem APIs. I now want to compress
> > these files while writing, both to write less data and to save space on
> > HDFS.
> >
> > So, I tried the following options, but none of them worked:
> > - setting "mapred.output.compress" to true
> > - job.setOutputFormatClass( TextOutputFormat.class);
> >   TextOutputFormat.setCompressOutput(job, true);
> >   TextOutputFormat.setOutputCompressorClass(job, GzipCodec.class);
> > - I also tried looking into the existing FileSystem and FileUtil APIs, but
> >   none of them offers a way to write a file in compressed format
> >
> > Can you please suggest how I can achieve this?
> >
> > --
> > Regards,
> > Piyush Kansal
> >
>
>
>
> --
> Harsh J
>
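(The CompressionCodecFactory helper Harsh mentions also takes care of the Configuration wiring, since the factory hands each codec the conf it was built with. A rough sketch; the output path below is a placeholder.)

```java
import java.io.OutputStream;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.CompressionCodecFactory;

public class CodecFactorySketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // The factory picks the codec from the file extension
        // (.gz -> GzipCodec) and configures it, so the native-zlib
        // check does not hit a null Configuration.
        Path out = new Path("/tmp/part-r-00000.gz");  // placeholder path
        CompressionCodecFactory factory = new CompressionCodecFactory(conf);
        CompressionCodec codec = factory.getCodec(out);

        OutputStream cs = codec.createOutputStream(fs.create(out));
        cs.write("some record\n".getBytes("UTF-8"));
        cs.close();  // flush the gzip trailer
    }
}
```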



-- 
Regards,
Piyush Kansal
