hadoop-mapreduce-user mailing list archives

From Piyush Kansal <piyush.kan...@gmail.com>
Subject Re: v0.20.203: How to compress files in Reducer
Date Sun, 15 Apr 2012 03:15:25 GMT
Thanks, it worked :)
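
[For anyone finding this thread later: the same write-then-finish pattern can be exercised locally with plain java.util.zip, with no Hadoop dependency. This is only an illustrative analog of the GzipCodec stream usage discussed below, not Hadoop code; the class name and the round-trip check are additions for demonstration.]

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipRoundTrip {
    public static void main(String[] args) throws IOException {
        String test = "test data";

        // Compress: analogous to codec.createOutputStream(out) in the thread.
        ByteArrayOutputStream compressed = new ByteArrayOutputStream();
        GZIPOutputStream gzipOut = new GZIPOutputStream(compressed);
        gzipOut.write(test.getBytes(StandardCharsets.UTF_8));
        gzipOut.finish(); // write the gzip trailer, like CompressionOutputStream.finish()
        gzipOut.close();

        // Decompress again to verify the round trip.
        GZIPInputStream gzipIn =
                new GZIPInputStream(new ByteArrayInputStream(compressed.toByteArray()));
        ByteArrayOutputStream plain = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        int n;
        while ((n = gzipIn.read(buf)) != -1) {
            plain.write(buf, 0, n);
        }
        gzipIn.close();

        System.out.println(plain.toString(StandardCharsets.UTF_8.name()));
    }
}
```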

On Sat, Apr 14, 2012 at 5:53 PM, vasanth kumar <rj.vasanthkumar@gmail.com> wrote:

> Hi,
> Check whether this code works:
>
>             Configuration conf = new Configuration();
>             GzipCodec gzipcodec = new GzipCodec();
>
>             conf.setBoolean("hadoop.native.lib", true);
>             gzipcodec.setConf(conf);
>
>             OutputStream out = fs.create(new Path("/home/user/test"));
>             CompressionOutputStream stream = gzipcodec.createOutputStream(out);
>             String test = "test data";
>             stream.write(test.getBytes());
>             stream.finish();
>             IOUtils.closeStream(out);
>
> A NullPointerException is thrown if the setConf(conf) call above is omitted.
>
>
>
>
> On Sat, Apr 14, 2012 at 10:08 PM, Piyush Kansal <piyush.kansal@gmail.com> wrote:
>
>> Hi,
>>
>> Can you please advise on the query below?
>>
>>
>> On Thu, Apr 12, 2012 at 11:01 PM, Piyush Kansal <piyush.kansal@gmail.com> wrote:
>>
>>> Thanks for your quick response Harsh.
>>>
>>> I tried the following:
>>> 1 OutputStream out = ipFs.create( new Path( opDir + "/" + fileName ) );
>>> 2 CompressionCodec codec = new GzipCodec();
>>> 3 OutputStream cs = codec.createOutputStream( out );
>>> 4 BufferedWriter cout = new BufferedWriter( new OutputStreamWriter( cs ) );
>>> 5 cout.write( ... )
>>>
>>> But I got a NullPointerException at line 3. Am I doing something wrong?
>>> java.lang.NullPointerException
>>> at
>>> org.apache.hadoop.io.compress.zlib.ZlibFactory.isNativeZlibLoaded(ZlibFactory.java:63)
>>> at
>>> org.apache.hadoop.io.compress.GzipCodec.createOutputStream(GzipCodec.java:92)
>>>  at myFile$myReduce.reduce(myFile.java:354)
>>>
>>> I also found the following JIRA<http://mail-archives.apache.org/mod_mbox/hbase-issues/201202.mbox/%3C1886894051.6677.1329950151727.JavaMail.tomcat@hel.zones.apache.org%3E> for the same issue. Can you please suggest how this can be handled?
>>>
>>> On Thu, Apr 12, 2012 at 10:31 PM, Harsh J <harsh@cloudera.com> wrote:
>>>
>>>> If you're using the APIs directly, instead of the framework's offered
>>>> APIs like MultipleOutputs and the like, you need to follow this:
>>>>
>>>> OutputStream os = fs.create(…);
>>>> CompressionCodec codec = new GzipCodec(); // Or another codec. See also
>>>> the CompressionCodecFactory class for some helpers.
>>>> OutputStream cs = codec.createOutputStream(os);
>>>> // Now use cs as your output stream object for writes.
>>>>
>>>> On Fri, Apr 13, 2012 at 6:14 AM, Piyush Kansal <piyush.kansal@gmail.com>
>>>> wrote:
>>>> > Hi,
>>>> >
>>>> > I am creating output files in the reducer using my own file-name
>>>> > convention, writing the data with the FileSystem APIs. I now want to
>>>> > compress these files while writing, both to write less data and to
>>>> > save space on HDFS.
>>>> >
>>>> > So I tried the following options, but none of them worked:
>>>> > - setting "mapred.output.compress" to true
>>>> > - job.setOutputFormatClass(TextOutputFormat.class);
>>>> >   TextOutputFormat.setCompressOutput(job, true);
>>>> >   TextOutputFormat.setOutputCompressorClass(job, GzipCodec.class);
>>>> > - looking into the existing FileSystem and FileUtil APIs, but none of
>>>> >   them has an API to write a file in compressed format
>>>> >
>>>> > Can you please suggest how I can achieve this?
>>>> >
>>>> > --
>>>> > Regards,
>>>> > Piyush Kansal
>>>> >
>>>>
>>>>
>>>>
>>>> --
>>>> Harsh J
>>>>
>>>
>>>
>>>
>>> --
>>> Regards,
>>> Piyush Kansal
>>>
>>>
>>
>>
>> --
>> Regards,
>> Piyush Kansal
>>
>>
>
>
> --
> Regards
> Vasanth kumar RJ
>



-- 
Regards,
Piyush Kansal
