crunch-user mailing list archives

From Som Satpathy <somsatpa...@gmail.com>
Subject Re: Writing compressed sequence files
Date Sat, 03 Aug 2013 00:33:48 GMT
Thanks Josh. I tried setting compression parameters via the Configuration
object and also via command line, but the output sequence file never seems
to get compressed. I'm trying to Snappy compress it.
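
For anyone following along, here is a minimal sketch of what setting these options through the pipeline's Configuration might look like. The property names are the Hadoop 1.x output-compression keys as I know them (Hadoop 2 renames them under mapreduce.output.fileoutputformat.*), and the class and method names are only illustrative. Whether To.sequenceFile actually honors these keys is exactly what is in question in this thread, so treat it as a starting point rather than a confirmed fix. It needs the Hadoop and Crunch jars on the classpath.

```java
import org.apache.crunch.Pipeline;
import org.apache.crunch.impl.mr.MRPipeline;
import org.apache.hadoop.conf.Configuration;

public class SnappyOutputConfig {

  // Sets the Hadoop 1.x output-compression properties on the given Configuration.
  // Assumption: these are the keys the output format consults; not confirmed for Crunch.
  static void enableSnappyBlockCompression(Configuration conf) {
    conf.setBoolean("mapred.output.compress", true);
    conf.set("mapred.output.compression.type", "BLOCK");
    conf.set("mapred.output.compression.codec",
        "org.apache.hadoop.io.compress.SnappyCodec");
  }

  public static void main(String[] args) {
    // getConfiguration() is the hook Josh describes in the quoted message.
    Pipeline pipeline = new MRPipeline(SnappyOutputConfig.class, new Configuration());
    enableSnappyBlockCompression(pipeline.getConfiguration());

    // Build the pipeline and call pipeline.write(collection, To.sequenceFile(path)) here.

    // Print one key back to confirm the setting is present before the job is planned.
    System.out.println(pipeline.getConfiguration().get("mapred.output.compression.codec"));
  }
}
```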

If I try creating a sequence file outside of Crunch using
SequenceFile.createWriter, I see the file getting compressed with my
compression type (i.e., Snappy).
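
One way to compare the two outputs without guessing from file sizes is to read the compression flags straight out of the SequenceFile header. The sketch below uses only the JDK and assumes the header layout as I understand it for versions 5 and 6: the magic bytes "SEQ", a version byte, the key and value class names as vint-prefixed strings, a compressed flag, a block-compressed flag, and then the codec class name when the compressed flag is set. The class name SeqFileHeaderCheck is made up for this example, and it reads a local copy of the file (for example one fetched with hadoop fs -get).

```java
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

final class SeqFileHeaderCheck {

  /** Compression fields from a SequenceFile header; codec is null when uncompressed. */
  static final class Info {
    final boolean compressed;
    final boolean blockCompressed;
    final String codec;

    Info(boolean compressed, boolean blockCompressed, String codec) {
      this.compressed = compressed;
      this.blockCompressed = blockCompressed;
      this.codec = codec;
    }
  }

  /** Reads just enough of the header to learn the compression settings. */
  static Info readHeader(InputStream raw) throws IOException {
    DataInputStream in = new DataInputStream(raw);
    byte[] magic = new byte[3];
    in.readFully(magic);
    if (magic[0] != 'S' || magic[1] != 'E' || magic[2] != 'Q') {
      throw new IOException("not a SequenceFile");
    }
    int version = in.readUnsignedByte();
    if (version < 5) {
      // Older versions do not record a codec name in the header.
      throw new IOException("unsupported SequenceFile version " + version);
    }
    readString(in); // key class name, not needed here
    readString(in); // value class name, not needed here
    boolean compressed = in.readBoolean();
    boolean blockCompressed = in.readBoolean();
    String codec = compressed ? readString(in) : null;
    return new Info(compressed, blockCompressed, codec);
  }

  /** Reads a vint length followed by that many UTF-8 bytes. */
  static String readString(DataInputStream in) throws IOException {
    int length = readVInt(in);
    if (length < 0) {
      throw new IOException("negative string length " + length);
    }
    byte[] bytes = new byte[length];
    in.readFully(bytes);
    return new String(bytes, StandardCharsets.UTF_8);
  }

  /** Decodes Hadoop's variable-length integer encoding. */
  static int readVInt(DataInputStream in) throws IOException {
    byte first = in.readByte();
    if (first >= -112) {
      return first; // values from -112 to 127 fit in a single byte
    }
    boolean negative = first < -120;
    int extraBytes = negative ? -(first + 120) : -(first + 112);
    long value = 0;
    for (int i = 0; i < extraBytes; i++) {
      value = (value << 8) | in.readUnsignedByte();
    }
    return (int) (negative ? ~value : value);
  }

  public static void main(String[] args) throws IOException {
    if (args.length != 1) {
      System.err.println("usage: SeqFileHeaderCheck <local-sequence-file>");
      return;
    }
    try (InputStream in = new FileInputStream(args[0])) {
      Info info = readHeader(in);
      System.out.println("compressed=" + info.compressed
          + " blockCompressed=" + info.blockCompressed
          + " codec=" + info.codec);
    }
  }
}
```

Running it against a part file from the Crunch job and against the file written with SequenceFile.createWriter should show whether the codec ever reaches the Crunch output. With Hadoop on the classpath, SequenceFile.Reader exposes the same information through isCompressed(), isBlockCompressed() and getCompressionCodec().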

I was wondering if this is a known issue with Crunch.

Thanks,
Som

On Fri, Aug 2, 2013 at 4:56 PM, Josh Wills <jwills@cloudera.com> wrote:

> Hey Som,
>
> The Pipeline object that coordinates the flow has a getConfiguration()
> method where you can set any options you might like and they will propagate
> to all of your jobs.
>
> I usually implement Hadoop's Tool interface and then specify these
> configuration options on the command line so I can play with them
> independent of the logic of my runtime, and I end up w/something like:
>
> hadoop jar <crunch-job.jar> -D mapred.output.compress=true -D
> mapred.output.compression.type=block etc.
>
> I think that having some syntactic sugar for compressing Target objects
> (like To.sequenceFile or To.avroFile) would be a nice JIRA.
>
> J
>
>
> On Fri, Aug 2, 2013 at 3:58 PM, Som Satpathy <somsatpathy@gmail.com> wrote:
>
>> Hi all,
>>
>> I am trying to write compressed sequence files at the end of my crunch
>> pipeline. I'm doing a pipeline.write(mycollection, To.sequenceFile(path))
>> for that.
>> However, Crunch is writing an uncompressed sequence file by default. How
>> do I pass the codec that I want to use to Crunch?
>>
>> Looking forward to your input.
>>
>> Thanks,
>> Som
>>
>>
>
>
> --
> Director of Data Science
> Cloudera <http://www.cloudera.com>
> Twitter: @josh_wills <http://twitter.com/josh_wills>
>
