crunch-user mailing list archives

From Som Satpathy
Subject Re: Writing compressed sequence files
Date Sat, 03 Aug 2013 00:33:48 GMT
Thanks Josh. I tried setting the compression parameters both via the
Configuration object and via the command line, but the output sequence file
never seems to get compressed. I'm trying to compress it with Snappy.

If I try creating a sequence file outside of Crunch using
SequenceFile.createWriter, I see the file getting compressed with my
compression type (i.e., Snappy).

I was wondering if this is a known issue with Crunch.
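
For reference, here is a minimal sketch of the settings under discussion. It
only assembles the old-style Hadoop output-compression properties
(mapred.output.compress, mapred.output.compression.type,
mapred.output.compression.codec) and renders them as -D arguments. The
CompressionOptions class and its method names are hypothetical helpers, not
part of Crunch or Hadoop, and the sketch assumes the stock
org.apache.hadoop.io.compress.SnappyCodec class is the codec wanted.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper for illustration only; not part of Crunch or Hadoop.
class CompressionOptions {

    // Old-style (mapred.*) output compression keys. Newer Hadoop releases
    // also accept mapreduce.output.fileoutputformat.* equivalents.
    static Map<String, String> snappyBlockOptions() {
        Map<String, String> opts = new LinkedHashMap<>();
        opts.put("mapred.output.compress", "true");
        opts.put("mapred.output.compression.type", "BLOCK");
        opts.put("mapred.output.compression.codec",
                 "org.apache.hadoop.io.compress.SnappyCodec");
        return opts;
    }

    // Renders the options as "-D key=value" arguments, in insertion order,
    // suitable for a driver that implements Hadoop's Tool interface.
    static String toCommandLine(Map<String, String> opts) {
        StringJoiner joiner = new StringJoiner(" ");
        for (Map.Entry<String, String> e : opts.entrySet()) {
            joiner.add("-D").add(e.getKey() + "=" + e.getValue());
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        // Prints the argument string to paste after "hadoop jar <crunch-job.jar>".
        System.out.println(toCommandLine(snappyBlockOptions()));
    }
}
```

In a driver, the same entries could instead be applied one by one with
pipeline.getConfiguration().set(key, value) before calling pipeline.write.
Whether the To.sequenceFile target actually honors them is the open question
in this thread.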


On Fri, Aug 2, 2013 at 4:56 PM, Josh Wills wrote:

> Hey Som,
> The Pipeline object that coordinates the flow has a getConfiguration()
> method where you can set any options you might like and they will propagate
> to all of your jars.
> I usually implement Hadoop's Tool interface and then specify these
> configuration options on the command line so I can play with them
> independent of the logic of my runtime, and I end up with something like:
> hadoop jar <crunch-job.jar> -D mapred.output.compress=true -D
> mapred.output.compression.type=BLOCK etc.
> I think that having some syntactic sugar for compressing Target objects
> (like To.sequenceFile or To.avroFile) would be a nice JIRA.
> J
> On Fri, Aug 2, 2013 at 3:58 PM, Som Satpathy wrote:
>> Hi all,
>> I am trying to write compressed sequence files at the end of my crunch
>> pipeline. I'm doing a pipeline.write(mycollection, To.sequenceFile(path))
>> for that.
>> However, Crunch is writing an uncompressed sequence file by default. How
>> do I pass the codec that I want to use to Crunch?
>> Looking forward to your input.
>> Thanks,
>> Som
> --
> Director of Data Science
> Cloudera
> Twitter: @josh_wills
