crunch-user mailing list archives

From Josh Wills <>
Subject Re: Writing compressed sequence files
Date Fri, 02 Aug 2013 23:56:12 GMT
Hey Som,

The Pipeline object that coordinates the flow has a getConfiguration()
method where you can set any options you might like and they will propagate
to all of your jobs.

I usually implement Hadoop's Tool interface and then specify these
configuration options on the command line so I can play with them
independent of the logic of my runtime, and I end up w/something like:

hadoop jar <crunch-job.jar> -D mapred.output.compress=true \
    -D mapred.output.compression.type=BLOCK etc.
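For reference, here is a rough sketch of what such a Tool-based driver might look like. The class name CompressedSeqFileJob and the use of args[0]/args[1] as input and output paths are made up for illustration, and it assumes the Hadoop 1 property name mapred.output.compress and the MRPipeline(Class, Configuration) constructor; treat it as a starting point rather than a tested job.

```java
import org.apache.crunch.PCollection;
import org.apache.crunch.Pipeline;
import org.apache.crunch.PipelineResult;
import org.apache.crunch.impl.mr.MRPipeline;
import org.apache.crunch.io.To;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

// Hypothetical driver: the class name and argument layout are illustrative only.
public class CompressedSeqFileJob extends Configured implements Tool {

  @Override
  public int run(String[] args) throws Exception {
    // ToolRunner has already folded any -D key=value options into getConf(),
    // so the pipeline (and the jobs it launches) sees them.
    Pipeline pipeline = new MRPipeline(CompressedSeqFileJob.class, getConf());

    // Options can also be set in code through the pipeline's Configuration.
    // This hard-codes output compression on; drop the line to control it
    // purely from the command line.
    pipeline.getConfiguration().setBoolean("mapred.output.compress", true);

    // args[0] = input text path, args[1] = output sequence file path (assumed layout).
    PCollection<String> lines = pipeline.readTextFile(args[0]);
    pipeline.write(lines, To.sequenceFile(args[1]));

    PipelineResult result = pipeline.done();
    return result.succeeded() ? 0 : 1;
  }

  public static void main(String[] args) throws Exception {
    System.exit(ToolRunner.run(new Configuration(), new CompressedSeqFileJob(), args));
  }
}
```

With a driver along these lines, the -D options go before the application arguments, e.g. hadoop jar <crunch-job.jar> CompressedSeqFileJob -D mapred.output.compression.type=BLOCK <input> <output>, since ToolRunner only strips the generic options that precede them.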

I think that having some syntactic sugar for compressing Target objects
(like To.sequenceFile or To.avroFile) would be a nice JIRA.


On Fri, Aug 2, 2013 at 3:58 PM, Som Satpathy <> wrote:

> Hi all,
> I am trying to write compressed sequence files at the end of my crunch
> pipeline. I'm doing a pipeline.write(mycollection, To.sequenceFile(path))
> for that.
> However, Crunch is writing an uncompressed sequence file by default. How
> do I pass the codec that I want to use to Crunch?
> Looking forward to your input.
> Thanks,
> Som

Director of Data Science
Cloudera <>
Twitter: @josh_wills <>
