crunch-user mailing list archives

From Josh Wills <jwi...@cloudera.com>
Subject Re: Writing compressed sequence files
Date Fri, 02 Aug 2013 23:56:12 GMT
Hey Som,

The Pipeline object that coordinates the flow has a getConfiguration()
method. Any options you set on the Configuration it returns will propagate
to all of the jobs in your pipeline.
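
As a minimal sketch of that approach, the following sets the classic Hadoop 1.x output-compression properties on the pipeline's Configuration before any targets are written. The class name CompressedOutputConfig and the helper enableBlockCompression are hypothetical, and DefaultCodec is only an example choice of codec; newer Hadoop releases use mapreduce.output.fileoutputformat.* names for the same settings.

```java
import org.apache.crunch.Pipeline;
import org.apache.crunch.impl.mr.MRPipeline;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.compress.DefaultCodec;

// Hypothetical class name, used only for illustration.
public class CompressedOutputConfig {

  // Hypothetical helper: turns on block-compressed job output using the
  // Hadoop 1.x property names. DefaultCodec is an arbitrary example codec.
  static void enableBlockCompression(Configuration conf) {
    conf.setBoolean("mapred.output.compress", true);
    conf.set("mapred.output.compression.type", "BLOCK");
    conf.set("mapred.output.compression.codec", DefaultCodec.class.getName());
  }

  public static void main(String[] args) {
    Pipeline pipeline = new MRPipeline(CompressedOutputConfig.class, new Configuration());

    // Settings applied here are carried into the jobs the pipeline launches.
    enableBlockCompression(pipeline.getConfiguration());

    // ... read, transform, and pipeline.write(collection, To.sequenceFile(path)) here ...

    pipeline.done();
  }
}
```

Note that the compression type is parsed as an enum name, so it should be given in upper case (NONE, RECORD, or BLOCK).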

I usually implement Hadoop's Tool interface and then specify these
configuration options on the command line, so I can play with them
independently of the pipeline logic. I end up with something like:

hadoop jar <crunch-job.jar> -D mapred.output.compress=true \
    -D mapred.output.compression.type=BLOCK etc.
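
A rough sketch of what such a Tool might look like is below. The class name CompressedWriteTool and the two positional arguments (input path, output path) are assumptions made for illustration. ToolRunner parses the generic -D options into the Configuration before run() is called, so the pipeline code itself contains no compression settings.

```java
import org.apache.crunch.PCollection;
import org.apache.crunch.Pipeline;
import org.apache.crunch.PipelineResult;
import org.apache.crunch.impl.mr.MRPipeline;
import org.apache.crunch.io.To;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

// Hypothetical class name; the argument layout is an assumption.
public class CompressedWriteTool extends Configured implements Tool {

  @Override
  public int run(String[] args) throws Exception {
    // By this point ToolRunner has stripped the -D options from args
    // and applied them to the Configuration returned by getConf().
    if (args.length != 2) {
      System.err.println("Usage: CompressedWriteTool <input> <output>");
      return 2;
    }

    Pipeline pipeline = new MRPipeline(CompressedWriteTool.class, getConf());
    PCollection<String> lines = pipeline.readTextFile(args[0]);

    // Whether the output is compressed depends on the -D options supplied.
    pipeline.write(lines, To.sequenceFile(args[1]));

    PipelineResult result = pipeline.done();
    return result.succeeded() ? 0 : 1;
  }

  public static void main(String[] args) throws Exception {
    System.exit(ToolRunner.run(new Configuration(), new CompressedWriteTool(), args));
  }
}
```

With this layout, the -D options must come before the positional arguments on the command line, since the generic option parser stops at the first non-option argument.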

I think that having some syntactic sugar for compressing Target objects
(like To.sequenceFile or To.avroFile) would be a nice JIRA.

J


On Fri, Aug 2, 2013 at 3:58 PM, Som Satpathy <somsatpathy@gmail.com> wrote:

> Hi all,
>
> I am trying to write compressed sequence files at the end of my crunch
> pipeline. I'm doing a pipeline.write(mycollection, To.sequenceFile(path))
> for that.
> However, Crunch is writing an uncompressed sequence file by default. How
> do I pass the codec that I want to use to Crunch?
>
> Looking forward to your input.
>
> Thanks,
> Som
>
>


-- 
Director of Data Science
Cloudera <http://www.cloudera.com>
Twitter: @josh_wills <http://twitter.com/josh_wills>
