crunch-user mailing list archives

From Josh Wills <jwi...@cloudera.com>
Subject Re: Making crunch job output single file
Date Wed, 30 Oct 2013 15:14:44 GMT
Hey Som,

Check out org.apache.crunch.lib.Shard. It does what you want: Shard.shard()
lets you fix the number of output files, so asking for 1 gives a single file.
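A minimal sketch of how that might look, assuming the records are already in a PCollection<String> and a Crunch release that ships org.apache.crunch.lib.Shard. The class and method names (SingleFileOutput, writeSingleCsv) are made up for illustration; transformedRecords and csvFilePath are the names from the original question.

```java
import org.apache.crunch.PCollection;
import org.apache.crunch.Pipeline;
import org.apache.crunch.lib.Shard;

// Hypothetical helper class; only Shard.shard and Pipeline.writeTextFile
// are Crunch APIs, everything else is illustrative.
public final class SingleFileOutput {

    private SingleFileOutput() {
    }

    // Forces the records through a single partition before writing,
    // so the output directory should end up with one part file.
    static void writeSingleCsv(Pipeline pipeline,
                               PCollection<String> transformedRecords,
                               String csvFilePath) {
        // Same contents as the input, but written to a fixed number of files (here 1).
        PCollection<String> singleShard = Shard.shard(transformedRecords, 1);

        // csvFilePath is still a directory, e.g. "/data/csv_directory";
        // the single part file is created inside it.
        pipeline.writeTextFile(singleShard, csvFilePath);
    }
}
```

Note that this adds a reduce phase and sends every record through one reducer, so it trades parallelism for the single output file, much like setting the reducer count to 1 in classic MapReduce.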

J


On Wed, Oct 30, 2013 at 8:05 AM, Som Satpathy <somsatpathy@gmail.com> wrote:

> Hi all,
>
> I have a Crunch job that should process a big sequence file and produce a
> single CSV file. I am using "pipeline.writeTextFile(transformedRecords,
> csvFilePath)" to write the CSV (csvFilePath is something like
> "/data/csv_directory"). The larger the input sequence file, the more
> mappers are created, and so an equal number of CSV output files is
> produced.
>
> In classic MapReduce one could output a single file by setting the number
> of reducers to 1 while configuring the job. How could I achieve this with
> Crunch?
>
> I would really appreciate any help here.
>
> Thanks,
> Som
>



-- 
Director of Data Science
Cloudera <http://www.cloudera.com>
Twitter: @josh_wills <http://twitter.com/josh_wills>
