crunch-dev mailing list archives

From "Josh Wills (JIRA)" <>
Subject [jira] [Commented] (CRUNCH-347) Allow writing of single file outputs
Date Tue, 18 Feb 2014 01:54:19 GMT


Josh Wills commented on CRUNCH-347:

[~jgmath2000], is there a trick to doing this? I'm not familiar with ways to safely write
the outputs from multiple reducers to a single file on HDFS.

> Allow writing of single file outputs
> ------------------------------------
>                 Key: CRUNCH-347
>                 URL:
>             Project: Crunch
>          Issue Type: New Feature
>          Components: IO
>    Affects Versions: 0.9.0
>            Reporter: Jason Gauci
>            Priority: Minor
> One of the outputs from our system needs to be a single file to support a system that
> is ingesting the data downstream.  We currently run the job and then cat the output files
> together to create the final output, but it would be nice if we could pass a flag to the write(...)
> function to handle this case.
> Note that setting the number of reducers globally for the entire job doesn't work in
> this case because of the significant performance implications.
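
For reference, the "run the job, then cat the output files together" workaround described
in the issue can be sketched roughly as below. This is only an illustration under stated
assumptions: it works on a local directory using java.nio, it assumes the job's output
files are named part-*, and the class and method names (PartFileMerger, mergeParts) are
hypothetical. It is not part of Crunch and not the flag on write(...) that the issue asks for.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not a Crunch or Hadoop API.
public class PartFileMerger {

    /**
     * Concatenates every file in outputDir whose name starts with "part-"
     * (an assumed naming convention) into target, in file-name order.
     * Returns the number of part files that were merged.
     */
    public static int mergeParts(Path outputDir, Path target) throws IOException {
        List<Path> parts;
        // Close the directory listing as soon as the names are collected.
        try (Stream<Path> listing = Files.list(outputDir)) {
            parts = listing
                .filter(p -> p.getFileName().toString().startsWith("part-"))
                .sorted()
                .collect(Collectors.toList());
        }
        // Append each part file's bytes to the single target file.
        try (OutputStream out = Files.newOutputStream(target)) {
            for (Path part : parts) {
                Files.copy(part, out);
            }
        }
        return parts.size();
    }
}
```

Because the merge runs as a separate step after the job finishes, it leaves the number of
reducers alone, which is the constraint the reporter raises. For output that lives on HDFS,
`hadoop fs -getmerge` does a similar concatenation into a local file.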

This message was sent by Atlassian JIRA
