crunch-dev mailing list archives

From "Gabriel Reid (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CRUNCH-347) Allow writing of single file outputs
Date Tue, 18 Feb 2014 21:15:20 GMT

    [ https://issues.apache.org/jira/browse/CRUNCH-347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13904607#comment-13904607 ]

Gabriel Reid commented on CRUNCH-347:
-------------------------------------

I think that Shard is indeed the best way to take care of something like this.

[~jgmath2000] about the granularity of crunch.max.reducers, PTable#groupByKey (which triggers
a reduce) can take a number of partitions as a parameter, which allows you to specify how
many reducers will be used on that specific reduce. Does that resolve your issue on the reducer
count granularity?
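
To illustrate why a per-reduce partition count would address this: each reducer writes its
own output file, so only the final reduce needs to be limited to one partition, while earlier
reduces keep their parallelism. The following is a minimal plain-Java sketch, not Crunch code.
It assumes hash partitioning with the same formula as Hadoop's default HashPartitioner; the
class and method names (PartitionSketch, partitionFor, partition) are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only: this is not the Crunch API.
public class PartitionSketch {

    // Same formula as Hadoop's default HashPartitioner:
    // mask off the sign bit, then take the remainder.
    public static int partitionFor(Object key, int numPartitions) {
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    // Distribute keys over numPartitions buckets; each bucket stands in
    // for the output file written by one reducer.
    public static List<List<String>> partition(List<String> keys, int numPartitions) {
        List<List<String>> buckets = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            buckets.add(new ArrayList<>());
        }
        for (String key : keys) {
            buckets.get(partitionFor(key, numPartitions)).add(key);
        }
        return buckets;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("a", "b", "c", "d", "e");
        // An earlier, heavy reduce can keep several partitions...
        System.out.println(partition(keys, 4).size() + " outputs");
        // ...while only the final reduce is limited to a single partition.
        System.out.println(partition(keys, 1).size() + " output");
    }
}
```

With a single partition every key lands in bucket 0, which corresponds to one output file;
the cost is that this last reduce runs on one reducer, so it is best kept to the final step.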

> Allow writing of single file outputs
> ------------------------------------
>
>                 Key: CRUNCH-347
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-347
>             Project: Crunch
>          Issue Type: New Feature
>          Components: IO
>    Affects Versions: 0.9.0
>            Reporter: Jason Gauci
>            Priority: Minor
>
> One of the outputs from our system needs to be a single file to support a system that
> is ingesting the data downstream.  We currently run the job and then cat the output files
> together to create the final output, but it would be nice if we could pass a flag to the
> write(...) function to handle this case.
> Note that setting the number of reducers globally for the entire job doesn't work in
> this case because of the significant performance implications.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
