crunch-user mailing list archives

From Josh Wills <jwi...@cloudera.com>
Subject Re: Output Sizing
Date Mon, 26 Jan 2015 21:24:44 GMT
Hrm-- maybe something like the AvroPathPerKeyTarget, and a DoFn that
divides the data up into enough keys so that the data associated with a
given key is always < 10MB?
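
A rough sketch of the key-assignment part, in plain Java with no Crunch
dependency. SizeBucketAssigner, keyFor, and the per-task prefix are
hypothetical names for illustration, not part of the Crunch API, and the
record sizes are assumed to be estimates supplied by the caller. The idea
is that a DoFn could call something like this once per record and emit
(key, record) pairs, and the resulting table could then go to the
AvroPathPerKeyTarget:

```java
// Hypothetical helper, not part of Crunch: hands out bucket keys so that
// the bytes routed to any one key stay under a fixed cap.
final class SizeBucketAssigner {
    private final String prefix;    // assumed unique per task so keys never collide
    private final long maxBytes;    // cap per key, e.g. 10L * 1024 * 1024
    private long bytesInBucket = 0; // estimated bytes already assigned to the current key
    private int bucket = 0;         // index of the current key

    SizeBucketAssigner(String prefix, long maxBytes) {
        if (maxBytes <= 0) {
            throw new IllegalArgumentException("maxBytes must be positive");
        }
        this.prefix = prefix;
        this.maxBytes = maxBytes;
    }

    /** Returns the key for a record whose estimated size is recordBytes. */
    String keyFor(long recordBytes) {
        // Roll over to a new key when this record would bring the current
        // key's total up to or past the cap.
        if (bytesInBucket > 0 && bytesInBucket + recordBytes >= maxBytes) {
            bucket++;
            bytesInBucket = 0;
        }
        bytesInBucket += recordBytes;
        return prefix + "-" + bucket;
    }
}
```

Two caveats: a single record at or above the cap still gets a key to
itself and will exceed the limit, and the estimate is of uncompressed
record size, so the files actually written may come out smaller than the
per-key totals suggest.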

On Mon, Jan 26, 2015 at 1:15 PM, David Ortiz <dpo5003@gmail.com> wrote:

> Hello,
>
>      Is there any way to control output sizing on the Crunch pipeline's
> write method?  I am processing data which is written to S3 for a program
> which cannot handle more than 10-20 MB per file, and am at a loss for how
> to do this without writing a Hive script to process the data.
>
> Thanks,
>      David Ortiz
>



-- 
Director of Data Science
Cloudera <http://www.cloudera.com>
Twitter: @josh_wills <http://twitter.com/josh_wills>
