crunch-user mailing list archives

From Micah Whitacre
Subject Re: MultipleOutput in crunch
Date Sat, 09 Mar 2013 04:16:29 GMT
Instead of implementing a filter, could you switch to using a DoFn and
emit a Pair? The first part of the pair would be an identifier for the
category of data. You can then group by key to process the categories
differently, or keep processing them in the same DoFn, using the key as
a flag for how to handle each record.
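
A rough sketch of the tagging idea, in plain Java rather than the Crunch
API so it runs without any dependencies. The class name TagAndRoute, the
"PASS"/"FAIL" tags, and both method names are made up for illustration;
in Crunch the tag() step would roughly correspond to a DoFn emitting
pairs, and groupByTag() to grouping by key.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

public class TagAndRoute {
    // Single pass over the input: instead of dropping records that fail
    // the criteria, emit every record paired with a category tag.
    public static List<Map.Entry<String, String>> tag(
            List<String> records, Predicate<String> criteria) {
        List<Map.Entry<String, String>> out = new ArrayList<>();
        for (String record : records) {
            String category = criteria.test(record) ? "PASS" : "FAIL";
            out.add(new SimpleEntry<>(category, record));
        }
        return out;
    }

    // Stand-in for a group-by-key: collect the tagged records per category
    // so each category can be processed differently afterwards.
    public static Map<String, List<String>> groupByTag(
            List<Map.Entry<String, String>> tagged) {
        Map<String, List<String>> groups = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : tagged) {
            groups.computeIfAbsent(entry.getKey(), k -> new ArrayList<>())
                  .add(entry.getValue());
        }
        return groups;
    }

    public static void main(String[] args) {
        // Hypothetical records; the "ok" prefix is the filter criteria here.
        List<String> records = Arrays.asList("ok:1", "bad:2", "ok:3");
        Map<String, List<String>> groups =
                groupByTag(tag(records, r -> r.startsWith("ok")));
        System.out.println(groups);
    }
}
```

The input is read once, and both the passing and the failing records
stay available downstream under their own key.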

That being said, I'm not really sure this would be any more efficient
than filtering twice.

On Fri, Mar 8, 2013 at 8:53 PM, Peter Knap wrote:
> Hi,
> Is multiple output functionality supported by crunch? I have looked at the
> source code but could not find a way to do it. I have the following scenario:
> input file would be processed by multiple sequential filters, the records
> passing the filter criteria need to be processed differently than the ones
> which are not. What's the best way to do it in crunch? I know I can process
> the input data twice with two different filters, but this is not efficient.
> Any suggestion from you guys?
> Thanks,
> Piotr
