Date: Fri, 8 Mar 2013 22:16:29 -0600
Subject: Re: MultipleOutput in crunch
From: Micah Whitacre
To: user@crunch.apache.org, Peter Knap

Instead of implementing a filter, could you switch to a DoFn and emit a Pair? The first element of the pair would be the identifier for the category of data. You can then group by key to process each category differently, or keep processing everything in the same DoFn, using the key as a flag for how to handle each record.

That said, I'm not sure this would be any more efficient than filtering twice.

On Fri, Mar 8, 2013 at 8:53 PM, Peter Knap wrote:
> Hi,
>
> Is multiple output functionality supported by Crunch? I have looked at the
> source code but couldn't find a way to do it. I have the following scenario:
> an input file is processed by multiple sequential filters, and the records
> that pass the filter criteria need to be processed differently from the ones
> that do not. What's the best way to do it in Crunch? I know I can process
> the input data twice with two different filters, but this is not efficient.
> Any suggestions from you guys?
>
> Thanks,
> Piotr
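
For illustration, here is a minimal sketch of the tagging idea from the reply above. It is plain Java with no Crunch dependency, so it only mimics the pattern and is not Crunch code. The class name, the category labels, and the non-empty-string criterion are all hypothetical. In an actual Crunch pipeline the loop body would presumably sit inside a DoFn's process method and emit Pair.of(key, record) through the Emitter, with a groupByKey on the resulting table.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Plain-Java stand-in for the "tag, then group" pattern; no Crunch classes are used.
class TagAndGroupSketch {

    // Hypothetical category labels, chosen for this illustration only.
    static final String PASSED = "passed";
    static final String REJECTED = "rejected";

    // Single pass over the input: each record is kept and tagged with a
    // category key instead of being dropped by a filter.
    static Map<String, List<String>> tagAndGroup(List<String> records,
                                                 Predicate<String> criteria) {
        Map<String, List<String>> byCategory = new LinkedHashMap<>();
        byCategory.put(PASSED, new ArrayList<>());
        byCategory.put(REJECTED, new ArrayList<>());
        for (String record : records) {
            // The key plays the role of the first element of the emitted pair.
            String key = criteria.test(record) ? PASSED : REJECTED;
            byCategory.get(key).add(record);
        }
        return byCategory;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("alpha", "", "beta", "");
        // Hypothetical criterion: a record passes if it is non-empty.
        Map<String, List<String>> out = tagAndGroup(input, r -> !r.isEmpty());
        System.out.println(out.get(PASSED).size() + " passed, "
                + out.get(REJECTED).size() + " rejected");
    }
}
```

Each record is read once and both categories stay available for different downstream handling, which is the point of emitting a key rather than filtering. Whether this beats two filters in a real pipeline depends on how the job is planned, as the reply already cautions.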