From: Micah Whitacre
Date: Wed, 29 Jan 2014 15:21:40 -0600
Subject: Re: Output Committers and Crunch Targets
To: dev@crunch.apache.org

>> I would expect that
>> named outputs would not be used in my simple pipeline, so name would
>> be null, but it actually seems that the name parameter is 'out0'. So
>> my first question is: what determines when named outputs are used?

Looking at the code the output is always named[1] regardless of the number of outputs. Do you believe the use of a name is causing an issue with the utilization of your custom committer?

Regarding your second question I need to do a bit more digging to answer for certain.

[1] - https://github.com/apache/crunch/blob/master/crunch-core/src/main/java/org/apache/crunch/impl/mr/plan/MSCROutputHandler.java#L64

On Wed, Jan 29, 2014 at 10:11 AM, Tom White wrote:
> Hi,
>
> I'm writing a Crunch Target that is a MapReduceTarget, but not a
> PathTarget, since it writes to files in a partitioned manner, so there
> is not necessarily a single output path. I'm confused about the 'name'
> parameter in configureForMapReduce() though - I would expect that
> named outputs would not be used in my simple pipeline, so name would
> be null, but it actually seems that the name parameter is 'out0'. So
> my first question is: what determines when named outputs are used?
>
> In the past this hasn't been a problem (e.g. with the Parquet target),
> but this output format has a custom output committer which isn't being
> used. Instead it looks like the default file committer is being used
> by Crunch, so the job fails. Is it possible to use custom output
> committers with Crunch?
>
> My code is here:
>
> https://github.com/tomwhite/kite/blob/CDK-251-mr/kite-data/kite-data-crunch/src/main/java/org/kitesdk/data/crunch/DatasetTarget.java#L100
>
> Cheers,
> Tom
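
To make the naming behaviour discussed in this thread concrete, here is a small plain-Java sketch. It is a toy model, not Crunch code: the class NamedOutputSketch, the Target interface, and configureAll() are hypothetical names invented for illustration, and the "out" + index scheme is an assumption inferred from the 'out0' value reported above. The only point it shows is that a handler which names every output unconditionally will pass a non-null name even to a pipeline with a single target.

```java
import java.util.ArrayList;
import java.util.List;

public class NamedOutputSketch {
    // Assumed prefix, inferred from the 'out0' name reported in the thread.
    static final String OUTPUT_PREFIX = "out";

    /** Hypothetical stand-in for a target's configureForMapReduce(..., name) hook. */
    interface Target {
        void configure(String name);
    }

    /**
     * Names every target by its position, with no special case for a
     * single output, and returns the names that were handed out.
     */
    static List<String> configureAll(List<Target> targets) {
        List<String> names = new ArrayList<>();
        for (int i = 0; i < targets.size(); i++) {
            String name = OUTPUT_PREFIX + i;
            targets.get(i).configure(name);
            names.add(name);
        }
        return names;
    }

    public static void main(String[] args) {
        // A "simple pipeline" with exactly one target.
        List<Target> single = new ArrayList<>();
        single.add(name -> System.out.println("configured with name=" + name));
        System.out.println(configureAll(single));
    }
}
```

Under these assumptions the lone target is configured with the name out0 rather than null, which matches what Tom observed. Whether that name has anything to do with the custom committer being bypassed is a separate question that this sketch does not address.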