From: Josh Wills
Date: Thu, 30 Jan 2014 05:35:16 -0800
Subject: Re: Output Committers and Crunch Targets
To: dev <dev@crunch.apache.org>

The first point is correct-- we always use the multiple outputs configuration options now, even if there is only a single output.

The second point surprises me-- HBaseTarget (for example) uses a custom output committer with its OutputFormat without issue, although of course we're still stuck setting the Path field in the Conf via FileOutputFormat. Maybe look at HBaseTarget as a reference here?

On Wed, Jan 29, 2014 at 1:21 PM, Micah Whitacre wrote:

> >> I would expect that
> >> named outputs would not be used in my simple pipeline, so name would
> >> be null, but it actually seems that the name parameter is 'out0'. So
> >> my first question is: what determines when named outputs are used?
>
> Looking at the code, the output is always named [1] regardless of the number
> of outputs. Do you believe the use of a name is causing an issue with the
> utilization of your custom committer?
>
> Regarding your second question, I need to do a bit more digging to answer
> for certain.
>
> [1] -
> https://github.com/apache/crunch/blob/master/crunch-core/src/main/java/org/apache/crunch/impl/mr/plan/MSCROutputHandler.java#L64
>
> On Wed, Jan 29, 2014 at 10:11 AM, Tom White wrote:
>
> > Hi,
> >
> > I'm writing a Crunch Target that is a MapReduceTarget, but not a
> > PathTarget, since it writes to files in a partitioned manner, so there
> > is not necessarily a single output path. I'm confused about the 'name'
> > parameter in configureForMapReduce() though - I would expect that
> > named outputs would not be used in my simple pipeline, so name would
> > be null, but it actually seems that the name parameter is 'out0'. So
> > my first question is: what determines when named outputs are used?
> >
> > In the past this hasn't been a problem (e.g. with the Parquet target),
> > but this output format has a custom output committer which isn't being
> > used. Instead it looks like the default file committer is being used
> > by Crunch, so the job fails. Is it possible to use custom output
> > committers with Crunch?
> >
> > My code is here:
> >
> > https://github.com/tomwhite/kite/blob/CDK-251-mr/kite-data/kite-data-crunch/src/main/java/org/kitesdk/data/crunch/DatasetTarget.java#L100
> >
> > Cheers,
> > Tom

--
Director of Data Science
Cloudera
Twitter: @josh_wills
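A minimal, Hadoop-free Java sketch of the first point in this thread, under stated assumptions: because Crunch always uses the multiple (named) outputs configuration, even for a single output, a MapReduceTarget's configureForMapReduce() receives a non-null name such as 'out0' in the pipeline discussed here, so any output setup a target performs only in a `name == null` branch is never reached. `JobConf`, `outputName()`, the "out" prefix and the committer name `PartitionedDatasetCommitter` are hypothetical stand-ins, not Crunch or Hadoop APIs, and the real configureForMapReduce() takes further arguments that are omitted here.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical model of a MapReduceTarget branching on the 'name' argument.
 * None of these types are Crunch or Hadoop classes; they only illustrate the control flow.
 */
public class NamedOutputSketch {

    /** Stand-in for job configuration: a single output slot plus a map of named outputs. */
    static final class JobConf {
        String singleOutputCommitter;                 // only set when name == null
        final Map<String, String> namedOutputCommitters = new LinkedHashMap<>();
    }

    /** Hypothetical naming scheme that reproduces the 'out0' value reported in the thread. */
    static String outputName(int index) {
        return "out" + index;
    }

    /** Registers the committer under the output name when one is given, else as the single output. */
    static void configureForMapReduce(JobConf job, String name, String committerClass) {
        if (name == null) {
            job.singleOutputCommitter = committerClass;
        } else {
            job.namedOutputCommitters.put(name, committerClass);
        }
    }

    public static void main(String[] args) {
        JobConf job = new JobConf();
        // Even a pipeline with one target gets a name, so only the named branch runs.
        configureForMapReduce(job, outputName(0), "PartitionedDatasetCommitter");
        System.out.println(job.singleOutputCommitter);   // prints: null
        System.out.println(job.namedOutputCommitters);   // prints: {out0=PartitionedDatasetCommitter}
    }
}
```

The design point is simply that a target should do its output-format and committer setup in the named branch (or in both), since per the replies above the unnamed path is not exercised.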