From: Tom White
Date: Thu, 27 Feb 2014 17:03:11 +0000
Subject: Re: Support OutputCommitter?
To: dev@crunch.apache.org

Is it possible to have multiple targets that Crunch runs in one
MapReduce job? If so then there will be a conflict, and Crunch will
need some changes to support this case.

Tom

On Thu, Feb 27, 2014 at 3:34 PM, Chao Shi wrote:
> Hi Tom,
>
> I will have to use named-output. About your example DatasetTarget, is it
> safe to setOutputFormat() explicitly here? I guess this may conflict with
> other targets that only use the same trick. Is it possible for us to have a
> general approach to get OutputCommitter to work?
>
> Hi Chao,
>
> Crunch doesn't call the output committer explicitly itself; it's
> called by the MR framework as a normal part of running a job. However,
> in Crunch's MapReduceTarget#configureForMapReduce the output format is
> not typically set for the named-output case (which is the only case
> that is executed now, as I discovered in the thread mentioned below),
> so it defaults to FileOutputFormat, with its semantics. (This is why
> HBaseTarget calls FileOutputFormat.setOutputPath, which it wouldn't
> have to if it set the output format explicitly to HBase's
> TableOutputFormat.)
>
> Are you setting the HCatOutputFormat in the named-output case?
> In the Crunch Target I'm writing I've set the OutputFormat explicitly:
> https://github.com/tomwhite/kite/blob/CDK-308-dataset-output-format/kite-data/kite-data-crunch/src/main/java/org/kitesdk/data/crunch/DatasetTarget.java#L106
>
> Cheers,
> Tom
>
> On Thu, Feb 27, 2014 at 7:54 AM, Gabriel Reid wrote:
>> For reference, here's the link to the previous thread on this:
>> http://mail-archives.apache.org/mod_mbox/crunch-dev/201401.mbox/%3cCAF-WD4Sig2n7yMxiZSji8trQy-8wfUy5_7dnKC=dkSxmrfSPVA@mail.gmail.com%3e
>>
>> On Thu, Feb 27, 2014 at 7:56 AM, Josh Wills wrote:
>>> +tom
>>>
>>> Didn't Tom have a thing like this a little while ago?
>>>
>>>
>>> On Wed, Feb 26, 2014 at 8:04 PM, Chao Shi wrote:
>>>
>>>> Hi crunch devs,
>>>>
>>>> I'm developing a target wrapper for HCatOutputFormat, which uses a custom
>>>> OutputCommitter to get results committed to Hive. It seems its
>>>> OutputCommitter is not called at all. Looking into the code, I can't find
>>>> where Crunch calls it. Is it really supported?
>>>>
>>>> Thanks,
>>>> Chao
>>>>
>>>
>>>
>>>
>>> --
>>> Director of Data Science
>>> Cloudera
>>> Twitter: @josh_wills