From: Chao Shi
Date: Fri, 28 Feb 2014 10:26:56 +0800
Subject: Re: Support OutputCommitter?
To: dev@crunch.apache.org

How about introducing our own OutputFormat? It can delegate to each
registered OutputCommitter (if any).

2014-02-28 1:28 GMT+08:00 Josh Wills:

> It's possible to have multiple targets running in one Crunch job; in fact
> it was so common that I switched everything over to the named targets in
> order to simplify the bookkeeping. Every output format can run
> independently of every other output format using the code in CrunchOutputs;
> I think the only reason we default to FileOutputFormat is b/c it's an
> exception for an MR config to _not_ have an OutputFormat configured, even if
> it's never used.
>
>
> On Thu, Feb 27, 2014 at 9:03 AM, Tom White wrote:
>
> > Is it possible to have multiple targets that Crunch runs in one
> > MapReduce job? If so then there will be a conflict, and Crunch will
> > need some changes to support this case.
> >
> > Tom
> >
> > On Thu, Feb 27, 2014 at 3:34 PM, Chao Shi wrote:
> > > Hi Tom,
> > >
> > > I will have to use named-output. About your example DatasetTarget, is it
> > > safe to setOutputFormat() explicitly here? I guess this may conflict with
> > > other targets that only use the same trick. Is it possible for us to have a
> > > general approach to get OutputCommitter to work?
> > >
> > > Hi Chao,
> > >
> > > Crunch doesn't call the output committer explicitly itself; it's
> > > called by the MR framework as a normal part of running a job. However,
> > > in Crunch's MapReduceTarget#configureForMapReduce the output format is
> > > not typically set for the named-output case (which is the only case
> > > that is executed now, as I discovered in the thread mentioned below),
> > > so it defaults to FileOutputFormat, with its semantics. (This is why
> > > HBaseTarget calls FileOutputFormat.setOutputPath, which it wouldn't
> > > have to if it set the output format explicitly to HBase's
> > > TableOutputFormat.)
> > >
> > > Are you setting the HCatOutputFormat in the named-output case? In the
> > > Crunch Target I'm writing I've set the OutputFormat explicitly:
> > >
> > > https://github.com/tomwhite/kite/blob/CDK-308-dataset-output-format/kite-data/kite-data-crunch/src/main/java/org/kitesdk/data/crunch/DatasetTarget.java#L106
> > >
> > > Cheers,
> > > Tom
> > >
> > > On Thu, Feb 27, 2014 at 7:54 AM, Gabriel Reid wrote:
> > >> For reference, here's the link to the previous thread on this:
> > >>
> > >> http://mail-archives.apache.org/mod_mbox/crunch-dev/201401.mbox/%3cCAF-WD4Sig2n7yMxiZSji8trQy-8wfUy5_7dnKC=dkSxmrfSPVA@mail.gmail.com%3e
> > >>
> > >> On Thu, Feb 27, 2014 at 7:56 AM, Josh Wills wrote:
> > >>> +tom
> > >>>
> > >>> Didn't Tom have a thing like this a little while ago?
> > >>>
> > >>> On Wed, Feb 26, 2014 at 8:04 PM, Chao Shi wrote:
> > >>>
> > >>>> Hi crunch devs,
> > >>>>
> > >>>> I'm developing a target wrapper for HCatOutputFormat, which uses a custom
> > >>>> OutputCommitter to get results committed to Hive. It seems its
> > >>>> OutputCommitter is not called at all. Looking into the code, I can't find
> > >>>> where Crunch calls it. Is it really supported?
> > >>>>
> > >>>> Thanks,
> > >>>> Chao
> > >>>>
> > >>>
> > >>> --
> > >>> Director of Data Science
> > >>> Cloudera
> > >>> Twitter: @josh_wills
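
A minimal sketch of the delegation idea proposed at the top of this message: one wrapper that forwards each commit step to every registered committer. It deliberately does not use the Hadoop or Crunch APIs. SimpleCommitter, DelegatingCommitter, RecordingCommitter and the output names "hcat" and "files" are all hypothetical stand-ins, and only job-level steps are modelled.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the job-level steps of an output committer.
// This is NOT Hadoop's OutputCommitter; it only models the lifecycle calls.
interface SimpleCommitter {
    void setupJob();
    void commitJob();
    void abortJob();
}

// Hypothetical composite: holds one committer per named output and forwards
// every lifecycle call to each of them, in registration order.
class DelegatingCommitter implements SimpleCommitter {
    private final Map<String, SimpleCommitter> delegates = new LinkedHashMap<>();

    // Outputs that have no committer of their own simply never register one.
    void register(String outputName, SimpleCommitter committer) {
        delegates.put(outputName, committer);
    }

    @Override
    public void setupJob() {
        for (SimpleCommitter c : delegates.values()) {
            c.setupJob();
        }
    }

    @Override
    public void commitJob() {
        for (SimpleCommitter c : delegates.values()) {
            c.commitJob();
        }
    }

    @Override
    public void abortJob() {
        for (SimpleCommitter c : delegates.values()) {
            c.abortJob();
        }
    }
}

// Illustrative committer that records which step ran for which output.
class RecordingCommitter implements SimpleCommitter {
    private final String name;
    private final List<String> log;

    RecordingCommitter(String name, List<String> log) {
        this.name = name;
        this.log = log;
    }

    @Override public void setupJob()  { log.add(name + ":setupJob"); }
    @Override public void commitJob() { log.add(name + ":commitJob"); }
    @Override public void abortJob()  { log.add(name + ":abortJob"); }
}

class DelegatingCommitterDemo {
    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        DelegatingCommitter committer = new DelegatingCommitter();
        committer.register("hcat", new RecordingCommitter("hcat", log));
        committer.register("files", new RecordingCommitter("files", log));
        committer.setupJob();
        committer.commitJob();
        // prints [hcat:setupJob, files:setupJob, hcat:commitJob, files:commitJob]
        System.out.println(log);
    }
}
```

A real version would presumably also have to forward the task-level calls and decide what happens when one delegate fails partway through a commit; this sketch leaves both questions open.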