Date: Tue, 14 Aug 2012 13:09:38 +1100 (NCT)
From: "Bill Graham (JIRA)"
To: pig-dev@hadoop.apache.org
Reply-To: dev@pig.apache.org
Message-ID: <1510518732.5635.1344910178108.JavaMail.jiratomcat@arcas>
In-Reply-To: <526298469.47045.1331346297784.JavaMail.tomcat@hel.zones.apache.org>
Subject: [jira] [Commented] (PIG-2578) Multiple Store-commands mess up mapred.output.dir.

    [ https://issues.apache.org/jira/browse/PIG-2578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13433834#comment-13433834 ]

Bill Graham commented on PIG-2578:
----------------------------------

Regarding the wrapper job conf, in some cases I'm sure it's justified to set a conf.
What if we throw an exception when an attempt is made to set a value where a different value already exists? We could include messaging about how UDFContext is probably what they want. This approach would be backward compatible with jobs that use conf properly, with a single-store job, for example.

> Multiple Store-commands mess up mapred.output.dir.
> --------------------------------------------------
>
>                 Key: PIG-2578
>                 URL: https://issues.apache.org/jira/browse/PIG-2578
>             Project: Pig
>          Issue Type: Bug
>   Affects Versions: 0.8.1, 0.9.2
>            Reporter: Mithun Radhakrishnan
>            Assignee: Daniel Dai
>             Fix For: 0.10.0, 0.11
>
>         Attachments: PIG-2578-1.patch
>
>
> When one runs a pig-script with multiple storers, one sees the following:
> 1. When run as a script, Pig launches a single job.
> 2. PigOutputCommitter::setupJob() calls the underlyingOutputCommitter::setupJob(), once for each storer. But the mapred.output.dir is the same for both calls, even though the storers write to different locations.
> This was originally seen in HCATALOG-276, when HCatalog's end-to-end tests are run against Pig.
> (https://issues.apache.org/jira/browse/HCATALOG-276)
> Sample pig-script (near identical to HCatalog's Pig_Checkin_4 test):
> a = load 'keyvals' using org.apache.hcatalog.pig.HCatLoader();
> split a into b if key<200, c if key >=200;
> store b into 'keyvals_lt200' using org.apache.hcatalog.pig.HCatStorer();
> store c into 'keyvals_ge200' using org.apache.hcatalog.pig.HCatStorer();
> I've suggested a workaround in HCat for the time being, but I think this might be something that needs fixing in Pig.
> Thanks.
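
To make the proposal in Bill Graham's comment concrete, here is a rough sketch of the conflict check. It is not taken from the attached patch and does not use Pig or Hadoop classes: GuardedConf is a hypothetical, Map-backed stand-in for the wrapper job conf, and the exception type and message wording are assumptions. Setting a key for the first time, or setting it again to the same value, succeeds; setting it to a different value fails.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the wrapper job conf; not a Pig or Hadoop class.
class GuardedConf {
    private final Map<String, String> values = new HashMap<>();

    // Sets key to value, rejecting an attempt to replace an existing, different value.
    public void set(String key, String value) {
        String existing = values.get(key);
        if (existing != null && !existing.equals(value)) {
            throw new IllegalStateException(
                "Conflicting value for '" + key + "': existing '" + existing
                + "', attempted '" + value
                + "'. Per-store state probably belongs in UDFContext.");
        }
        values.put(key, value);
    }

    public String get(String key) {
        return values.get(key);
    }

    public static void main(String[] args) {
        GuardedConf conf = new GuardedConf();
        conf.set("mapred.output.dir", "keyvals_lt200"); // first store: accepted
        conf.set("mapred.output.dir", "keyvals_lt200"); // same value again: accepted
        try {
            conf.set("mapred.output.dir", "keyvals_ge200"); // second store: rejected
        } catch (IllegalStateException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```

With a check like this, a single-store job that sets each key once would behave as before, while the two-store script from the issue description would fail fast on the second mapred.output.dir instead of silently sharing one value.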