Date: Tue, 7 Feb 2012 19:04:59 +0000 (UTC)
From: "Daniel Dai (Commented) (JIRA)"
To: pig-dev@hadoop.apache.org
Message-ID: <886913891.9576.1328641499457.JavaMail.tomcat@hel.zones.apache.org>
In-Reply-To: <1586170068.2447.1328545323768.JavaMail.tomcat@hel.zones.apache.org>
Subject: [jira] [Commented] (PIG-2508) PIG can unpredictably ignore deprecated Hadoop config options

[ 
https://issues.apache.org/jira/browse/PIG-2508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13202633#comment-13202633 ]

Daniel Dai commented on PIG-2508:
---------------------------------

I will take a look.

> PIG can unpredictably ignore deprecated Hadoop config options
> -------------------------------------------------------------
>
>              Key: PIG-2508
>              URL: https://issues.apache.org/jira/browse/PIG-2508
>          Project: Pig
>       Issue Type: Bug
> Affects Versions: 0.9.2, 0.10
>         Reporter: Anupam Seth
>         Assignee: Thomas Weise
>         Priority: Blocker
>          Fix For: 0.10, 0.9.3
>
>      Attachments: PIG-2508.3.patch, PIG-2508.patch
>
>
> When deprecated config options are passed to a Pig job, Pig can unpredictably ignore them and override them with the values from the defaults, due to a "race condition"-like issue.
> This problem was first noticed as part of MAPREDUCE-3665, which was re-filed as HADOOP-7993 so that it would fall under the right component for the code being fixed. That JIRA fixed the bug on the Hadoop side that caused older deprecated config options to be ignored when they were also specified in the defaults xml file under the newer config name, or vice versa.
> However, the problem seemed to persist with Pig jobs, and HADOOP-8021 was filed to address the issue.
> A careful step-by-step execution of the code in a debugger reveals a second, overlapping bug caused by the way PIG deals with the configs.
> Not sure how / why this was not seen earlier, but the code in HExecutionEngine.java#recomputeProperties currently mashes together the default Hadoop configs and the user-specified properties into a Properties object.
> Given that it uses a Hashtable to store the properties, if we have a config called "old.config.name" which is now deprecated and replaced by "new.config.name", and one name is specified in the defaults and the other by the user, we get a strange condition in which the repopulated Properties object contains [in an unpredictable ordering] the following:
> {code}
> config1.name=config1.value
> config2.name=config2.value
> ...
> old.config.name=old.config.value
> ...
> new.config.name=new.config.value
> ...
> configx.name=configx.value
> {code}
> When this Properties object gets converted into a Configuration object by the ConfigurationUtil#toConfiguration() routine, the deprecation kicks in and tries to resolve all old configs. Because the ordering is not guaranteed (and because, in the case of compress, the hash function consistently places the new config loaded from the defaults after the old one), the user-specified config is ignored in favor of the default config. From the point of view of the Hadoop Configuration object, replacing an earlier specification of a config value with a later one is expected, standard behavior.
> The fix for this is probably straightforward, but will require a rewrite of a chunk of code in HExecutionEngine.java. Instead of mashing together a JobConf object and a Properties object into a Configuration object that is finally re-converted into a JobConf object, the code simply needs to consistently and correctly populate a JobConf / Configuration object that can handle deprecation, instead of a "dumb" Java Properties object.
> We recently saw another potential occurrence of this bug, where Pig seems to honor only the mapreduce.job.queuename parameter for specifying the queue name and ignores the mapred.job.queue.name parameter.
> Since this can break a lot of existing jobs that run fine on 0.20, marking this as a blocker.
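
To make the ordering problem in the description concrete, here is a minimal sketch in plain Java. It is not Pig or Hadoop code: the class DeprecatedKeyMerge, its methods, and the one-entry deprecation table are all hypothetical stand-ins, and the table only imitates the idea of Hadoop's deprecation handling (an old name resolving to a new one). The first method shows that when both names sit in one flat collection, whichever is visited last wins; the second shows one possible way to make the result independent of hash ordering by applying defaults first and user settings second.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;

class DeprecatedKeyMerge {
    // Hypothetical deprecation table (old name -> new name), using the
    // placeholder names from the issue description.
    static final Map<String, String> DEPRECATED =
            Collections.singletonMap("old.config.name", "new.config.name");

    // Maps a possibly deprecated key to its current name.
    static String canonical(String key) {
        return DEPRECATED.getOrDefault(key, key);
    }

    // Resolves aliases in whatever order the input map iterates. With a
    // Hashtable-backed Properties object that order is unspecified, so a
    // default can be visited after (and overwrite) a user setting.
    static Map<String, String> resolveInOrder(Map<String, String> mashed) {
        Map<String, String> out = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : mashed.entrySet()) {
            out.put(canonical(e.getKey()), e.getValue());
        }
        return out;
    }

    // Applies defaults first and user settings second, resolving aliases
    // in each layer, so the user value wins regardless of hash order.
    static Map<String, String> mergeLayered(Properties defaults, Properties user) {
        Map<String, String> out = new LinkedHashMap<>();
        for (String key : defaults.stringPropertyNames()) {
            out.put(canonical(key), defaults.getProperty(key));
        }
        for (String key : user.stringPropertyNames()) {
            out.put(canonical(key), user.getProperty(key));
        }
        return out;
    }

    public static void main(String[] args) {
        // A LinkedHashMap is used here only to force the unlucky visit order.
        Map<String, String> defaultVisitedLast = new LinkedHashMap<>();
        defaultVisitedLast.put("old.config.name", "user-value");
        defaultVisitedLast.put("new.config.name", "default-value");
        System.out.println(resolveInOrder(defaultVisitedLast)); // {new.config.name=default-value}

        Properties defaults = new Properties();
        defaults.setProperty("new.config.name", "default-value");
        Properties user = new Properties();
        user.setProperty("old.config.name", "user-value");
        System.out.println(mergeLayered(defaults, user)); // {new.config.name=user-value}
    }
}
```

The layered merge is only an illustration of the principle in the description (populate a deprecation-aware object in a well-defined order); the actual fix is whatever the attached patches do.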