pig-dev mailing list archives

From "Daniel Dai (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (PIG-2508) PIG can unpredictably ignore deprecated Hadoop config options
Date Tue, 07 Feb 2012 19:04:59 GMT

    [ https://issues.apache.org/jira/browse/PIG-2508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13202633#comment-13202633
] 

Daniel Dai commented on PIG-2508:
---------------------------------

I will take a look.
                
> PIG can unpredictably ignore deprecated Hadoop config options
> -------------------------------------------------------------
>
>                 Key: PIG-2508
>                 URL: https://issues.apache.org/jira/browse/PIG-2508
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.9.2, 0.10
>            Reporter: Anupam Seth
>            Assignee: Thomas Weise
>            Priority: Blocker
>             Fix For: 0.10, 0.9.3
>
>         Attachments: PIG-2508.3.patch, PIG-2508.patch
>
>
> When deprecated config options are passed to a Pig job, Pig can unpredictably ignore them
and use the values provided in the defaults instead, due to a "race condition"-like issue.
> This problem was first noticed as part of MAPREDUCE-3665, which was re-filed as HADOOP-7993
so that it would fall under the component whose code was being fixed. HADOOP-7993 fixed
the bug on the Hadoop side that caused older deprecated config options to be ignored
when they were also specified in the defaults xml file under the newer config name, or vice
versa.
> However, the problem seemed to persist with Pig jobs and HADOOP-8021 was filed to address
the issue. 
> A careful step-by-step execution of the code in a debugger reveals a second, overlapping
bug caused by the way Pig handles the configs.
> Not sure how / why this was not seen earlier, but the code in HExecutionEngine.java#recomputeProperties
currently mashes together the default Hadoop configs and the user-specified properties into
a Properties object. Because Properties stores its entries in a Hashtable, if we have a
config called "old.config.name" which is now deprecated and replaced by "new.config.name",
and one name is specified in the defaults and the other by the user, we get a strange condition
in which the repopulated Properties object has [in an unpredictable ordering] the following:
> {code}
> config1.name=config1.value
> config2.name=config2.value
> ...
> old.config.name=old.config.value
> ...
> new.config.name=new.config.value
> ...
> configx.name=configx.value
> {code}
> When this Properties object gets converted into a Configuration object by the ConfigurationUtil#toConfiguration()
routine, the deprecation handling kicks in and tries to resolve all old configs. Because the ordering
is not guaranteed (and because in the case of compress, the hash function consistently yields
the new config loaded from the defaults after the old one), the user-specified config is ignored
in favor of the default config. From the point of view of the Hadoop Configuration object
this is expected standard behavior: a later specification of a config value replaces an
earlier one.
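
The order dependence described above can be reproduced without Hadoop. The following is a minimal sketch in plain Java, not the actual Pig or Hadoop code: DeprecationOrderDemo, its DEPRECATED table, toConfiguration and resolve are hypothetical names introduced for illustration. A LinkedHashMap is used to make the iteration order explicit, whereas a real Properties object gives no ordering guarantee.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

public class DeprecationOrderDemo {
    // Hypothetical deprecation table (old key -> new key), standing in for Hadoop's.
    static final Map<String, String> DEPRECATED =
            Collections.singletonMap("old.config.name", "new.config.name");

    // Copies entries one at a time, mapping deprecated keys to their replacements.
    // A later entry overwrites an earlier one that resolves to the same key.
    static Map<String, String> toConfiguration(Map<String, String> props) {
        Map<String, String> conf = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : props.entrySet()) {
            String key = DEPRECATED.getOrDefault(e.getKey(), e.getKey());
            conf.put(key, e.getValue());
        }
        return conf;
    }

    // Builds the merged properties in one of two iteration orders and
    // returns the value that ends up under the new key.
    static String resolve(boolean userKeyFirst) {
        Map<String, String> props = new LinkedHashMap<>();
        if (userKeyFirst) {
            props.put("old.config.name", "user.value");    // set by the user
            props.put("new.config.name", "default.value"); // from the defaults
        } else {
            props.put("new.config.name", "default.value");
            props.put("old.config.name", "user.value");
        }
        return toConfiguration(props).get("new.config.name");
    }

    public static void main(String[] args) {
        System.out.println(resolve(true));  // default.value: the user setting is lost
        System.out.println(resolve(false)); // user.value
    }
}
```

The same two entries give different results depending only on which one is visited last, which matches the symptom reported here.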
> The fix for this is probably straightforward, but will require a re-write of a chunk
of code in HExecutionEngine.java. Instead of mashing together a JobConf object and a Properties
object into a Configuration object that is finally re-converted into a JobConf object, the
code simply needs to consistently and correctly populate a JobConf / Configuration object
that can handle deprecation instead of a "dumb" Java Properties object.
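
As a rough illustration of that idea, and not the attached patch, the sketch below applies the defaults and the user settings as two separate layers to a deprecation-aware map, so the user value wins whichever name either side used. LayeredConfigDemo, DEPRECATED, apply, build and resolve are hypothetical names; the plain HashMap only stands in for a Hadoop Configuration object.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

public class LayeredConfigDemo {
    // Hypothetical deprecation table (old key -> new key), standing in for Hadoop's.
    static final Map<String, String> DEPRECATED =
            Collections.singletonMap("old.config.name", "new.config.name");

    // Applies one layer of properties, resolving deprecated keys as each is set.
    static void apply(Map<String, String> conf, Properties layer) {
        for (String key : layer.stringPropertyNames()) {
            conf.put(DEPRECATED.getOrDefault(key, key), layer.getProperty(key));
        }
    }

    // Defaults go in first and user settings second, so the layering order,
    // not the hash order of a merged Properties object, decides the result.
    static Map<String, String> build(Properties defaults, Properties user) {
        Map<String, String> conf = new HashMap<>();
        apply(conf, defaults);
        apply(conf, user);
        return conf;
    }

    // Returns the value stored under the new key for either naming combination.
    static String resolve(boolean userUsesOldName) {
        Properties defaults = new Properties();
        Properties user = new Properties();
        defaults.setProperty(userUsesOldName ? "new.config.name" : "old.config.name", "default.value");
        user.setProperty(userUsesOldName ? "old.config.name" : "new.config.name", "user.value");
        return build(defaults, user).get("new.config.name");
    }

    public static void main(String[] args) {
        System.out.println(resolve(true));  // user.value
        System.out.println(resolve(false)); // user.value
    }
}
```

This sketch does not cover a single layer that contains both the old and the new name. That case would still depend on iteration order within the layer.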
> We recently saw another potential occurrence of this bug, where Pig seems to honor only
the mapreduce.job.queuename parameter for specifying the queue name and ignores the mapred.job.queue.name parameter.
> Since this can break a lot of existing jobs that run fine on 0.20, marking this as a
blocker.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
