pig-dev mailing list archives

From "Rohini Palaniswamy (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (PIG-3861) duplicate jars get added to distributed cache
Date Thu, 03 Apr 2014 19:59:16 GMT

    [ https://issues.apache.org/jira/browse/PIG-3861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13959140#comment-13959140 ]

Rohini Palaniswamy commented on PIG-3861:
-----------------------------------------

A few comments:
  - Please convert skipJars, extraJars, etc. to Sets as well.
  - The changes to shipToHDFS are very bad: they add FS calls, and the logic is flimsy if the
user passes jars/files via DistributedCache, as in Oozie. Please revert them and check the conf
to see whether DistributedCache already has that file. Take symlinks into account while doing that.
  - TestJobControlCompiler.java - Add an assert to check that the jar is in DistributedCache only
once.
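
The conf check suggested above might look roughly like the following. This is a minimal
sketch, not the patch: it works on the raw comma-separated cache-files string (the kind of
value Hadoop keeps under mapred.cache.files / mapreduce.job.cache.files) instead of a real
Configuration, the class and method names are hypothetical, and treating the "#name" URI
fragment as the symlink to ignore when comparing is only one assumed reading of "take
symlinks into account".

```java
public class CacheDedupSketch {

    // Drops the "#symlink" fragment so the same file registered under
    // different link names still compares equal (assumed semantics).
    static String withoutFragment(String uri) {
        int hash = uri.indexOf('#');
        return hash >= 0 ? uri.substring(0, hash) : uri;
    }

    // cacheFiles: comma-separated URIs, or null/empty when nothing is registered yet.
    static boolean alreadyCached(String cacheFiles, String candidate) {
        if (cacheFiles == null || cacheFiles.trim().isEmpty()) {
            return false;
        }
        String target = withoutFragment(candidate.trim());
        for (String entry : cacheFiles.split(",")) {
            if (withoutFragment(entry.trim()).equals(target)) {
                return true;
            }
        }
        return false;
    }

    // Returns the new cache-files value; unchanged when the file is already listed.
    static String addIfAbsent(String cacheFiles, String candidate) {
        if (alreadyCached(cacheFiles, candidate)) {
            return cacheFiles;
        }
        if (cacheFiles == null || cacheFiles.trim().isEmpty()) {
            return candidate;
        }
        return cacheFiles + "," + candidate;
    }

    public static void main(String[] args) {
        // Hypothetical URIs: the first entry could have come from Oozie.
        String conf = "hdfs://nn/tmp/udf.jar#udf.jar";
        conf = addIfAbsent(conf, "hdfs://nn/tmp/udf.jar");   // already present, skipped
        conf = addIfAbsent(conf, "hdfs://nn/tmp/other.jar"); // new, appended
        System.out.println(conf);
    }
}
```

Because the check is a string comparison against what is already in the conf, it needs no
extra FS calls, and it also sees files that were registered outside Pig.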

> duplicate jars get added to distributed cache
> ---------------------------------------------
>
>                 Key: PIG-3861
>                 URL: https://issues.apache.org/jira/browse/PIG-3861
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Mona Chitnis
>            Assignee: Mona Chitnis
>            Priority: Minor
>         Attachments: PIG-3681-1.patch
>
>
> PigContext's scriptJars should de-duplicate jars, because script engines such as
JythonScriptEngine load jars for modules and sometimes add the same jar twice. Also,
JobControlCompiler.shipToHdfs() needs a check against adding the same jar more than once
under different randomly incremented sub-dirs.
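
The de-duplication described for scriptJars can be illustrated with a set-backed
collection. A minimal sketch, assuming jars are identified by their path string; the
class and method names are hypothetical and are not PigContext's actual API.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class ScriptJarsSketch {

    // LinkedHashSet keeps first-seen order and ignores repeated paths.
    private final Set<String> scriptJars = new LinkedHashSet<>();

    // Returns true only when the jar was not registered before.
    boolean addScriptJar(String path) {
        return scriptJars.add(path);
    }

    // Snapshot of the registered jars, in insertion order.
    List<String> jars() {
        return new ArrayList<>(scriptJars);
    }

    public static void main(String[] args) {
        ScriptJarsSketch ctx = new ScriptJarsSketch();
        ctx.addScriptJar("/tmp/jython.jar"); // hypothetical path
        ctx.addScriptJar("/tmp/jython.jar"); // second add is a no-op
        System.out.println(ctx.jars().size());
    }
}
```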



--
This message was sent by Atlassian JIRA
(v6.2#6252)
