pig-dev mailing list archives

From "Aniket Mokashi (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (PIG-2672) Optimize the use of DistributedCache
Date Tue, 24 Sep 2013 23:24:03 GMT

    [ https://issues.apache.org/jira/browse/PIG-2672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13776910#comment-13776910 ]

Aniket Mokashi commented on PIG-2672:

[~cheolsoo], thanks for your comments. I will work on the patch to make it more
production-ready. I have tried it on a simple job, but not in production yet.

[~knoguchi], I do not understand your concern here. Currently, jars get copied to /tmp/temp-<random>/,
which is writable by all users. I do not see how the jar cache is less secure than the current
approach. In fact, any misconfiguration is still guarded against by the SHA check (hard to collide).

I do not see any benefit in restricting the cache to /user/<username>/.pig, as it's not mandatory
for users to keep that directory secure (am I right?). If you look closely, the cluster cache
and the user cache behave the same way. The only reason we have two is for easy configuration
and better dedup of jars across the cluster.
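
To illustrate the point about the SHA, here is a minimal sketch of a content-addressed jar cache. It is not taken from the patch: the function names, the <root>/<digest>/<jar name> layout, and the choice of SHA-1 are all assumptions made for the example. The same lookup is applied to every cache root, which is the sense in which a cluster cache and a user cache can behave the same way.

```python
import hashlib
import posixpath


def jar_digest(data: bytes) -> str:
    """Hex digest of the jar's bytes. SHA-1 is an assumption; the thread only says "SHA"."""
    return hashlib.sha1(data).hexdigest()


def cached_jar_path(cache_root: str, jar_name: str, data: bytes) -> str:
    """Hypothetical layout: <cache_root>/<digest>/<jar_name>."""
    return posixpath.join(cache_root, jar_digest(data), jar_name)


def find_cached(existing_paths: set, cache_roots: list, jar_name: str, data: bytes):
    """Return the first cache location that already holds this exact jar, else None.

    existing_paths stands in for a listing of files already present on HDFS.
    Every root is searched with the same rule, so the roots differ only in
    where they live, not in how they behave.
    """
    for root in cache_roots:
        candidate = cached_jar_path(root, jar_name, data)
        if candidate in existing_paths:
            return candidate
    return None
```

Because the path is derived from the jar's contents, a jar with the same name but different bytes lands under a different digest directory and cannot be mistaken for the cached copy.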

> Optimize the use of DistributedCache
> ------------------------------------
>                 Key: PIG-2672
>                 URL: https://issues.apache.org/jira/browse/PIG-2672
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Rohini Palaniswamy
>            Assignee: Aniket Mokashi
>             Fix For: 0.12.0
>         Attachments: PIG-2672.patch
> Pig currently copies jar files to a temporary location in HDFS and then adds them to
> DistributedCache for each job launched. This is inefficient in terms of:
>    * Space - The jars are distributed to task trackers for every job, taking up a lot of
>      local temporary space on the tasktrackers.
>    * Performance - The jar distribution impacts the job launch time.

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
