hadoop-pig-dev mailing list archives

From "Richard Ding (JIRA)" <j...@apache.org>
Subject [jira] Commented: (PIG-787) Allow UDFs and their dependencies to be distributed via Hadoop's distributed cache
Date Thu, 22 Jul 2010 23:02:51 GMT

    [ https://issues.apache.org/jira/browse/PIG-787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12891383#action_12891383 ]

Richard Ding commented on PIG-787:
----------------------------------

Currently, Pig bundles UDFs and their dependencies (including pig.jar) into job.jar and sends
it to the JobTracker via the JobConf. Hadoop then copies the jar to HDFS and pushes it to all
the nodes. This is essentially the same as using the distributed cache (but Pig itself doesn't
need to copy the jar to HDFS).

One use case for the distributed cache is when some UDF jars are already on HDFS. In this
case, instead of adding them to job.jar, Pig can add them directly to Hadoop's distributed
cache. This reduces the size of job.jar and avoids copying those jars to HDFS again.
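
To make the use case concrete, here is a rough sketch of the decision it implies: jars whose
location is already an hdfs:// URI go to the distributed cache, everything else is still
bundled into job.jar. The class JarPlacement and its fields are hypothetical names used only
for illustration, not Pig code. In a real job, the cached entries would presumably be
registered through Hadoop's DistributedCache API (for example
DistributedCache.addFileToClassPath), which is left out here so the sketch stays
self-contained.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of Pig): splits UDF jar locations into
// those that must be bundled into job.jar and those already on HDFS.
final class JarPlacement {
    // Local jars: still packed into job.jar as Pig does today.
    final List<URI> bundleIntoJobJar = new ArrayList<>();
    // Jars already on HDFS: candidates for the distributed cache.
    final List<URI> addToDistributedCache = new ArrayList<>();

    static JarPlacement plan(List<String> jarLocations) {
        JarPlacement placement = new JarPlacement();
        for (String location : jarLocations) {
            URI uri = URI.create(location);
            // Assumption: an explicit "hdfs" scheme is the only signal that
            // the jar is already on HDFS; plain paths are treated as local.
            if ("hdfs".equalsIgnoreCase(uri.getScheme())) {
                placement.addToDistributedCache.add(uri);
            } else {
                placement.bundleIntoJobJar.add(uri);
            }
        }
        return placement;
    }
}
```

Calling JarPlacement.plan with a mix of hdfs:// URIs and local paths yields two lists; only
the local list contributes to the size of job.jar.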

Are there any other use cases where the distributed cache would help distribute UDFs and
their dependencies?

> Allow UDFs and their dependencies to be distributed via Hadoop's distributed cache
> ----------------------------------------------------------------------------------
>
>                 Key: PIG-787
>                 URL: https://issues.apache.org/jira/browse/PIG-787
>             Project: Pig
>          Issue Type: New Feature
>            Reporter: Olga Natkovich
>            Assignee: Richard Ding
>             Fix For: 0.8.0
>
>


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

