hadoop-pig-dev mailing list archives

From "Richard Ding (JIRA)" <j...@apache.org>
Subject [jira] Commented: (PIG-1218) Use distributed cache to store samples
Date Thu, 11 Feb 2010 00:28:30 GMT

    [ https://issues.apache.org/jira/browse/PIG-1218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12832320#action_12832320 ]

Richard Ding commented on PIG-1218:
-----------------------------------

There is no hard limit on file size for DistributedCache. However, the files in the
DistributedCache are copied to all nodes before the job starts, so large files will hurt
performance because of the time spent transmitting them to every node.
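
As a rough illustration of why file size matters here, the following is a minimal
back-of-the-envelope sketch, not anything taken from Pig or Hadoop. It assumes every
node pulls exactly one full copy of the cached file and ignores parallel transfers and
HDFS replication. The class name, method name, file size and node count are all
hypothetical.

```java
// Back-of-the-envelope sketch (hypothetical names, not a Pig or Hadoop API).
final class CacheCostSketch {
    private CacheCostSketch() {}

    /**
     * Total bytes moved over the network if each node receives one full
     * copy of the cached file before the job starts (assumed model).
     */
    static long totalBytesShipped(long fileSizeBytes, int nodeCount) {
        if (fileSizeBytes < 0 || nodeCount < 0) {
            throw new IllegalArgumentException("sizes and counts must be non-negative");
        }
        // multiplyExact throws ArithmeticException instead of silently overflowing.
        return Math.multiplyExact(fileSizeBytes, (long) nodeCount);
    }

    public static void main(String[] args) {
        long sampleFileBytes = 10L * 1024 * 1024; // hypothetical 10 MiB sample file
        int nodes = 100;                          // hypothetical cluster size
        System.out.println(totalBytesShipped(sampleFileBytes, nodes) + " bytes shipped in total");
    }
}
```

Under this assumed model the cost grows linearly with both file size and cluster size,
which is why a small sample file is a reasonable fit for the cache while a very large
file may not be.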

> Use distributed cache to store samples
> --------------------------------------
>
>                 Key: PIG-1218
>                 URL: https://issues.apache.org/jira/browse/PIG-1218
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Olga Natkovich
>            Assignee: Richard Ding
>             Fix For: 0.7.0
>
>         Attachments: PIG-1218.patch
>
>
> Currently, in the case of skew join and order by, we use a sample that is just written to
> the dfs (not the distributed cache) and, as a result, gets opened and copied around more than
> necessary. This impacts query performance and also places unnecessary load on the name node.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

