hadoop-pig-dev mailing list archives

From "Ashutosh Chauhan (JIRA)" <j...@apache.org>
Subject [jira] Commented: (PIG-1249) Safe-guards against misconfigured Pig scripts without PARALLEL keyword
Date Mon, 12 Jul 2010 06:50:51 GMT

    [ https://issues.apache.org/jira/browse/PIG-1249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12887283#action_12887283 ]

Ashutosh Chauhan commented on PIG-1249:
---------------------------------------

The MapReduce framework has a JIRA related to this issue: https://issues.apache.org/jira/browse/MAPREDUCE-1521
It has two implications for Pig:

1) We need to reconsider whether we still want Pig to set the number of reducers on the user's
behalf. We could choose not to "intelligently" pick the # of reducers and instead let the
framework fail any job that doesn't "correctly" specify it. Pig would then be out of this
guessing game, and users would be forced by the framework to specify the # of reducers correctly.

2) Now that the MR framework will fail a job based on configured limits, operators for which Pig
does compute and set the number of reducers (skewed join, for example) should be aware of those
limits, so that the # of reducers they compute falls within them.
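
To illustrate point 2, here is a minimal sketch of what "being aware of the limits" could look like for an operator that computes its own reducer count. It is not Pig code: the function names, the bytes-per-reducer heuristic, and the bounds (min_reducers, max_reducers) are all hypothetical, and the actual limits enforced by MAPREDUCE-1521 may be expressed differently.

```python
def estimate_reducers(input_bytes, bytes_per_reducer):
    """Hypothetical heuristic: one reducer per `bytes_per_reducer` of input, rounded up."""
    if input_bytes < 0 or bytes_per_reducer <= 0:
        raise ValueError("input_bytes must be >= 0 and bytes_per_reducer must be > 0")
    # Integer ceiling division avoids floating-point rounding on very large inputs.
    return max(1, -(-input_bytes // bytes_per_reducer))


def clamp_reducers(computed, min_reducers, max_reducers):
    """Force an operator-computed reducer count into [min_reducers, max_reducers].

    Both bounds are assumed to come from cluster configuration (hypothetical here).
    """
    if min_reducers < 1 or max_reducers < min_reducers:
        raise ValueError("expected 1 <= min_reducers <= max_reducers")
    return max(min_reducers, min(computed, max_reducers))


if __name__ == "__main__":
    ten_tb = 10 * 1024 ** 4   # the >10TB case from the issue description
    one_gb = 1024 ** 3        # assumed per-reducer input budget
    wanted = estimate_reducers(ten_tb, one_gb)
    print(wanted, clamp_reducers(wanted, 1, 999))
```

With these assumed values, a 10TB input asks for more reducers than the assumed ceiling allows, so the operator would submit the clamped value rather than a count the framework would reject.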

> Safe-guards against misconfigured Pig scripts without PARALLEL keyword
> ----------------------------------------------------------------------
>
>                 Key: PIG-1249
>                 URL: https://issues.apache.org/jira/browse/PIG-1249
>             Project: Pig
>          Issue Type: Improvement
>    Affects Versions: 0.8.0
>            Reporter: Arun C Murthy
>            Assignee: Jeff Zhang
>            Priority: Critical
>             Fix For: 0.8.0
>
>         Attachments: PIG-1249-4.patch, PIG-1249.patch, PIG_1249_2.patch, PIG_1249_3.patch
>
>
> It would be *very* useful for Pig to have safe-guards against naive scripts which process
> a *lot* of data without the use of PARALLEL keyword.
> We've seen a fair number of instances where naive users process huge data-sets (>10TB)
> with badly mis-configured #reduces e.g. 1 reduce.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

