pig-dev mailing list archives

From "Alan Gates (JIRA)" <j...@apache.org>
Subject [jira] Commented: (PIG-1249) Safe-guards against misconfigured Pig scripts without PARALLEL keyword
Date Mon, 24 May 2010 22:58:25 GMT

    [ https://issues.apache.org/jira/browse/PIG-1249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12870896#action_12870896 ]

Alan Gates commented on PIG-1249:


# In this code, what happens if a loader is not loading from a file (an HBase loader, for example)? It looks to me like it will throw an IOException when it tries to stat the 'file', which won't exist, and that will cause Pig to die. Ideally, in this case it should decide that it cannot make a rational estimate and skip the estimation.
# I'm curious where the values of ~1GB per reducer and 999 reducers came from.
# Does this estimate apply only to the first job or to all jobs?
# How does this work in the case of joins, where there are multiple inputs to a job?
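
To make the questions above concrete, here is a minimal sketch of one way such an estimate could behave. It is not the code in the attached patch. The class and method names are hypothetical, treating 999 as an upper bound is an assumption, summing the sizes of all inputs for a multi-input job such as a join is an assumption, and returning -1 to mean "no estimate" is just one possible convention for a loader whose input size cannot be determined.

```java
import java.util.Arrays;
import java.util.List;
import java.util.OptionalLong;

// Hypothetical sketch; names and policy are assumptions, not taken from the patch.
public class ReducerEstimateSketch {
    // ~1GB of input per reducer, the figure mentioned in the comment.
    static final long BYTES_PER_REDUCER = 1000L * 1000L * 1000L;
    // Assumed to be a cap on the estimate.
    static final int MAX_REDUCERS = 999;
    // Assumed sentinel meaning "cannot estimate, leave parallelism alone".
    static final int NO_ESTIMATE = -1;

    // Each element is the size in bytes of one input to the job, or empty
    // when the size is unknown (for example a loader not backed by a file).
    static int estimateReducers(List<OptionalLong> inputSizes) {
        long totalBytes = 0;
        for (OptionalLong size : inputSizes) {
            if (!size.isPresent()) {
                // Decline to estimate rather than fail the whole script.
                return NO_ESTIMATE;
            }
            totalBytes += size.getAsLong();
        }
        // Ceiling division, then clamp to the range [1, MAX_REDUCERS].
        long reducers = (totalBytes + BYTES_PER_REDUCER - 1) / BYTES_PER_REDUCER;
        return (int) Math.max(1, Math.min(MAX_REDUCERS, reducers));
    }

    public static void main(String[] args) {
        // A single 10TB input.
        System.out.println(estimateReducers(
                Arrays.asList(OptionalLong.of(10_000_000_000_000L))));
        // A join with two file inputs.
        System.out.println(estimateReducers(
                Arrays.asList(OptionalLong.of(2_500_000_000L), OptionalLong.of(1L))));
        // A join where one input has no known size.
        System.out.println(estimateReducers(
                Arrays.asList(OptionalLong.of(5_000_000_000L), OptionalLong.empty())));
    }
}
```

Whether the estimate should apply to every job in the script or only the first is left open here; the sketch only covers a single job.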

> Safe-guards against misconfigured Pig scripts without PARALLEL keyword
> ----------------------------------------------------------------------
>                 Key: PIG-1249
>                 URL: https://issues.apache.org/jira/browse/PIG-1249
>             Project: Pig
>          Issue Type: Improvement
>    Affects Versions: 0.8.0
>            Reporter: Arun C Murthy
>            Assignee: Jeff Zhang
>            Priority: Critical
>             Fix For: 0.8.0
>         Attachments: PIG-1249.patch, PIG_1249_2.patch
> It would be *very* useful for Pig to have safe-guards against naive scripts which process a *lot* of data without the use of PARALLEL keyword.
> We've seen a fair number of instances where naive users process huge data-sets (>10TB) with badly mis-configured #reduces e.g. 1 reduce.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
