hadoop-pig-dev mailing list archives

From "Olga Natkovich (JIRA)" <j...@apache.org>
Subject [jira] Commented: (PIG-1333) API interface to Pig
Date Mon, 07 Jun 2010 01:23:56 GMT

    [ https://issues.apache.org/jira/browse/PIG-1333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12876105#action_12876105 ]

Olga Natkovich commented on PIG-1333:
-------------------------------------

Patch looks good. A few comments and questions:

(1) General comment: this patch is very large and combines several unrelated issues. Splitting
it into multiple patches would have made it easier to review and test.
(2) We are missing script-level feature collection. (I see the one at the job level.) For each
script, we want to collect overall features, such as which operators it uses (join, order
by, etc.), whether it is a multiquery, and whether it has UDFs. We would also want to know whether
the combiner was used and whether the script spilled, though both of those could perhaps be
collected at the job level.
(3) We need to add a separate comment to the JIRA, marked as documentation, that describes PigRunner,
since it is a new interface that we need to include in the 0.8.0 documentation.
(4) MapReduceLauncher: why was the exception handling and temp store handling code removed?
(5) OutputStats assumes that the location is a path, which might not be true for non-file stores.
(6) ScriptState: there are maps/hashes optimized for enums that could be used here (http://java.sun.com/j2se/1.5.0/docs/api/java/util/EnumMap.html).
(7) Why is JobStats derived from an operator?
(8) Why was JOB_NAME_PREFIX removed from PigContext?
(9) Why do we need to synchronize getTemporaryFile?
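
To illustrate comments (2) and (6) together, here is a minimal sketch of script-level feature
collection backed by java.util.EnumMap. The names ScriptFeature and ScriptFeatureSketch are
hypothetical and are not taken from the patch; the real ScriptState may be organized differently.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical feature names, based on the examples in comment (2).
enum ScriptFeature { JOIN, ORDER_BY, MULTI_QUERY, UDF }

// Hypothetical holder for script-level features; not the ScriptState class from the patch.
final class ScriptFeatureSketch {
    // EnumMap is a Map implementation specialized for enum keys (backed by an array).
    private final Map<ScriptFeature, Integer> counts = new EnumMap<>(ScriptFeature.class);

    // Record one occurrence of a feature seen while processing the script.
    void record(ScriptFeature feature) {
        counts.merge(feature, 1, Integer::sum);
    }

    // True if the feature was recorded at least once.
    boolean uses(ScriptFeature feature) {
        return counts.containsKey(feature);
    }

    // Number of times the feature was recorded, or 0 if never.
    int count(ScriptFeature feature) {
        return counts.getOrDefault(feature, 0);
    }

    public static void main(String[] args) {
        ScriptFeatureSketch features = new ScriptFeatureSketch();
        features.record(ScriptFeature.JOIN);
        features.record(ScriptFeature.JOIN);
        features.record(ScriptFeature.UDF);
        for (ScriptFeature f : ScriptFeature.values()) {
            System.out.println(f + " used=" + features.uses(f) + " count=" + features.count(f));
        }
    }
}
```

Whether the combiner was used or the script spilled could be tracked the same way, at either
the script level or the job level.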

> API interface to Pig
> --------------------
>
>                 Key: PIG-1333
>                 URL: https://issues.apache.org/jira/browse/PIG-1333
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Olga Natkovich
>            Assignee: Richard Ding
>             Fix For: 0.8.0
>
>         Attachments: PIG-1333.patch
>
>
> It would be nice to make Pig friendlier to applications, such as workflow systems, that
> execute Pig scripts on a user's behalf.
> Currently, they have to use the pig command line to execute the code; however, this
> limits the kind of output that can be delivered. For instance, it is hard to produce
> error information that is easy to use programmatically, or to collect statistics.
> The proposal is to create a class that mimics the behavior of Main but gives users
> a status object back. The main code of Pig would then look something like:
> public static void main(String[] args)
> {
>     PigStatus ps = PigMain.exec(args);
>     System.exit(ps.rc);
> }
> We need to define the following:
> - Content of PigStatus. It should at least include
>    * return code
>    * error string
>    * exception 
>    * statistics
> - A way to propagate the status class through pig code
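
As a rough illustration of the status object described above, the following sketch carries the
four items listed (return code, error string, exception, statistics). PigStatusSketch and
PigMainSketch are hypothetical names, the statistics map and its "argsSeen" key are made up for
the example, and exec() here is only a stand-in that runs no Pig code.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical status object; the fields follow the list in the proposal, not a released Pig API.
final class PigStatusSketch {
    final int rc;                        // return code, 0 on success
    final String errorString;            // null on success
    final Throwable exception;           // null on success
    final Map<String, Long> statistics;  // assumed shape: counter name -> value

    private PigStatusSketch(int rc, String errorString, Throwable exception,
                            Map<String, Long> statistics) {
        this.rc = rc;
        this.errorString = errorString;
        this.exception = exception;
        this.statistics = Collections.unmodifiableMap(new LinkedHashMap<>(statistics));
    }

    static PigStatusSketch success(Map<String, Long> statistics) {
        return new PigStatusSketch(0, null, null, statistics);
    }

    static PigStatusSketch failure(int rc, String errorString, Throwable exception) {
        return new PigStatusSketch(rc, errorString, exception,
                                   Collections.<String, Long>emptyMap());
    }

    boolean isSuccess() {
        return rc == 0;
    }
}

final class PigMainSketch {
    // Stand-in for the proposed PigMain.exec(args): it returns a status instead of exiting.
    static PigStatusSketch exec(String[] args) {
        try {
            if (args.length == 0) {
                throw new IllegalArgumentException("no script given");
            }
            Map<String, Long> stats = new LinkedHashMap<>();
            stats.put("argsSeen", (long) args.length); // placeholder statistic
            return PigStatusSketch.success(stats);
        } catch (RuntimeException e) {
            // Errors are captured in the status object rather than propagated.
            return PigStatusSketch.failure(1, e.getMessage(), e);
        }
    }

    public static void main(String[] args) {
        PigStatusSketch ps = exec(new String[] {"script.pig"});
        System.out.println("rc=" + ps.rc + " statistics=" + ps.statistics);
    }
}
```

A command-line wrapper could then end with System.exit(ps.rc), while an embedding application
would inspect errorString, exception, and statistics directly.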

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

