hadoop-pig-dev mailing list archives

From "Ashutosh Chauhan (JIRA)" <j...@apache.org>
Subject [jira] Commented: (PIG-1427) Monitor and kill runaway UDFs
Date Thu, 27 May 2010 17:42:39 GMT

    [ https://issues.apache.org/jira/browse/PIG-1427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12872303#action_12872303 ]

Ashutosh Chauhan commented on PIG-1427:
---------------------------------------

1. You didn't act on my request to increment a counter when the UDF times out or throws
an exception :) I think it would be quite useful for users to know how many faulty records
in the dataset the UDF could not process.
2. In getDefaultValue() there seems to be an inconsistency among the different if statements.
I guess you need to distinguish between Integer[] and Integer return types and then
return the appropriate value for each.
3. Running svn co; patch -p0 < monitoredUDF.patch; ant jar results in a build failure. It seems
ivy is not pulling in the guava lib.
4. Since it's a new user-facing interface, having a stability/visibility tag would really be useful.
5. Since it spawns a new thread for every exec() call, I assume it will have some overhead.
If you have done a comparison or have numbers for that, it would be great if you could share
them.
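
To make points 1 and 2 concrete, here is a minimal sketch of what I have in mind. It is not taken from the patch: the class name, the method names, and the int[] holding the configured default are all hypothetical, and plain AtomicLong fields stand in for whatever job-level counters the real code would report to.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical helper, not the patch's code: it only illustrates the
// Integer vs. Integer[] distinction and the per-failure counters.
final class DefaultValueSketch {
    // Stand-ins for job counters (assumption: the real code would use Hadoop counters).
    static final AtomicLong TIMEOUTS = new AtomicLong();
    static final AtomicLong ERRORS = new AtomicLong();

    // Picks a default that matches the UDF's declared return type.
    static Object defaultFor(Class<?> returnType, int[] configured) {
        if (configured == null || configured.length == 0) {
            return null; // nothing configured: fall back to null
        }
        if (returnType.isArray()) {
            // Integer[] return type: box every configured value.
            Integer[] boxed = new Integer[configured.length];
            for (int i = 0; i < configured.length; i++) {
                boxed[i] = configured[i];
            }
            return boxed;
        }
        // Scalar Integer return type: use the first configured value only.
        return Integer.valueOf(configured[0]);
    }

    // Records the failure, then hands back the type-appropriate default.
    static Object onFailure(boolean timedOut, Class<?> returnType, int[] configured) {
        (timedOut ? TIMEOUTS : ERRORS).incrementAndGet();
        return defaultFor(returnType, configured);
    }
}
```

With something along these lines, the number of records that fell back to a default can be read off the two counters at the end of the job.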

> Monitor and kill runaway UDFs
> -----------------------------
>
>                 Key: PIG-1427
>                 URL: https://issues.apache.org/jira/browse/PIG-1427
>             Project: Pig
>          Issue Type: New Feature
>    Affects Versions: 0.8.0
>            Reporter: Dmitriy V. Ryaboy
>            Assignee: Dmitriy V. Ryaboy
>         Attachments: monitoredUdf.patch, monitoredUdf.patch
>
>
> As a safety measure, it is sometimes useful to monitor UDFs as they execute. It is often
> preferable to return null or some other default value instead of timing out a runaway evaluation
> and killing a job. We have in the past seen complex regular expressions lead to job failures
> due to just half a dozen (out of millions) particularly obnoxious strings.
> It would be great to give Pig users a lightweight way of enabling UDF monitoring.
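
The behaviour described in the issue can be sketched roughly as follows. This is an illustration under assumptions, not the attached patch: MonitoredCall and its fields are hypothetical names, the UDF is modelled as a plain java.util.function.Function, and a shared ExecutorService is used in place of a new thread per call. Note that Future.cancel(true) only interrupts the worker, so code that never checks for interruption (a pathological regex, for example) may keep running in the background.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Function;

// Hypothetical wrapper: runs a function with a time limit and returns a
// default value instead of failing when it times out or throws.
final class MonitoredCall<I, O> {
    private final ExecutorService pool;   // shared pool, reused across calls
    private final Function<I, O> udf;     // stand-in for the UDF's exec()
    private final long timeoutMillis;
    private final O defaultValue;
    final AtomicLong timeouts = new AtomicLong();
    final AtomicLong errors = new AtomicLong();

    MonitoredCall(ExecutorService pool, Function<I, O> udf,
                  long timeoutMillis, O defaultValue) {
        this.pool = pool;
        this.udf = udf;
        this.timeoutMillis = timeoutMillis;
        this.defaultValue = defaultValue;
    }

    O exec(I input) {
        Callable<O> task = () -> udf.apply(input);
        Future<O> future = pool.submit(task);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true);          // best effort: interrupts the worker
            timeouts.incrementAndGet();
            return defaultValue;
        } catch (ExecutionException e) {
            errors.incrementAndGet();     // the function itself threw
            return defaultValue;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve interrupt status
            future.cancel(true);
            return defaultValue;
        }
    }
}
```

Reusing a pool avoids the per-call thread creation cost, at the price of a worker staying occupied whenever a cancelled call ignores the interrupt.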

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

