pig-dev mailing list archives

From "Rohini Palaniswamy (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (PIG-4184) UDF backward compatibility issue after POStatus.STATUS_NULL refactory
Date Fri, 10 Oct 2014 21:21:33 GMT

    [ https://issues.apache.org/jira/browse/PIG-4184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14167531#comment-14167531
] 

Rohini Palaniswamy commented on PIG-4184:
-----------------------------------------

[~daijy], 
   The check if (input == null || input.size() == 0) appears in far more UDF classes in piggybank and builtin
than the classes you just changed. Wouldn't all those UDFs, and any user UDFs doing the same, break?


> UDF backward compatibility issue after POStatus.STATUS_NULL refactory
> ---------------------------------------------------------------------
>
>                 Key: PIG-4184
>                 URL: https://issues.apache.org/jira/browse/PIG-4184
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>            Reporter: Daniel Dai
>            Assignee: Daniel Dai
>             Fix For: 0.14.0
>
>         Attachments: PIG-4184-1.patch
>
>
> This is the same issue we discussed in PIG-3739 and PIG-3679. However, our previous fix
does not solve the issue; in fact, it makes things worse, and it is totally my fault.
> Consider the following UDF and script:
> {code}
>     public class IntToBool extends EvalFunc<Boolean> {
>         @Override
>         public Boolean exec(Tuple input) throws IOException {
>             if (input == null || input.size() == 0)
>                 return null;
>             Integer val = (Integer)input.get(0);
>             return (val == null || val == 0) ? false : true;
>         }
>     }
> {code}
> {code}
> a = load '1.txt' as (i0:int, i1:int);
> b = foreach a generate IntToBool(i0);
> store b into 'output';
> {code}
> 1.txt
> {code}
> 1
> 2   3
> {code}
> With Pig 0.12, we get:
> {code}
> (false)
> (true)
> {code}
> With Pig 0.13/0.14, we get:
> {code}
> ()
> (true)
> {code}
> The reason is that in 0.12, Pig passes the first row to IntToBool as a tuple with a null item, whereas in
0.13/0.14, Pig swallows the first row, which is not right. This wrong behavior was introduced
by PIG-3739 and PIG-3679.
> Before that (but after the POStatus.STATUS_NULL refactoring in PIG-3568), we did have a behavior
change that makes the e2e test StreamingPythonUDFs_10 fail with an NPE. However, I think this is
an inconsistency in 0.12 itself. Consider the following scripts:
> {code}
> a = load '1.txt' as (name:chararray, age:int, gpa:double);
> b = foreach a generate ROUND((gpa>3.0?gpa+1:gpa));
> store b into 'output';
> {code}
> {code}
> a = load '1.txt' as (name:chararray, age:int, gpa:double);
> b = foreach a generate ROUND(gpa);
> store b into 'output';
> {code}
> If the gpa field is null, script 1 skips the row and script 2 fails with an NPE, which does not
make sense. So my thinking is:
> 1. Pig 0.12 is wrong, and the POStatus.STATUS_NULL refactoring fixes this behavior (we don't need
the related fixes in PIG-3739/PIG-3679)
> 2. ROUND (and some other UDFs) is wrong anyway, and we should fix it
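
For illustration only, the second point in the description could look something like the sketch below. It is not the attached PIG-4184-1.patch and not Pig's actual ROUND source. The class name NullSafeRound is hypothetical, and a plain java.util.List stands in for Pig's Tuple so the example runs without Pig on the classpath. The assumption is that a null field should produce a null result rather than an NPE.

```java
import java.util.List;

// Hypothetical stand-in for a null-tolerant ROUND UDF. A plain List<Object>
// plays the role of Pig's Tuple so the sketch runs without Pig on the classpath.
final class NullSafeRound {
    private NullSafeRound() {}

    // Returns null for a missing, empty, or null-valued input instead of
    // throwing a NullPointerException; otherwise rounds to the nearest long.
    static Long exec(List<Object> input) {
        if (input == null || input.isEmpty()) {
            return null;
        }
        Object field = input.get(0);
        if (field == null) {
            return null; // null field in, null result out
        }
        return Math.round(((Number) field).doubleValue());
    }
}
```

With a guard like this, ROUND(gpa) and ROUND((gpa>3.0?gpa+1:gpa)) would at least agree on not failing when gpa is null. Whether the row is then emitted as null or skipped is decided by the surrounding operator, not by the UDF.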



