pig-dev mailing list archives

From "Cheolsoo Park" <piaozhe...@gmail.com>
Subject Re: Review Request 18525: PIG-3679: Fix regression of the STATUS_NULL clean-up
Date Wed, 26 Feb 2014 19:33:39 GMT


> On Feb. 26, 2014, 7:28 p.m., Rohini Palaniswamy wrote:
> > src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/POUserFunc.java, lines 302-303
> > <https://reviews.apache.org/r/18525/diff/1/?file=504701#file504701line302>
> >
> >     Will there be a problem if schema had a bag or map?

No. As far as I understand, only a tuple is a problem, because a UDF has to wrap its
output in a tuple if it returns multiple fields.
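
For illustration only, here is a minimal Java sketch of one reading of this behavior. It
assumes that the null filter skips the UDF call when every input field is null, and that
a null result is wrapped in a one-field tuple only when the declared output type is a
tuple. NullSkipSketch, OutputType, allNull, and invoke are hypothetical names; this is
not the POUserFunc code and it uses no Pig classes.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.function.Function;

// Hypothetical stand-in for the null filter; none of these names come from Pig.
class NullSkipSketch {
    /** Declared output type of the UDF in this sketch. */
    enum OutputType { SCALAR, TUPLE, BAG, MAP }

    /** True when every input field is null (vacuously true for an empty list). */
    static boolean allNull(List<Object> fields) {
        for (Object field : fields) {
            if (field != null) {
                return false;
            }
        }
        return true;
    }

    /**
     * Invokes the UDF unless all input fields are null (an assumption of this sketch).
     * When the call is skipped, a tuple-typed output gets the null wrapped in a
     * one-field tuple (modeled here as a List); every other type gets a bare null.
     */
    static Object invoke(Function<List<Object>, Object> udf,
                         OutputType declared,
                         List<Object> input) {
        if (allNull(input)) {
            if (declared == OutputType.TUPLE) {
                return Collections.singletonList(null);
            }
            return null;
        }
        return udf.apply(input);
    }

    public static void main(String[] args) {
        // A UDF that would throw an NPE if it were handed a null first field.
        Function<List<Object>, Object> length = in -> in.get(0).toString().length();

        List<Object> nulls = Arrays.<Object>asList(null, null);
        System.out.println(invoke(length, OutputType.SCALAR, nulls));
        System.out.println(invoke(length, OutputType.TUPLE, nulls));
        System.out.println(invoke(length, OutputType.BAG, nulls));
        System.out.println(invoke(length, OutputType.SCALAR, Arrays.<Object>asList("pig", null)));
    }
}
```

In this sketch, bag and map outputs take the same path as scalars, which matches the
answer above: only the tuple case needs the extra wrapping step.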


- Cheolsoo


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/18525/#review35554
-----------------------------------------------------------


On Feb. 26, 2014, 6:15 p.m., Cheolsoo Park wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/18525/
> -----------------------------------------------------------
> 
> (Updated Feb. 26, 2014, 6:15 p.m.)
> 
> 
> Review request for pig, Daniel Dai, Mark Wagner, and Rohini Palaniswamy.
> 
> 
> Bugs: PIG-3679
>     https://issues.apache.org/jira/browse/PIG-3679
> 
> 
> Repository: pig-git
> 
> 
> Description
> -------
> 
> I discovered this regression while debugging the e2e test StreamingPythonUDFs_10 in trunk.
> To summarize, replacing (STATUS_NULL) with (STATUS_OK + null) has changed how null values
> are handled in some cases. In particular, some UDFs that used to see no nulls are now called
> with nulls and fail with an NPE. Since this is a major backward incompatibility, I changed
> POUserFunc to always filter out nulls. Technically, this still changes the behavior with
> nulls, but it seems ok that UDFs that used to fail with an NPE no longer fail.
> 
> Here is my reasoning in more detail:
> https://issues.apache.org/jira/browse/PIG-3679?focusedCommentId=13892966&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13892966
> 
> Alternatively, we could let UDFs handle nulls by themselves. That seems cleaner to me,
> but backward incompatibility is a concern (i.e. "My UDFs used to work with 0.12, but they
> no longer work with 0.13").
> 
> 
> Diffs
> -----
> 
>   src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/POUserFunc.java f031b1d
> 
> Diff: https://reviews.apache.org/r/18525/diff/
> 
> 
> Testing
> -------
> 
> All e2e tests pass (except Warning_4 PIG-3739).
> 
> 
> Thanks,
> 
> Cheolsoo Park
> 
>

