pig-dev mailing list archives

From "Koji Noguchi (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (PIG-5201) Null handling on FLATTEN
Date Tue, 20 Jun 2017 21:11:09 GMT

    [ https://issues.apache.org/jira/browse/PIG-5201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16056468#comment-16056468 ]

Koji Noguchi commented on PIG-5201:
-----------------------------------

bq.  Have asked Koji Noguchi to check with a couple of internal users who are both Pig and data pipeline experts and will be affected by this.

From the users, I learned that there is a common pattern that can easily break if FLATTEN(null-bag) starts dropping records as I proposed.

Basically, their code looks like this:
{code}
...
C = FOREACH B GENERATE record_type, FLATTEN(type_a_bag), FLATTEN(type_b_bag); 
...
{code}
When record_type is 'a', type_b_bag is null, and vice versa.
Instead of checking record_type up front, users simply flatten both bags and examine record_type later.
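The following is a minimal Python model of the FOREACH statement above. It is not Pig's implementation and is meant only to show why the proposed change breaks this pattern. The function names `flatten_row` and `run` and the sample rows are hypothetical. A bag is modeled as a list of tuples and a null bag as `None`. Under the current behavior, a null bag is assumed to expand to a single all-null tuple of a given width.

FLATTEN on two bags yields the cross product of their tuples. In this pattern every record has exactly one null bag, so treating a null bag as contributing zero tuples would drop every record.

```python
from typing import Optional

# Hypothetical model: a bag is a list of tuples; None stands for a null bag.
Bag = Optional[list]


def flatten_row(record_type: str, a_bag: Bag, b_bag: Bag,
                a_width: int, b_width: int,
                drop_null_bags: bool) -> list:
    """Model: GENERATE record_type, FLATTEN(a_bag), FLATTEN(b_bag).

    Flattening two bags produces the cross product of their tuples.
    A null bag either becomes one all-null tuple (current behavior)
    or contributes zero tuples, which drops the record (proposed behavior).
    An empty (non-null) bag always contributes zero tuples.
    """
    def expand(bag: Bag, width: int) -> list:
        if bag is None:
            return [] if drop_null_bags else [(None,) * width]
        return bag

    return [(record_type,) + ta + tb
            for ta in expand(a_bag, a_width)
            for tb in expand(b_bag, b_width)]


# Hypothetical input: for type 'a' the b-bag is null, and vice versa.
rows = [
    ("a", [(1,), (2,)], None),
    ("b", None, [(10,)]),
]


def run(drop_null_bags: bool) -> list:
    out = []
    for rt, a, b in rows:
        out.extend(flatten_row(rt, a, b, 1, 1, drop_null_bags))
    return out


print(run(False))  # [('a', 1, None), ('a', 2, None), ('b', None, 10)]
print(run(True))   # []
```

With `drop_null_bags=False` (the current behavior), each record survives with nulls in the other type's columns, so a later check on record_type still works. With `drop_null_bags=True`, the output is empty.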

I hate inconsistency, and I hate being wrong (and Rohini being right), but it looks like I will have to keep the current behavior of FLATTEN(null-bag) _not_ dropping records.

> Null handling on FLATTEN
> ------------------------
>
>                 Key: PIG-5201
>                 URL: https://issues.apache.org/jira/browse/PIG-5201
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Koji Noguchi
>            Assignee: Koji Noguchi
>            Priority: Minor
>             Fix For: 0.18.0
>
>         Attachments: pig-5201-v00-testonly.patch, pig-5201-v01.patch, pig-5201-v02.patch, pig-5201-v03.patch
>
>
> Sometimes, FLATTEN(null) or FLATTEN(bag-with-null) seems to produce incorrect results.
> Test code/script to follow.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
