pig-dev mailing list archives

From Daniel Dai <da...@hortonworks.com>
Subject Re: Incorrect FLATTEN behavior
Date Sat, 10 Dec 2011 00:37:30 GMT
It is a confusing result, and I certainly vote to fix it. This is a
correctness issue, so getting the right behavior is more important than some
minor backward incompatibility.

Daniel

On Fri, Dec 9, 2011 at 12:58 PM, Dmitriy Ryaboy <dvryaboy@gmail.com> wrote:

> Hi guys,
> I am running into a behavior of flatten that causes pretty significant bugs
> in certain corner cases.
> Not sure whether fixing it will cause any backwards-incompatibility issues,
> so looking for your feedback.
>
> Here's the issue:
>
> Flattening a null bag results in the row being dropped. That's fine.
>
> Flattening a null tuple results in a single column (with the value null)
> being produced.
>
> That leads to all the columns after the flattened value shifting left by
> n-1 positions, where n is the number of expected fields in a tuple!
>
> Consider:
>
> grunt> sh cat tmp/x
> foo bar
> a (b,c) d
> grunt> x = load 'tmp/x' as (a:chararray, b:(b:chararray, c:chararray),
> d:chararray);
> grunt> projected = foreach x generate d;
> grunt> dump projected
> (bar)
> (d)
>
> grunt> flattened = foreach x generate a, flatten(b) as (b, c), d;
> grunt> dump flattened
> (foo,,bar)    -- NOTE: THREE FIELDS INSTEAD OF THE EXPECTED 4
> (a,b,c,d)
> grunt> projected = foreach flattened generate d;
> grunt> dump projected
> ()    -- NOTE: WRONG VALUE (EXPECTED bar)
> (d)
> grunt> projected = foreach flattened generate c;
> grunt> dump projected
> ()    -- NOTE THAT, INCONSISTENTLY, C IS NULL! AS IS B.
> (c)
>
> I've reproduced this behavior in Pig 0.8 and Pig 0.9 (top of branch).
>
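The column shift described above can be illustrated with a small Python model of the row layout. This is a hypothetical sketch, not Pig's implementation: the function names (`flatten_tuple_current`, `flatten_tuple_expected`, `flatten_bag`) are made up for illustration, rows are plain lists, and only the number of columns FLATTEN emits is modeled. It does not model how Pig then resolves field names against the shorter row. The transcript above shows both c and d coming back null, whereas a purely positional lookup would return bar for c.

```python
from typing import Any, List, Sequence

Row = List[Any]


def flatten_tuple_current(row: Sequence[Any], idx: int) -> Row:
    """Reported behavior: a null tuple is replaced by ONE null column."""
    value = row[idx]
    middle = [None] if value is None else list(value)
    return list(row[:idx]) + middle + list(row[idx + 1:])


def flatten_tuple_expected(row: Sequence[Any], idx: int, arity: int) -> Row:
    """Expected behavior: a null tuple is replaced by `arity` null columns."""
    value = row[idx]
    middle = [None] * arity if value is None else list(value)
    return list(row[:idx]) + middle + list(row[idx + 1:])


def flatten_bag(row: Sequence[Any], idx: int) -> List[Row]:
    """One output row per bag element; a null (or empty) bag yields no rows."""
    bag = row[idx] or []
    return [list(row[:idx]) + list(t) + list(row[idx + 1:]) for t in bag]


if __name__ == "__main__":
    schema = ["a", "b", "c", "d"]       # declared schema after flatten(b) as (b, c)
    row = ("foo", None, "bar")          # first input row: b is a null tuple
    actual = flatten_tuple_current(row, 1)
    expected = flatten_tuple_expected(row, 1, arity=2)
    print(actual)    # ['foo', None, 'bar']        (3 fields)
    print(expected)  # ['foo', None, None, 'bar']  (4 fields)
    d_pos = schema.index("d")           # d is declared at position 3
    print(len(actual) > d_pos)          # False: position 3 does not exist
    print(expected[d_pos])              # bar
```

With the null tuple emitted as a single column, the first row has three fields, so bar sits at index 2 instead of index 3. That is a left shift of n-1 = 1 for a two-field tuple. Padding with n nulls keeps every later column at its schema position. A null bag, by contrast, simply produces no output row, which matches the "row dropped" behavior described above.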
