pig-dev mailing list archives

From Dmitriy Ryaboy <dvrya...@gmail.com>
Subject Incorrect FLATTEN behavior
Date Fri, 09 Dec 2011 20:58:36 GMT
Hi guys,
I am running into a behavior of flatten that causes pretty significant bugs
in certain corner cases.
Not sure whether fixing it will cause any backwards-incompatibility issues,
so looking for your feedback.

Here's the issue:

Flattening a null bag results in the row being dropped. That's fine.

Flattening a null tuple results in a single column (with the value null)
being produced.

That shifts every column after the flattened value left by n-1 positions,
where n is the number of fields the tuple was expected to contribute!

Consider:

grunt> sh cat tmp/x
foo bar
a (b,c) d
grunt> x = load 'tmp/x' as (a:chararray, b:(b:chararray, c:chararray), d:chararray);
grunt> projected = foreach x generate d;
grunt> dump projected
(bar)
(d)

grunt> flattened = foreach x generate a, flatten(b) as (b, c), d;
grunt> dump flattened
(foo,,bar) -- NOTE THREE FIELDS INSTEAD OF EXPECTED 4
(a,b,c,d)
grunt> projected = foreach flattened generate d;
grunt> dump projected
() -- NOTE WRONG VALUE
(d)
grunt> projected = foreach flattened generate c;
() -- NOTE THAT, INCONSISTENTLY, C IS NULL! AS IS B.
(c)
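
To make the shift concrete, here is a toy Python model of the row layout seen
in the transcript. It is only a sketch, not Pig's actual implementation: the
function names (flatten_observed, flatten_padded, field) are made up for
illustration, None stands in for a Pig null, and padding a null tuple out to
its declared width is just one possible fix, assumed here for comparison. The
model covers row layout only; it does not try to reproduce how Pig resolves
projections by alias.

```python
# Toy model of the row layout described above; NOT Pig's implementation.
# All names here are hypothetical and exist only for illustration.

def flatten_observed(row, idx):
    """Expand the tuple at position idx as the transcript shows:
    a null (None) tuple turns into a single null column."""
    value = row[idx]
    expanded = [None] if value is None else list(value)
    return row[:idx] + expanded + row[idx + 1:]


def flatten_padded(row, idx, width):
    """Assumed alternative: a null tuple turns into `width` null columns,
    so the fields after it keep their positions."""
    value = row[idx]
    expanded = [None] * width if value is None else list(value)
    return row[:idx] + expanded + row[idx + 1:]


def field(row, pos):
    """Positional lookup that yields None when the row is too short."""
    return row[pos] if pos < len(row) else None


rows = [
    ["foo", None, "bar"],      # second field is a null tuple
    ["a", ("b", "c"), "d"],    # second field is a two-field tuple
]

observed = [flatten_observed(r, 1) for r in rows]
padded = [flatten_padded(r, 1, width=2) for r in rows]

# Under the declared schema (a, b, c, d), d is expected at position 3.
d_observed = [field(r, 3) for r in observed]
d_padded = [field(r, 3) for r in padded]

print(observed)
print(d_observed)
print(d_padded)
```

In this model the first row comes out of flatten_observed with three fields,
so a lookup at d's expected position finds nothing, which matches the empty
tuple in the dump above. With padding, both rows have four fields and d stays
where the schema says it is.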

I've reproduced this behavior in Pig 0.8 and Pig 0.9 (top of branch).
