pig-dev mailing list archives

From "Koji Noguchi (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (PIG-5312) Uids not set in inner schemas after UNION ONSCHEMA
Date Thu, 21 Dec 2017 20:45:05 GMT

    [ https://issues.apache.org/jira/browse/PIG-5312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16300541#comment-16300541 ]

Koji Noguchi commented on PIG-5312:
-----------------------------------

Sorry for the long delay in reviewing. It took me a while to understand the original LOUnion.java (before this patch). :)

The patch looks good. The only question I have (mostly for myself) is:
{code}
168                     if (inputFieldSchema.schema != null) {
169                         fieldInputSchemas.add(inputFieldSchema.schema);
170                     } 
...
203             if (outputFieldSchema.schema != null) {
204                 setMergedSchemaUids(outputFieldSchema.schema, fieldInputSchemas);
205             }
{code}
For lines 168 and 203, where we check for a null schema, do we also need to check {{DataType.isSchemaType(type)}}?
I've seen places where we do add those checks, but they felt redundant to me.

I'll wait 1-2 days before committing in case Rohini or Daniel has feedback.
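On the question about lines 168 and 203: the type check is redundant only if a non-null inner schema can appear solely on fields whose type can carry one. The sketch below is not Pig code. `FieldSchema`, `Type`, and `isSchemaType` are hypothetical, simplified stand-ins (the bag's tuple wrapper is flattened away), and the set of "schema types" is assumed to be tuple, bag, and map. Under that assumption, both guard styles select exactly the same inner schemas.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-ins, NOT Pig's LogicalSchema or DataType classes.
public class GuardSketch {
    enum Type { INT, CHARARRAY, TUPLE, BAG, MAP }

    static final class FieldSchema {
        final String alias;
        final Type type;
        final List<FieldSchema> schema; // inner schema; null for scalar fields

        FieldSchema(String alias, Type type, List<FieldSchema> schema) {
            this.alias = alias;
            this.type = type;
            this.schema = schema;
        }
    }

    // Assumed to mirror a DataType.isSchemaType-style check: only complex types carry inner schemas.
    static boolean isSchemaType(Type t) {
        return t == Type.TUPLE || t == Type.BAG || t == Type.MAP;
    }

    // Guard style used in the patch: null check only.
    static List<List<FieldSchema>> innerSchemasNullCheck(List<FieldSchema> fields) {
        List<List<FieldSchema>> out = new ArrayList<>();
        for (FieldSchema fs : fields) {
            if (fs.schema != null) {
                out.add(fs.schema);
            }
        }
        return out;
    }

    // Guard style under discussion: type check plus null check.
    static List<List<FieldSchema>> innerSchemasTypeAndNullCheck(List<FieldSchema> fields) {
        List<List<FieldSchema>> out = new ArrayList<>();
        for (FieldSchema fs : fields) {
            if (isSchemaType(fs.type) && fs.schema != null) {
                out.add(fs.schema);
            }
        }
        return out;
    }

    // (a, b: {(m:int, n:chararray)}); a is typed as int purely for illustration.
    static List<FieldSchema> sampleFields() {
        List<FieldSchema> inner = Arrays.asList(
                new FieldSchema("m", Type.INT, null),
                new FieldSchema("n", Type.CHARARRAY, null));
        return Arrays.asList(
                new FieldSchema("a", Type.INT, null),
                new FieldSchema("b", Type.BAG, inner));
    }

    public static void main(String[] args) {
        List<FieldSchema> fields = sampleFields();
        // Both guards pick only b's inner schema, because scalar a has no schema.
        System.out.println(innerSchemasNullCheck(fields).size());        // 1
        System.out.println(innerSchemasTypeAndNullCheck(fields).size()); // 1
    }
}
```

If some code path could attach a schema to a scalar field, the two guards would diverge, which is the only case where the extra check would matter.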

> Uids not set in inner schemas after UNION ONSCHEMA
> --------------------------------------------------
>
>                 Key: PIG-5312
>                 URL: https://issues.apache.org/jira/browse/PIG-5312
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.16.0, 0.17.0
>            Reporter: Travis Woodruff
>            Assignee: Travis Woodruff
>         Attachments: PIG-5312.patch
>
>
> Ran into a failure with a script of the form:
> {code}
> u = union onschema x, y;  -- schema: (a, b: {(m:int, n: chararray)})
> z = foreach u {
>     i = foreach b generate m + 5;
>     generate a, i;
> }
> {code}
> The issue ended up being that {{LOUnion}} is not setting uids on inner schemas. This means that uids on inner schema fields are all -1, so when {{ProjectExpression.getFieldSchema()}} tries to look up the fields in the inner select from the inner schema, all the fields match, and the last field's schema ends up being returned. In the example above, this causes {{TypeCheckingExpVisitor.addCastsToNumericBinExpression()}} to fail for the addition operator (since the returned field schema is a chararray).
> This only seems to affect the schema, so I don't think this should cause bad data to be produced.
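The failure mode in the description can be reproduced in miniature. This sketch is not Pig code: `Field`, `findByUid`, `assignFreshUids`, and the starting uid of 100 are hypothetical, and the last-match-wins loop is assumed only because the description says the last field's schema ends up being returned. It shows why two inner fields that both carry uid -1 cannot be told apart, and that distinct uids resolve `m` back to its int schema.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for Pig's logical schema classes; not Pig's real API.
public class UidLookupSketch {
    static final long UNSET_UID = -1;

    static final class Field {
        final String alias;
        final String type;
        long uid;

        Field(String alias, String type, long uid) {
            this.alias = alias;
            this.type = type;
            this.uid = uid;
        }
    }

    // Scans every field and keeps the LAST one whose uid matches.
    static Field findByUid(List<Field> schema, long uid) {
        Field match = null;
        for (Field f : schema) {
            if (f.uid == uid) {
                match = f;
            }
        }
        return match;
    }

    // Hypothetical uid counter standing in for Pig's own uid allocation.
    static long nextUid = 100;

    // Gives every field that still has the unset uid a fresh, distinct one.
    static void assignFreshUids(List<Field> schema) {
        for (Field f : schema) {
            if (f.uid == UNSET_UID) {
                f.uid = nextUid++;
            }
        }
    }

    // Inner schema of b: (m:int, n:chararray), with uids left unset as in the bug.
    static List<Field> innerSchemaOfB() {
        List<Field> inner = new ArrayList<>();
        inner.add(new Field("m", "int", UNSET_UID));
        inner.add(new Field("n", "chararray", UNSET_UID));
        return inner;
    }

    public static void main(String[] args) {
        // Bug: both inner fields have uid -1, so looking up m also matches n.
        List<Field> broken = innerSchemaOfB();
        System.out.println(findByUid(broken, broken.get(0).uid).type); // chararray

        // With distinct uids, the lookup resolves m to its own field.
        List<Field> fixed = innerSchemaOfB();
        assignFreshUids(fixed);
        System.out.println(findByUid(fixed, fixed.get(0).uid).type); // int
    }
}
```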



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
