pig-dev mailing list archives

From "Travis Woodruff (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (PIG-4018) Schema validation fails with UNION ONSCHEMA
Date Fri, 20 Jun 2014 19:41:25 GMT

    [ https://issues.apache.org/jira/browse/PIG-4018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Travis Woodruff updated PIG-4018:

    Attachment: PIG-4018-2.patch

It looks like TypeCheckingRelVisitor adds type conversion, so my previous patch is no good. Unfortunately, a more complicated patch is needed.

This patch updates LOUnion.getSchema() to ensure that every field in the output schema has a uid set. It also makes some small changes to minimize calls to getSchema() on inputs: the patch adds several additional iterations over the inputs, and getSchema() calls can be quite costly (for example, PigStorage reloads from HDFS every time getSchema() is called).
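
A minimal sketch of the idea, assuming a heavily simplified model: each input's schema is fetched once and reused, and every column in the merged schema gets a uid regardless of which input it came from. Field, Input, CountingInput, and AllFieldsGetUids are hypothetical stand-ins rather than Pig classes, and keying the uid cache by column name is a simplification; this is not the code in PIG-4018-2.patch.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Simplified, hypothetical sketch of the approach described above; not the
// actual PIG-4018 patch or Pig's LOUnion API.
final class AllFieldsGetUids {
    // A column name plus a uid.
    record Field(String name, long uid) {}

    // Hypothetical input whose getSchema() may be expensive to call.
    interface Input {
        List<String> getSchema();
    }

    // Input that counts its getSchema() calls, for demonstration.
    static final class CountingInput implements Input {
        private final List<String> columns;
        int calls = 0;

        CountingInput(List<String> columns) {
            this.columns = columns;
        }

        @Override
        public List<String> getSchema() {
            calls++;
            return columns;
        }
    }

    // "Cached" uids, keyed by column name here for simplicity.
    private final Map<String, Long> uidCache = new HashMap<>();
    private long nextUid = 1;

    List<Field> getSchema(List<? extends Input> inputs) {
        // Fetch each input's schema exactly once, then reuse the results.
        List<List<String>> schemas = new ArrayList<>();
        for (Input input : inputs) {
            schemas.add(input.getSchema());
        }
        // Merge by column name: first input's columns, then new ones.
        Set<String> names = new LinkedHashSet<>();
        for (List<String> schema : schemas) {
            names.addAll(schema);
        }
        // Every merged column gets a uid, whichever input it came from.
        List<Field> merged = new ArrayList<>();
        for (String name : names) {
            merged.add(new Field(name, uidCache.computeIfAbsent(name, n -> nextUid++)));
        }
        return merged;
    }

    public static void main(String[] args) {
        CountingInput a = new CountingInput(List.of("x", "y"));
        CountingInput b = new CountingInput(List.of("x", "z"));
        List<Field> merged = new AllFieldsGetUids().getSchema(List.of(a, b));
        System.out.println(merged);
        // prints [Field[name=x, uid=1], Field[name=y, uid=2], Field[name=z, uid=3]]
        System.out.println(a.calls + " " + b.calls); // prints 1 1
    }
}
```

Fetching the schemas into a local list up front is what keeps the extra passes over the inputs cheap: however many times the merged columns are walked, each input's getSchema() runs once per call.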

Also moved the test for this issue into TestUnionOnSchema.

> Schema validation fails with UNION ONSCHEMA
> -------------------------------------------
>                 Key: PIG-4018
>                 URL: https://issues.apache.org/jira/browse/PIG-4018
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.13.0
>            Reporter: Travis Woodruff
>            Assignee: Travis Woodruff
>         Attachments: PIG-4018-2.patch, PIG-4018.patch
> When relations with differing schemas are unioned (using UNION ONSCHEMA), schema validation can fail with this exception:
> {{org.apache.pig.impl.plan.PlanValidationException: Logical plan invalid state: invalid uid -1 in schema}}
> This worked before the fix for PIG-3492.
> The merged schema (from {{LOUnion.getSchema()}}) does not contain uids for columns not in the schema of the first input (uids are set to -1). This is because only the first input's schema is used for looking up "cached" uids.
> Normally, this isn't a problem because {{UnionOnSchemaSetter}} comes along and fixes the missing fields.
> However, when {{ImplicitSplitInsertVisitor}} is active, it is called before {{UnionOnSchemaSetter}}. {{ImplicitSplitInsertVisitor}} calls {{schemaResetter.visit()}}, which throws the validation exception because {{UnionOnSchemaSetter}} has not had a chance to create the missing fields (and thus uids are still -1 for these fields).
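
To make the failure concrete, here is a small, self-contained Java model of it under simplifying assumptions: the merged schema consults the uid cache only for the first input's columns, so a column present only in a later input keeps uid -1, and a validation step that runs before the uid-filling pass rejects it. UnionUidIssueSketch, fillMissingUids, and validate are hypothetical stand-ins for LOUnion.getSchema(), UnionOnSchemaSetter, and the validation reached through schemaResetter.visit(); they are not Pig's API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Simplified, hypothetical model of the failure described above; this is
// not Pig's LOUnion or validation code.
final class UnionUidIssueSketch {
    // A column name plus a uid; -1 means "no uid assigned yet".
    static final class Field {
        final String name;
        long uid;
        Field(String name, long uid) { this.name = name; this.uid = uid; }
    }

    // "Cached" uids, keyed by column name here for simplicity.
    private final Map<String, Long> uidCache = new HashMap<>();
    private long nextUid = 1;

    // ONSCHEMA-style merge by name (first input's columns, then new ones),
    // consulting the uid cache only for columns of the first input.
    List<Field> getSchema(List<List<String>> inputColumns) {
        Set<String> names = new LinkedHashSet<>();
        for (List<String> cols : inputColumns) {
            names.addAll(cols);
        }
        Set<String> firstInput = new LinkedHashSet<>(inputColumns.get(0));
        List<Field> merged = new ArrayList<>();
        for (String name : names) {
            long uid = firstInput.contains(name)
                    ? uidCache.computeIfAbsent(name, n -> nextUid++)
                    : -1L;
            merged.add(new Field(name, uid));
        }
        return merged;
    }

    // Stand-in for the later pass that creates the missing fields' uids.
    void fillMissingUids(List<Field> schema) {
        for (Field f : schema) {
            if (f.uid == -1) {
                f.uid = uidCache.computeIfAbsent(f.name, n -> nextUid++);
            }
        }
    }

    // Stand-in for schema validation: rejects any uid that is still -1.
    static void validate(List<Field> schema) {
        for (Field f : schema) {
            if (f.uid == -1) {
                throw new IllegalStateException("invalid uid -1 in schema (column " + f.name + ")");
            }
        }
    }

    // Runs validation before or after the uid-filling pass; true if it passes.
    static boolean validates(boolean validateFirst) {
        UnionUidIssueSketch union = new UnionUidIssueSketch();
        // a has (x, y), b has (x, z): z exists only in the second input.
        List<Field> schema = union.getSchema(List.of(List.of("x", "y"), List.of("x", "z")));
        try {
            if (validateFirst) {
                validate(schema);
                union.fillMissingUids(schema);
            } else {
                union.fillMissingUids(schema);
                validate(schema);
            }
            return true;
        } catch (IllegalStateException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(validates(false)); // prints true
        System.out.println(validates(true));  // prints false
    }
}
```

In this model the ordering is the whole story: the same schema passes once the missing uids are filled in, and fails when validation sees it first.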

This message was sent by Atlassian JIRA
