hive-dev mailing list archives

From "Kevin Wilfong (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-3544) union involving double column with a map join subquery will fail or give wrong results
Date Wed, 17 Oct 2012 20:38:03 GMT

    [ https://issues.apache.org/jira/browse/HIVE-3544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13478346#comment-13478346 ]

Kevin Wilfong commented on HIVE-3544:
-------------------------------------

I tracked the problem down to two causes.

1) In genUnionPlan, when preparing the ColumnInfo objects used to generate the RowResolver
for the Union operator, the code actually modifies the ColumnInfo objects of the left
operator's RowResolver, setting their type to the "common class". This causes those columns
to be serialized incorrectly in the intermediate FileSink operator between map reduce jobs
(as was the case when the left subquery of the union involved a join).

2) The common class for a column of the Union operator is determined once at compile time
and again at run time, using different functions that can return different classes (for
instance, when the type on one side is a double and on the other a string). This causes the
Union operator to return objects whose type differs from what the RowResolver specifies,
leading to serialization errors or failures.

To fix 1), I added the ability to clone a ColumnInfo; in the SemanticAnalyzer, the left
operator's ColumnInfo objects are now cloned before being modified.
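
The aliasing behind cause 1) and the clone-before-modify fix can be sketched in plain Java.
This is only an illustration under stated assumptions: the ColumnInfo class below is a
simplified, hypothetical stand-in (a name plus a type string), not Hive's actual class, and
unionColumnsShared / unionColumnsCloned are made-up helper names rather than methods from
the patch.

```java
import java.util.ArrayList;
import java.util.List;

public class ColumnInfoCloneSketch {
    // Hypothetical stand-in for Hive's ColumnInfo: a column name plus a type name.
    static final class ColumnInfo {
        final String name;
        String type;

        ColumnInfo(String name, String type) {
            this.name = name;
            this.type = type;
        }

        // Copy constructor, playing the role of the "clone" operation.
        ColumnInfo(ColumnInfo other) {
            this(other.name, other.type);
        }
    }

    // Problem variant: retypes the left operator's own objects, so the change
    // is visible to everything else that still holds those objects.
    static List<ColumnInfo> unionColumnsShared(List<ColumnInfo> left, String commonType) {
        for (ColumnInfo c : left) {
            c.type = commonType;
        }
        return left;
    }

    // Fixed variant: copy each ColumnInfo first, then retype only the copy.
    static List<ColumnInfo> unionColumnsCloned(List<ColumnInfo> left, String commonType) {
        List<ColumnInfo> out = new ArrayList<>();
        for (ColumnInfo c : left) {
            ColumnInfo copy = new ColumnInfo(c);
            copy.type = commonType;
            out.add(copy);
        }
        return out;
    }

    public static void main(String[] args) {
        List<ColumnInfo> left = new ArrayList<>();
        left.add(new ColumnInfo("key", "bigint"));
        List<ColumnInfo> union = unionColumnsCloned(left, "double");
        System.out.println("left=" + left.get(0).type + " union=" + union.get(0).type);
    }
}
```

With the cloned variant, the left operator keeps describing its output as bigint, which is
what an intermediate sink between jobs would need in order to serialize the rows it actually
receives.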

To fix 2), I added Select operators between the input operators and the Union operator. These
Select operators cast the input columns to the types determined at compile time if they do
not match; otherwise they simply forward the value. The conversion in the Union operator is
now only needed to alter the type of the ObjectInspector, not the type of the column.
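
The cast-or-forward behaviour of the inserted Select operators can be sketched as follows.
This is a rough model under assumptions, not the patch itself: selectFor is a hypothetical
helper, Java classes stand in for Hive types, and the conversions use plain Java parsing and
toString rather than Hive's own cast expressions, whose formatting may differ.

```java
import java.util.function.Function;

public class UnionInputCastSketch {
    // Hypothetical helper: builds the per-column conversion for one union input.
    // inputType is the type the input produces; targetType is the common type
    // chosen once, at compile time.
    static Function<Object, Object> selectFor(Class<?> inputType, Class<?> targetType) {
        if (inputType.equals(targetType)) {
            // Types already match: forward the value untouched.
            return Function.identity();
        }
        if (targetType == Double.class) {
            // Assumption: the input's string form is a valid decimal number.
            return v -> v == null ? null : Double.valueOf(v.toString());
        }
        if (targetType == String.class) {
            return v -> v == null ? null : v.toString();
        }
        throw new IllegalArgumentException(
            "no conversion sketched from " + inputType + " to " + targetType);
    }

    public static void main(String[] args) {
        // Left input yields bigint keys, right input yields doubles; common type is double.
        Function<Object, Object> left = selectFor(Long.class, Double.class);
        Function<Object, Object> right = selectFor(Double.class, Double.class);
        System.out.println(left.apply(238L) + " " + right.apply(86.0));
    }
}
```

Because the target type is fixed before any row flows, both inputs hand the union values of
the same class, so there is no second, possibly different, decision made at run time.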
                
> union involving double column with a map join subquery will fail or give wrong results
> --------------------------------------------------------------------------------------
>
>                 Key: HIVE-3544
>                 URL: https://issues.apache.org/jira/browse/HIVE-3544
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Processor
>    Affects Versions: 0.10.0
>            Reporter: Kevin Wilfong
>            Assignee: Kevin Wilfong
>         Attachments: HIVE-3581.1.patch.txt
>
>
> The following query fails:
> select * from (select cast(a.key as bigint) as key from src a join src b on a.key = b.key union all select cast(key as double) as key from src)a
> The following query gives wrong results:
> select * from (select cast(a.key as bigint) as key, cast(b.key as double) as value from src a join src b on a.key = b.key union all select cast(key as double) as key, cast(key as string) as value from src)a
> But the following query runs fine:
> select * from (select cast(a.key as bigint) as key from src a union all select cast(key as double) as key from src)a

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
