hive-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Lin Liu (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HIVE-8312) Implicit type conversion on Join keys
Date Tue, 30 Sep 2014 21:22:33 GMT

     [ https://issues.apache.org/jira/browse/HIVE-8312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Lin Liu updated HIVE-8312:
--------------------------
    Description: 
Suppose we have a query as follows.
"
SELECT ....
FROM A LEFT SEMI JOIN B
ON (A.col1 = B.col2)
WHERE ...
"
If A.col1 is of STRING type, but B.col2 is of BIGINT, or DOUBLE,
Hive finds the common compatible NUMERIC type (here is DOUBLE) for both cols and do implicit
type conversion.

However, this implicit conversion from STRING to DOUBLE could produce NULL values, which could
further
generate unexpected results, like skew.

Why do we always convert to NUMERIC type when both columns are in different type groups? 
Do we expect the corresponding exceptions happen?

  was:
Suppose we have a query as follows.
"
SELECT ....
FROM A LEFT SEMI JOIN B
ON (A.col1 = B.col2)
WHERE ...
"
If A.col1 is of STRING type, but B.col2 is of BIGINT, or DOUBLE,
Hive finds the common compatible NUMERIC type (here is DOUBLE) for both cols and do implicit
type conversion.

However, this implicit conversion from STRING to DOUBLE could produce NULL values, which could
further
generate unexpected results, like skew.

Why do we always convert to NUMERIC type? Any rationale here? If not expected, how should
we handle it?


> Implicit type conversion on Join keys
> -------------------------------------
>
>                 Key: HIVE-8312
>                 URL: https://issues.apache.org/jira/browse/HIVE-8312
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Processor
>            Reporter: Lin Liu
>
> Suppose we have a query as follows.
> "
> SELECT ....
> FROM A LEFT SEMI JOIN B
> ON (A.col1 = B.col2)
> WHERE ...
> "
> If A.col1 is of STRING type, but B.col2 is of BIGINT, or DOUBLE,
> Hive finds the common compatible NUMERIC type (here is DOUBLE) for both cols and do implicit
type conversion.
> However, this implicit conversion from STRING to DOUBLE could produce NULL values, which
could further
> generate unexpected results, like skew.
> Why do we always convert to NUMERIC type when both columns are in different type groups?
 Do we expect the corresponding exceptions happen?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message