hive-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Yingzhong Xu (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HIVE-3738) Bugs exists in SEMI JOIN
Date Thu, 22 Nov 2012 07:55:01 GMT

     [ https://issues.apache.org/jira/browse/HIVE-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Yingzhong Xu updated HIVE-3738:
-------------------------------

    Attachment: DDL

The DDL helps you create related tables
                
> Bugs exists in SEMI JOIN
> ------------------------
>
>                 Key: HIVE-3738
>                 URL: https://issues.apache.org/jira/browse/HIVE-3738
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Processor
>    Affects Versions: 0.9.0
>         Environment: JDK1.6
>            Reporter: Yingzhong Xu
>              Labels: Semijoin
>         Attachments: DDL
>
>
> I am using the version 0.9.0 and my tables are the same with TPC-H benchmark:
> Here is a simple query(works correctly):
> *Q1*
> {quote}
> INSERT OVERWRITE TABLE customer_orders_statistics 
>  SELECT C_CUSTKEY FROM CUSTOMER 
>  LEFT SEMI JOIN(
>   SELECT O_CUSTKEY FROM ORDERS WHERE unix_timestamp(O_ORDERDATE, 'yyyy-MM-dd') > unix_timestamp('1995-12-31','yyyy-MM-dd')
>  ) tempTable ON tempTable.O_CUSTKEY=CUSTOMER.C_CUSTKEY
> {quote}
> it means inserting the key of customers who has orders since 1995-12-31 into another
table.
> But if I write the query like this:
> *Q2*
> {quote}
> INSERT OVERWRITE TABLE customer_orders_statistics 
>  SELECT C_CUSTKEY FROM CUSTOMER 
>  LEFT SEMI JOIN ORDERS
>  ON CUSTOMER.C_CUSTKEY=ORDERS.O_CUSTKEY 
>  AND unix_timestamp(ORDERS.O_ORDERDATE, 'yyyy-MM-dd') > unix_timestamp('1995-12-31','yyyy-MM-dd')
> {quote}
> I will get exception from Hive:
> {quote}
> FAILED: Hive Internal Error: java.lang.NullPointerException(null)
> java.lang.NullPointerException
> 	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genFilterPlan(SemanticAnalyzer.java:1566)
> 	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.pushJoinFilters(SemanticAnalyzer.java:5254)
> 	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:6754)
> 	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:7531)
> 	at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:243)
> 	at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:431)
> 	at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:336)
> 	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:909)
> 	at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:258)
> 	at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:215)
> 	at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:406)
> 	at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:689)
> 	at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:557)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:597)
> 	at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
> {quote}
> Also,If I write the query like this:
> *Q3*
> {quote}
> INSERT OVERWRITE TABLE customer_orders_statistics 
>  SELECT C_CUSTKEY FROM CUSTOMER 
>  LEFT SEMI JOIN ORDERS
>  ON CUSTOMER.C_CUSTKEY=ORDERS.O_CUSTKEY 
>  WHERE unix_timestamp(ORDERS.O_ORDERDATE, 'yyyy-MM-dd') > unix_timestamp('1995-12-31','yyyy-MM-dd')
> {quote}
> Then this query can be executed(wondering the right hand of SEMI JOIN can be referenced
in WHERE clause now?), but the result is wrong(comparing to Q1, Q1's result is the same with
mysql).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Mime
View raw message