hive-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Yingzhong Xu (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HIVE-3738) Bugs exist in SEMI JOIN
Date Thu, 22 Nov 2012 07:57:09 GMT

     [ https://issues.apache.org/jira/browse/HIVE-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Yingzhong Xu updated HIVE-3738:
-------------------------------

    Summary: Bugs exist in SEMI JOIN  (was: Bugs exists in SEMI JOIN)
    
> Bugs exist in SEMI JOIN
> -----------------------
>
>                 Key: HIVE-3738
>                 URL: https://issues.apache.org/jira/browse/HIVE-3738
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Processor
>    Affects Versions: 0.9.0
>         Environment: JDK1.6
>            Reporter: Yingzhong Xu
>              Labels: Semijoin
>         Attachments: DDL
>
>
> I am using the version 0.9.0 and my tables are the same with TPC-H benchmark:
> Here is a simple query(works correctly):
> *Q1*
> {quote}
> INSERT OVERWRITE TABLE customer_orders_statistics 
>  SELECT C_CUSTKEY FROM CUSTOMER 
>  LEFT SEMI JOIN(
>   SELECT O_CUSTKEY FROM ORDERS WHERE unix_timestamp(O_ORDERDATE, 'yyyy-MM-dd') > unix_timestamp('1995-12-31','yyyy-MM-dd')
>  ) tempTable ON tempTable.O_CUSTKEY=CUSTOMER.C_CUSTKEY
> {quote}
> it means inserting the key of customers who has orders since 1995-12-31 into another
table.
> But if I write the query like this:
> *Q2*
> {quote}
> INSERT OVERWRITE TABLE customer_orders_statistics 
>  SELECT C_CUSTKEY FROM CUSTOMER 
>  LEFT SEMI JOIN ORDERS
>  ON CUSTOMER.C_CUSTKEY=ORDERS.O_CUSTKEY 
>  AND unix_timestamp(ORDERS.O_ORDERDATE, 'yyyy-MM-dd') > unix_timestamp('1995-12-31','yyyy-MM-dd')
> {quote}
> I will get exception from Hive:
> {quote}
> FAILED: Hive Internal Error: java.lang.NullPointerException(null)
> java.lang.NullPointerException
> 	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genFilterPlan(SemanticAnalyzer.java:1566)
> 	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.pushJoinFilters(SemanticAnalyzer.java:5254)
> 	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:6754)
> 	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:7531)
> 	at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:243)
> 	at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:431)
> 	at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:336)
> 	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:909)
> 	at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:258)
> 	at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:215)
> 	at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:406)
> 	at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:689)
> 	at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:557)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:597)
> 	at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
> {quote}
> Also,If I write the query like this:
> *Q3*
> {quote}
> INSERT OVERWRITE TABLE customer_orders_statistics 
>  SELECT C_CUSTKEY FROM CUSTOMER 
>  LEFT SEMI JOIN ORDERS
>  ON CUSTOMER.C_CUSTKEY=ORDERS.O_CUSTKEY 
>  WHERE unix_timestamp(ORDERS.O_ORDERDATE, 'yyyy-MM-dd') > unix_timestamp('1995-12-31','yyyy-MM-dd')
> {quote}
> Then this query can be executed(wondering the right hand of SEMI JOIN can be referenced
in WHERE clause now?), but the result is wrong(comparing to *Q1*, *Q1*'s result is the same
with mysql).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Mime
View raw message