hive-dev mailing list archives

From "J. Tipan Verella (JIRA)"
Subject [jira] [Created] (HIVE-7555) inner join is being resolves as cartesian product
Date Wed, 30 Jul 2014 18:36:41 GMT
J. Tipan Verella created HIVE-7555:

             Summary: inner join is being resolves as cartesian product
                 Key: HIVE-7555
             Project: Hive
          Issue Type: Bug
         Environment: CentOS
            Reporter: J. Tipan Verella

I believe this is a bug because I have not been able to find a workaround for the problem
described in the corresponding Stack Overflow question.

The issue is as follows (repeated from SO for convenience).
This is the type of query I am sending to Hive:

    SELECT BigTable.nicefield, LargeTable.*
    FROM LargeTable INNER JOIN BigTable
        ON (
            LargeTable.joinfield1of4 = BigTable.joinfield1of4
            AND LargeTable.joinfield2of4 = BigTable.joinfield2of4
        )
    WHERE LargeTable.joinfield3of4=20140726 AND LargeTable.joinfield4of4=15
        AND BigTable.joinfield3of4=20140726 AND BigTable.joinfield4of4=15
        AND LargeTable.filterfiled1of2=123456
        AND LargeTable.filterfiled2of2=98765
        AND LargeTable.joinfield2of4=12
        AND LargeTable.joinfield1of4='iwanttolikehive'

It returns `2418025` rows.  The issue is that 

    SELECT *  
    FROM LargeTable 
    WHERE joinfield3of4=20140726 AND joinfield4of4=15
        AND filterfiled1of2=123456 
        AND filterfiled2of2=98765
        AND joinfield2of4=12 
        AND joinfield1of4='iwanttolikehive'

returns `1555` rows, and so does:

    SELECT *  
    FROM BigTable 
    WHERE joinfield3of4=20140726 AND joinfield4of4=15
        AND joinfield2of4=12 
        AND joinfield1of4='iwanttolikehive'

Note that **1555^2 = 2418025**.
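
The arithmetic above can be reproduced with a small sketch. It rests on an observation taken from the queries themselves: both standalone queries pin joinfield1of4 to 'iwanttolikehive' and joinfield2of4 to 12, so every row that survives the filters carries the same values in the two columns used in the ON clause. Under that condition an equi-join pairs each row on one side with every row on the other, giving n * n rows. The sketch uses SQLite through Python rather than Hive, and the two-column tables and the `join_row_count` helper are hypothetical stand-ins, so it illustrates the row-count arithmetic only and says nothing about how Hive plans the query.

```python
import sqlite3


def join_row_count(n: int) -> int:
    """Count inner-join rows when all n rows on each side share one join key."""
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()
    # Hypothetical two-column stand-ins for the real tables (assumption:
    # only the two ON-clause columns matter for the row count).
    cur.execute("CREATE TABLE LargeTable (joinfield1of4 TEXT, joinfield2of4 INTEGER)")
    cur.execute("CREATE TABLE BigTable (joinfield1of4 TEXT, joinfield2of4 INTEGER)")
    # Every row has the same join-key values, as the WHERE filters imply.
    rows = [("iwanttolikehive", 12)] * n
    cur.executemany("INSERT INTO LargeTable VALUES (?, ?)", rows)
    cur.executemany("INSERT INTO BigTable VALUES (?, ?)", rows)
    cur.execute(
        """
        SELECT COUNT(*)
        FROM LargeTable INNER JOIN BigTable
            ON (
                LargeTable.joinfield1of4 = BigTable.joinfield1of4
                AND LargeTable.joinfield2of4 = BigTable.joinfield2of4
            )
        """
    )
    (count,) = cur.fetchone()
    conn.close()
    return count


if __name__ == "__main__":
    print(join_row_count(1555))  # prints 2418025
```

If the real tables have further columns that distinguish matching rows (for example a unique id), adding them to the ON clause would be the usual way to avoid the n * n pairing; whether such a column exists here is not stated in the report.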

Feel free to discard this issue if it is not a bug, but please provide a solution on SO.

Thank you.

This message was sent by Atlassian JIRA
