hive-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Hive QA (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-10996) Aggregation / Projection over Multi-Join Inner Query producing incorrect results
Date Tue, 23 Jun 2015 01:27:00 GMT

    [ https://issues.apache.org/jira/browse/HIVE-10996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14596972#comment-14596972
] 

Hive QA commented on HIVE-10996:
--------------------------------



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12741134/HIVE-10996.09.patch

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 9014 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join28
{noformat}

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4341/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4341/console
Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4341/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12741134 - PreCommit-HIVE-TRUNK-Build

> Aggregation / Projection over Multi-Join Inner Query producing incorrect results
> --------------------------------------------------------------------------------
>
>                 Key: HIVE-10996
>                 URL: https://issues.apache.org/jira/browse/HIVE-10996
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>    Affects Versions: 1.0.0, 1.2.0, 1.1.0, 1.3.0, 2.0.0
>            Reporter: Gautam Kowshik
>            Assignee: Jesus Camacho Rodriguez
>            Priority: Critical
>         Attachments: HIVE-10996.01.patch, HIVE-10996.02.patch, HIVE-10996.03.patch, HIVE-10996.04.patch,
HIVE-10996.05.patch, HIVE-10996.06.patch, HIVE-10996.07.patch, HIVE-10996.08.patch, HIVE-10996.09.patch,
HIVE-10996.patch, explain_q1.txt, explain_q2.txt
>
>
> We see the following problem on 1.1.0 and 1.2.0 but not 0.13 which seems like a regression.
> The following query (Q1) produces no results:
> {code}
> select s
> from (
>   select last.*, action.st2, action.n
>   from (
>     select purchase.s, purchase.timestamp, max (mevt.timestamp) as last_stage_timestamp
>     from (select * from purchase_history) purchase
>     join (select * from cart_history) mevt
>     on purchase.s = mevt.s
>     where purchase.timestamp > mevt.timestamp
>     group by purchase.s, purchase.timestamp
>   ) last
>   join (select * from events) action
>   on last.s = action.s and last.last_stage_timestamp = action.timestamp
> ) list;
> {code}
> While this one (Q2) does produce results :
> {code}
> select *
> from (
>   select last.*, action.st2, action.n
>   from (
>     select purchase.s, purchase.timestamp, max (mevt.timestamp) as last_stage_timestamp
>     from (select * from purchase_history) purchase
>     join (select * from cart_history) mevt
>     on purchase.s = mevt.s
>     where purchase.timestamp > mevt.timestamp
>     group by purchase.s, purchase.timestamp
>   ) last
>   join (select * from events) action
>   on last.s = action.s and last.last_stage_timestamp = action.timestamp
> ) list;
> 1	21	20	Bob	1234
> 1	31	30	Bob	1234
> 3	51	50	Jeff	1234
> {code}
> The setup to test this is:
> {code}
> create table purchase_history (s string, product string, price double, timestamp int);
> insert into purchase_history values ('1', 'Belt', 20.00, 21);
> insert into purchase_history values ('1', 'Socks', 3.50, 31);
> insert into purchase_history values ('3', 'Belt', 20.00, 51);
> insert into purchase_history values ('4', 'Shirt', 15.50, 59);
> create table cart_history (s string, cart_id int, timestamp int);
> insert into cart_history values ('1', 1, 10);
> insert into cart_history values ('1', 2, 20);
> insert into cart_history values ('1', 3, 30);
> insert into cart_history values ('1', 4, 40);
> insert into cart_history values ('3', 5, 50);
> insert into cart_history values ('4', 6, 60);
> create table events (s string, st2 string, n int, timestamp int);
> insert into events values ('1', 'Bob', 1234, 20);
> insert into events values ('1', 'Bob', 1234, 30);
> insert into events values ('1', 'Bob', 1234, 25);
> insert into events values ('2', 'Sam', 1234, 30);
> insert into events values ('3', 'Jeff', 1234, 50);
> insert into events values ('4', 'Ted', 1234, 60);
> {code}
> I realize select * and select s are not all that interesting in this context but what
lead us to this issue was select count(distinct s) was not returning results. The above queries
are the simplified queries that produce the issue. 
> I will note that if I convert the inner join to a table and select from that the issue
does not appear.
> Update: Found that turning off  hive.optimize.remove.identity.project fixes this issue.
This optimization was introduced in https://issues.apache.org/jira/browse/HIVE-8435



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message