Date: Mon, 22 Jun 2015 20:14:02 +0000 (UTC)
From: "Laljo John Pullokkaran (JIRA)"
To: issues@hive.apache.org
Reply-To: dev@hive.apache.org
Subject: [jira] [Commented] (HIVE-10996) Aggregation / Projection over Multi-Join Inner Query producing incorrect results

    [ https://issues.apache.org/jira/browse/HIVE-10996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14596562#comment-14596562 ]

Laljo John Pullokkaran commented on HIVE-10996:
-----------------------------------------------

Some more comments:
1.
ql/src/java/org/apache/hadoop/hive/ql/optimizer/ColumnPrunerProcFactory.java
       for (Operator child : op.getChildOperators()) {
+        if (child instanceof SelectOperator || child instanceof ReduceSinkOperator) {
+          continue;
+        }
+        List neededCols = cppCtx.genColLists(op, child);
+        if (neededCols.size() < op.getSchema().getSignature().size()) {
+          ArrayList exprs = new ArrayList();
+          ArrayList outputs = new ArrayList();
+          Map colExprMap = new HashMap();
+          ArrayList outputRS = new ArrayList();
+          for (String internalName : neededCols) {
+            ColumnInfo colInfo = op.getSchema().getColumnInfo(
Should preserve the order of cols as it appears in GB
2. Nit Pick: change name of OP to GBOP
3. Nit Pick: Change name of output to outputColNames

> Aggregation / Projection over Multi-Join Inner Query producing incorrect results
> --------------------------------------------------------------------------------
>
>                 Key: HIVE-10996
>                 URL: https://issues.apache.org/jira/browse/HIVE-10996
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>    Affects Versions: 1.0.0, 1.2.0, 1.1.0, 1.3.0, 2.0.0
>            Reporter: Gautam Kowshik
>            Assignee: Jesus Camacho Rodriguez
>            Priority: Critical
>         Attachments: HIVE-10996.01.patch, HIVE-10996.02.patch, HIVE-10996.03.patch, HIVE-10996.04.patch, HIVE-10996.05.patch, HIVE-10996.06.patch, HIVE-10996.07.patch, HIVE-10996.08.patch, HIVE-10996.patch, explain_q1.txt, explain_q2.txt
>
>
> We see the following problem on 1.1.0 and 1.2.0 but not 0.13 which seems like a regression.
> The following query (Q1) produces no results:
> {code}
> select s
> from (
>   select last.*, action.st2, action.n
>   from (
>     select purchase.s, purchase.timestamp, max (mevt.timestamp) as last_stage_timestamp
>     from (select * from purchase_history) purchase
>     join (select * from cart_history) mevt
>     on purchase.s = mevt.s
>     where purchase.timestamp > mevt.timestamp
>     group by purchase.s, purchase.timestamp
>   ) last
>   join (select * from events) action
>   on last.s = action.s and last.last_stage_timestamp = action.timestamp
> ) list;
> {code}
> While this one (Q2) does produce results:
> {code}
> select *
> from (
>   select last.*, action.st2, action.n
>   from (
>     select purchase.s, purchase.timestamp, max (mevt.timestamp) as last_stage_timestamp
>     from (select * from purchase_history) purchase
>     join (select * from cart_history) mevt
>     on purchase.s = mevt.s
>     where purchase.timestamp > mevt.timestamp
>     group by purchase.s, purchase.timestamp
>   ) last
>   join (select * from events) action
>   on last.s = action.s and last.last_stage_timestamp = action.timestamp
> ) list;
> 1	21	20	Bob	1234
> 1	31	30	Bob	1234
> 3	51	50	Jeff	1234
> {code}
> The setup to test this is:
> {code}
> create table purchase_history (s string, product string, price double, timestamp int);
> insert into purchase_history values ('1', 'Belt', 20.00, 21);
> insert into purchase_history values ('1', 'Socks', 3.50, 31);
> insert into purchase_history values ('3', 'Belt', 20.00, 51);
> insert into purchase_history values ('4', 'Shirt', 15.50, 59);
> create table cart_history (s string, cart_id int, timestamp int);
> insert into cart_history values ('1', 1, 10);
> insert into cart_history values ('1', 2, 20);
> insert into cart_history values ('1', 3, 30);
> insert into cart_history values ('1', 4, 40);
> insert into cart_history values ('3', 5, 50);
> insert into cart_history values ('4', 6, 60);
> create table events (s string, st2 string, n int, timestamp int);
> insert into events values ('1', 'Bob', 1234, 20);
> insert into events values ('1', 'Bob', 1234, 30);
> insert into events values ('1', 'Bob', 1234, 25);
> insert into events values ('2', 'Sam', 1234, 30);
> insert into events values ('3', 'Jeff', 1234, 50);
> insert into events values ('4', 'Ted', 1234, 60);
> {code}
> I realize select * and select s are not all that interesting in this context, but what led us to this issue was that select count(distinct s) was not returning results. The above queries are the simplified queries that reproduce the issue.
> I will note that if I convert the inner join to a table and select from that, the issue does not appear.
> Update: Found that turning off hive.optimize.remove.identity.project fixes this issue. This optimization was introduced in https://issues.apache.org/jira/browse/HIVE-8435

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
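
A note on review comment 1 in this message ("Should preserve the order of cols as it appears in GB"): the quoted patch loops over neededCols, so the generated columns follow whatever order that list happens to have. One possible way to honor the request is to walk the GroupBy schema in its own order and keep only the columns that are needed. The sketch below illustrates that idea with plain strings. It is not code from any HIVE-10996 patch and does not use Hive classes; the class name, method name, and the _col0-style column names are hypothetical and chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of Hive: filters a schema's column list
// down to the needed columns while keeping the schema's own ordering.
final class GroupByColumnOrder {

    // schemaOrder: column names in the order the operator's schema lists them.
    // neededCols:  column names required by the child, in arbitrary order.
    // Returns the needed columns, ordered as they appear in schemaOrder.
    static List<String> pruneInSchemaOrder(List<String> schemaOrder, List<String> neededCols) {
        Set<String> needed = new HashSet<>(neededCols);
        List<String> kept = new ArrayList<>();
        for (String col : schemaOrder) {
            // Iterating over the schema (not over neededCols) fixes the output order.
            if (needed.contains(col)) {
                kept.add(col);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> schema = Arrays.asList("_col0", "_col1", "_col2");
        List<String> needed = Arrays.asList("_col2", "_col0");
        System.out.println(pruneInSchemaOrder(schema, needed)); // prints [_col0, _col2]
    }
}
```

The set lookup keeps the filter linear in the schema size. Needed names that are absent from the schema are silently dropped in this sketch; a real implementation would presumably need to treat that case as an error.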