Date: Thu, 18 Aug 2011 20:31:36 +0000 (UTC)
From: "jiraposter@reviews.apache.org (JIRA)"
To: hive-dev@hadoop.apache.org
Reply-To: dev@hive.apache.org
Message-ID: <278593515.50411.1313699496648.JavaMail.tomcat@hel.zones.apache.org>
In-Reply-To: <1171000502.43690.1313542647228.JavaMail.tomcat@hel.zones.apache.org>
Subject: [jira] [Commented] (HIVE-2382) Invalid predicate pushdown from incorrect column expression map for select operator generated by GROUP BY operation
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit

    [ https://issues.apache.org/jira/browse/HIVE-2382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13087264#comment-13087264 ]

jiraposter@reviews.apache.org commented on HIVE-2382:
-----------------------------------------------------

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1566/
-----------------------------------------------------------

(Updated 2011-08-18 20:29:54.207958)

Review request for hive.

Changes
-------

Unit tests passed

Summary
-------

https://issues.apache.org/jira/browse/HIVE-2382

This addresses bug HIVE-2382.
    https://issues.apache.org/jira/browse/HIVE-2382

Diffs
-----

  http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 1157990
  http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/queries/clientpositive/groupby_ppd.q PRE-CREATION
  http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/groupby_ppd.q.out PRE-CREATION

Diff: https://reviews.apache.org/r/1566/diff

Testing (updated)
-------

Unit tests passed

Thanks,

Charles

> Invalid predicate pushdown from incorrect column expression map for select operator generated by GROUP BY operation
> -------------------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-2382
>                 URL: https://issues.apache.org/jira/browse/HIVE-2382
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Processor
>    Affects Versions: 0.6.0
>            Reporter: Charles Chen
>            Assignee: Charles Chen
>            Priority: Critical
>             Fix For: 0.8.0
>
>         Attachments: HIVE-2382v1.patch
>
>
> When a GROUP BY is specified, a select operator is added before the GROUP BY in SemanticAnalyzer.insertSelectAllPlanForGroupBy. Currently, the column expression map for this is set to the column expression map for the parent operator.
> This behavior is incorrect as, for example, the parent operator could rearrange the order of the columns (_col0 => _col0, _col1 => _col2, _col2 => _col1) and the new operator should not repeat this.
> The predicate pushdown optimization uses the column expression map to track which columns a filter expression refers to at different operators. This results in a filter on incorrect columns.
> Here is a simple case of this going wrong: Using
> {noformat}
> create table invites (id int, foo int, bar int);
> {noformat}
> executing the query
> {noformat}
> explain select * from (select foo, bar from (select bar, foo from invites c union all select bar, foo from invites d) b) a group by bar, foo having bar=1;
> {noformat}
> results in
> {noformat}
> STAGE DEPENDENCIES:
>   Stage-1 is a root stage
>   Stage-0 is a root stage
> STAGE PLANS:
>   Stage: Stage-1
>     Map Reduce
>       Alias -> Map Operator Tree:
>         a-subquery1:b-subquery1:c
>           TableScan
>             alias: c
>             Filter Operator
>               predicate:
>                   expr: (foo = 1)
>                   type: boolean
>               Select Operator
>                 expressions:
>                       expr: bar
>                       type: int
>                       expr: foo
>                       type: int
>                 outputColumnNames: _col0, _col1
>                 Union
>                   Select Operator
>                     expressions:
>                           expr: _col1
>                           type: int
>                           expr: _col0
>                           type: int
>                     outputColumnNames: _col0, _col1
>                     Select Operator
>                       expressions:
>                             expr: _col0
>                             type: int
>                             expr: _col1
>                             type: int
>                       outputColumnNames: _col0, _col1
>                       Group By Operator
>                         bucketGroup: false
>                         keys:
>                               expr: _col1
>                               type: int
>                               expr: _col0
>                               type: int
>                         mode: hash
>                         outputColumnNames: _col0, _col1
>                         Reduce Output Operator
>                           key expressions:
>                                 expr: _col0
>                                 type: int
>                                 expr: _col1
>                                 type: int
>                           sort order: ++
>                           Map-reduce partition columns:
>                                 expr: _col0
>                                 type: int
>                                 expr: _col1
>                                 type: int
>                           tag: -1
>         a-subquery2:b-subquery2:d
>           TableScan
>             alias: d
>             Filter Operator
>               predicate:
>                   expr: (foo = 1)
>                   type: boolean
>               Select Operator
>                 expressions:
>                       expr: bar
>                       type: int
>                       expr: foo
>                       type: int
>                 outputColumnNames: _col0, _col1
>                 Union
>                   Select Operator
>                     expressions:
>                           expr: _col1
>                           type: int
>                           expr: _col0
>                           type: int
>                     outputColumnNames: _col0, _col1
>                     Select Operator
>                       expressions:
>                             expr: _col0
>                             type: int
>                             expr: _col1
>                             type: int
>                       outputColumnNames: _col0, _col1
>                       Group By Operator
>                         bucketGroup: false
>                         keys:
>                               expr: _col1
>                               type: int
>                               expr: _col0
>                               type: int
>                         mode: hash
>                         outputColumnNames: _col0, _col1
>                         Reduce Output Operator
>                           key expressions:
>                                 expr: _col0
>                                 type: int
>                                 expr: _col1
>                                 type: int
>                           sort order: ++
>                           Map-reduce partition columns:
>                                 expr: _col0
>                                 type: int
>                                 expr: _col1
>                                 type: int
>                           tag: -1
>       Reduce Operator Tree:
>         Group By Operator
>           bucketGroup: false
>           keys:
>                 expr: KEY._col0
>                 type: int
>                 expr: KEY._col1
>                 type: int
>           mode: mergepartial
>           outputColumnNames: _col0, _col1
>           Select Operator
>             expressions:
>                   expr: _col0
>                   type: int
>                   expr: _col1
>                   type: int
>             outputColumnNames: _col0, _col1
>             File Output Operator
>               compressed: false
>               GlobalTableId: 0
>               table:
>                   input format: org.apache.hadoop.mapred.TextInputFormat
>                   output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
>   Stage: Stage-0
>     Fetch Operator
>       limit: -1
> {noformat}
> Note that the filter is now "foo = 1", while the correct behavior is to have "bar = 1". If we remove the group by, the behavior is correct.
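
The mechanism described in the report can be illustrated with a toy model. This is only a sketch and not Hive's code: the dictionaries are hypothetical stand-ins for each operator's column expression map (output column name to input column name), and `resolve` is an illustrative helper that follows a column reference from the HAVING filter down to the table scan, roughly the way predicate pushdown would track it. The operator chain is taken from the EXPLAIN output in the report.

```python
# Toy model (an assumption for illustration, not Hive's implementation).
# Each dict maps an operator's output column to the input column it reads.

# select bar, foo from invites  ->  _col0 = bar, _col1 = foo
FIRST_SELECT = {"_col0": "bar", "_col1": "foo"}

# select foo, bar from (...) b  ->  swaps the two columns
SWAP_SELECT = {"_col0": "_col1", "_col1": "_col0"}

# What the select operator inserted before the GROUP BY should use:
# it passes every column through unchanged.
IDENTITY_SELECT = {"_col0": "_col0", "_col1": "_col1"}

# group by bar, foo  ->  keys are (_col1, _col0), emitted as (_col0, _col1)
GROUP_BY = {"_col0": "_col1", "_col1": "_col0"}


def resolve(column, inserted_select_map):
    """Follow `column` from above the GROUP BY down to the table scan."""
    chain = [GROUP_BY, inserted_select_map, SWAP_SELECT, FIRST_SELECT]
    for col_expr_map in chain:
        column = col_expr_map[column]
    return column


# "bar" is _col0 in the GROUP BY output, so the filter is "_col0 = 1".
# Reusing the parent's map for the inserted select applies the swap twice.
buggy = resolve("_col0", SWAP_SELECT)
fixed = resolve("_col0", IDENTITY_SELECT)
print(f"parent's map reused: filter lands on {buggy}")
print(f"identity map:        filter lands on {fixed}")
```

With the parent's map reused, the swap is applied twice and the filter ends up on `foo`, matching the `(foo = 1)` predicate in the plan above; with an identity map for the inserted select operator it ends up on `bar`.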