Date: Fri, 29 Mar 2019 17:21:00 +0000 (UTC)
From: "Vineet Garg (JIRA)"
To: issues@hive.apache.org
Reply-To: dev@hive.apache.org
Subject: [jira] [Assigned] (HIVE-21539) GroupBy + where clause on same column results in incorrect query rewrite

     [ https://issues.apache.org/jira/browse/HIVE-21539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vineet Garg reassigned HIVE-21539:
----------------------------------

    Assignee: Vineet Garg

> GroupBy + where clause on same column results in incorrect query rewrite
> ------------------------------------------------------------------------
>
>                 Key: HIVE-21539
>                 URL: https://issues.apache.org/jira/browse/HIVE-21539
>             Project: Hive
>          Issue Type: Bug
>          Components: CBO
>    Affects Versions: 4.0.0
>            Reporter: anishek
>            Assignee: Vineet Garg
>            Priority: Major
>
> {code}
> create table a (i int, j string);
> insert into a values ( 1, 'a'),(2,'b');
> explain extended select min(j) from a where j='a' group by j;
> +----------------------------------------------------+
> | Explain |
> +----------------------------------------------------+
> | OPTIMIZED SQL: SELECT MIN(TRUE) AS `_o__c0` |
> | FROM `default`.`a` |
> | WHERE `j` = 'a' |
> | GROUP BY TRUE |
> | STAGE DEPENDENCIES: |
> | Stage-1 is a root stage |
> | Stage-0 depends on stages: Stage-1 |
> | |
> | STAGE PLANS: |
> | Stage: Stage-1 |
> | Tez |
> | DagId: anagarwal_20190318153535_25c1f460-1986-475e-9995-9f6342029dd8:11 |
> | Edges: |
> | Reducer 2 <- Map 1 (SIMPLE_EDGE) |
> | DagName: anagarwal_20190318153535_25c1f460-1986-475e-9995-9f6342029dd8:11 |
> | Vertices: |
> | Map 1 |
> | Map Operator Tree: |
> | TableScan |
> | alias: a |
> | filterExpr: (j = 'a') (type: boolean) |
> | Statistics: Num rows: 2 Data size: 170 Basic stats: COMPLETE Column stats: COMPLETE |
> | GatherStats: false |
> | Filter Operator |
> | isSamplingPred: false |
> | predicate: (j = 'a') (type: boolean) |
> | Statistics: Num rows: 1 Data size: 85 Basic stats: COMPLETE Column stats: COMPLETE |
> | Select Operator |
> | Statistics: Num rows: 1 Data size: 85 Basic stats: COMPLETE Column stats: COMPLETE |
> | Group By Operator |
> | aggregations: min(true) |
> | keys: true (type: boolean) |
> | mode: hash |
> | outputColumnNames: _col0, _col1 |
> | Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE |
> | Reduce Output Operator |
> | key expressions: _col0 (type: boolean) |
> | null sort order: a |
> | sort order: + |
> | Map-reduce partition columns: _col0 (type: boolean) |
> | Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE |
> | tag: -1 |
> | value expressions: _col1 (type: boolean) |
> | auto parallelism: true |
> | Path -> Alias: |
> | hdfs://localhost:9000/tmp/hive/warehouse/a [a] |
> | Path -> Partition: |
> | hdfs://localhost:9000/tmp/hive/warehouse/a |
> | Partition |
> | base file name: a |
> | input format: org.apache.hadoop.mapred.TextInputFormat |
> | output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat |
> | properties: |
> | COLUMN_STATS_ACCURATE {"BASIC_STATS":"true","COLUMN_STATS":{"i":"true","j":"true"}} |
> | bucket_count -1 |
> | bucketing_version 2 |
> | column.name.delimiter , |
> | columns i,j |
> | columns.comments |
> | columns.types int:string |
> | file.inputformat org.apache.hadoop.mapred.TextInputFormat |
> | file.outputformat org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat |
> | location hdfs://localhost:9000/tmp/hive/warehouse/a |
> | name default.a |
> | numFiles 3 |
> | numRows 2 |
> | rawDataSize 6 |
> | serialization.ddl struct a { i32 i, string j} |
> | serialization.format 1 |
> | serialization.lib org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe |
> | totalSize 16 |
> | transient_lastDdlTime 1552903148 |
> | serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe |
> | |
> | input format: org.apache.hadoop.mapred.TextInputFormat |
> | output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat |
> | properties: |
> | COLUMN_STATS_ACCURATE {"BASIC_STATS":"true","COLUMN_STATS":{"i":"true","j":"true"}} |
> | bucket_count -1 |
> | bucketing_version 2 |
> | column.name.delimiter , |
> | columns i,j |
> | columns.comments |
> | columns.types int:string |
> | file.inputformat org.apache.hadoop.mapred.TextInputFormat |
> | file.outputformat org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat |
> | location hdfs://localhost:9000/tmp/hive/warehouse/a |
> | name default.a |
> | numFiles 3 |
> | numRows 2 |
> | rawDataSize 6 |
> | serialization.ddl struct a { i32 i, string j} |
> | serialization.format 1 |
> | serialization.lib org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe |
> | totalSize 16 |
> | transient_lastDdlTime 1552903148 |
> | serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe |
> | name: default.a |
> | name: default.a |
> | Truncated Path -> Alias: |
> | /a [a] |
> | Reducer 2 |
> | Needs Tagging: false |
> | Reduce Operator Tree: |
> | Group By Operator |
> | aggregations: min(VALUE._col0) |
> | keys: KEY._col0 (type: boolean) |
> | mode: mergepartial |
> | outputColumnNames: _col0, _col1 |
> | Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE |
> | Select Operator |
> | expressions: _col1 (type: boolean) |
> | outputColumnNames: _col0 |
> | Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: COMPLETE |
> | File Output Operator |
> | compressed: false |
> | GlobalTableId: 0 |
> | directory: hdfs://localhost:9000/tmp/hive/anagarwal/20f7b890-606b-4815-a56e-ab3384ef58f5/hive_2019-03-18_15-35-35_644_3057456177912469405-1/-mr-10001/.hive-staging_hive_2019-03-18_15-35-35_644_3057456177912469405-1/-ext-10002 |
> | NumFilesPerFileSink: 1 |
> | Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: COMPLETE |
> | Stats Publishing Key Prefix: hdfs://localhost:9000/tmp/hive/anagarwal/20f7b890-606b-4815-a56e-ab3384ef58f5/hive_2019-03-18_15-35-35_644_3057456177912469405-1/-mr-10001/.hive-staging_hive_2019-03-18_15-35-35_644_3057456177912469405-1/-ext-10002/ |
> | table: |
> | input format: org.apache.hadoop.mapred.SequenceFileInputFormat |
> | output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat |
> | properties: |
> | columns _col0 |
> | columns.types boolean |
> | escape.delim \ |
> | hive.serialization.extend.additional.nesting.levels true |
> | serialization.escape.crlf true |
> | serialization.format 1 |
> | serialization.lib org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe |
> | serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe |
> | TotalFiles: 1 |
> | GatherStats: false |
> | MultiFileSpray: false |
> | |
> | Stage: Stage-0 |
> | Fetch Operator |
> | limit: -1 |
> | Processor Tree: |
> | ListSink |
> | |
> +----------------------------------------------------+
> {code}
> The query is rewritten with the constant *true* in place of column j in both MIN() and GROUP BY, so the result is a boolean instead of the string 'a'.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
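To make the reported mismatch concrete, here is a minimal sketch that runs the original query and a hand-written equivalent of the OPTIMIZED SQL against the same two rows. It assumes SQLite (through Python's sqlite3 module) as a stand-in engine, not Hive, so it only illustrates that the two statements are not equivalent; it does not reproduce the CBO rewrite itself. The function name `run_both` is hypothetical, and `(1 = 1)` is used in place of the literal TRUE so that SQLite does not read the GROUP BY term as a column position.

```python
import sqlite3


def run_both():
    """Run the reported query and a hand-written copy of the rewritten one.

    Assumption: SQLite stands in for Hive here. Only the SQL semantics are
    compared; nothing about Hive's optimizer is exercised.
    """
    conn = sqlite3.connect(":memory:")
    # Same table and rows as in the report ('string' becomes 'text' in SQLite).
    conn.execute("create table a (i int, j text)")
    conn.execute("insert into a values (1, 'a'), (2, 'b')")

    # The query as the user wrote it.
    original = conn.execute(
        "select min(j) from a where j = 'a' group by j"
    ).fetchall()

    # Shape of the OPTIMIZED SQL from the report: column j replaced by a
    # boolean constant in both the aggregate and the grouping key.
    rewritten = conn.execute(
        "select min(1 = 1) from a where j = 'a' group by (1 = 1)"
    ).fetchall()

    conn.close()
    return original, rewritten


if __name__ == "__main__":
    original, rewritten = run_both()
    print("original :", original)
    print("rewritten:", rewritten)
```

Under these assumptions the original query yields the string 'a', while the rewritten form yields a truth value (SQLite represents it as the integer 1), which matches the boolean output column shown in the plan.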