Return-Path: X-Original-To: apmail-hive-dev-archive@www.apache.org Delivered-To: apmail-hive-dev-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 6443A824D for ; Tue, 9 Aug 2011 00:54:52 +0000 (UTC) Received: (qmail 93905 invoked by uid 500); 9 Aug 2011 00:54:51 -0000 Delivered-To: apmail-hive-dev-archive@hive.apache.org Received: (qmail 93596 invoked by uid 500); 9 Aug 2011 00:54:51 -0000 Mailing-List: contact dev-help@hive.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@hive.apache.org Delivered-To: mailing list dev@hive.apache.org Received: (qmail 93575 invoked by uid 500); 9 Aug 2011 00:54:50 -0000 Delivered-To: apmail-hadoop-hive-dev@hadoop.apache.org Received: (qmail 93572 invoked by uid 99); 9 Aug 2011 00:54:50 -0000 Received: from athena.apache.org (HELO athena.apache.org) (140.211.11.136) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 09 Aug 2011 00:54:50 +0000 X-ASF-Spam-Status: No, hits=-2000.8 required=5.0 tests=ALL_TRUSTED,RP_MATCHES_RCVD X-Spam-Check-By: apache.org Received: from [140.211.11.116] (HELO hel.zones.apache.org) (140.211.11.116) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 09 Aug 2011 00:54:49 +0000 Received: from hel.zones.apache.org (hel.zones.apache.org [140.211.11.116]) by hel.zones.apache.org (Postfix) with ESMTP id DB90CB2133 for ; Tue, 9 Aug 2011 00:54:28 +0000 (UTC) Date: Tue, 9 Aug 2011 00:54:28 +0000 (UTC) From: "Charles Chen (JIRA)" To: hive-dev@hadoop.apache.org Message-ID: <2075599815.18513.1312851268896.JavaMail.tomcat@hel.zones.apache.org> Subject: [jira] [Updated] (HIVE-1342) Predicate push down get error result when sub-queries have the same alias name MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 [ https://issues.apache.org/jira/browse/HIVE-1342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Charles Chen updated HIVE-1342: ------------------------------- Attachment: HIVE-1342v1.patch > Predicate push down get error result when sub-queries have the same alias name > ------------------------------------------------------------------------------- > > Key: HIVE-1342 > URL: https://issues.apache.org/jira/browse/HIVE-1342 > Project: Hive > Issue Type: Bug > Components: Query Processor > Affects Versions: 0.6.0 > Reporter: Ted Xu > Assignee: Ted Xu > Priority: Critical > Attachments: HIVE-1342v1.patch, cmd.hql, explain, ppd_same_alias_1.patch, ppd_same_alias_2.patch > > > Query is over-optimized by PPD when sub-queries have the same alias name, see the query: > ------------------------------- > create table if not exists dm_fact_buyer_prd_info_d ( > category_id string > ,gmv_trade_num int > ,user_id int > ) > PARTITIONED BY (ds int); > set hive.optimize.ppd=true; > set hive.map.aggr=true; > explain select category_id1,category_id2,assoc_idx > from ( > select > category_id1 > , category_id2 > , count(distinct user_id) as assoc_idx > from ( > select > t1.category_id as category_id1 > , t2.category_id as category_id2 > , t1.user_id > from ( > select category_id, user_id > from dm_fact_buyer_prd_info_d > group by category_id, user_id ) t1 > join ( > select category_id, user_id > from dm_fact_buyer_prd_info_d > group by category_id, user_id ) t2 on t1.user_id=t2.user_id > ) t1 > group by category_id1, category_id2 ) t_o > where category_id1 <> category_id2 > and assoc_idx > 2; > ----------------------------- > The query above will fail when execute, throwing exception: "can not cast UDFOpNotEqual(Text, IntWritable) to UDFOpNotEqual(Text, Text)". > I explained the query and the execute plan looks really wired ( only Stage-1, see the highlighted predicate): > ------------------------------- > Stage: Stage-1 > Map Reduce > Alias -> Map Operator Tree: > t_o:t1:t1:dm_fact_buyer_prd_info_d > TableScan > alias: dm_fact_buyer_prd_info_d > Filter Operator > predicate: > expr: *(category_id <> user_id)* > type: boolean > Select Operator > expressions: > expr: category_id > type: string > expr: user_id > type: bigint > outputColumnNames: category_id, user_id > Group By Operator > keys: > expr: category_id > type: string > expr: user_id > type: bigint > mode: hash > outputColumnNames: _col0, _col1 > Reduce Output Operator > key expressions: > expr: _col0 > type: string > expr: _col1 > type: bigint > sort order: ++ > Map-reduce partition columns: > expr: _col0 > type: string > expr: _col1 > type: bigint > tag: -1 > Reduce Operator Tree: > Group By Operator > keys: > expr: KEY._col0 > type: string > expr: KEY._col1 > type: bigint > mode: mergepartial > outputColumnNames: _col0, _col1 > Select Operator > expressions: > expr: _col0 > type: string > expr: _col1 > type: bigint > outputColumnNames: _col0, _col1 > File Output Operator > compressed: true > GlobalTableId: 0 > table: > input format: org.apache.hadoop.mapred.SequenceFileInputFormat > output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat > ---------------------------------- > If disabling predicate push down (set hive.optimize.ppd=true), the error is gone; I tried disabling map side aggregate, the error is gone,too. > *Changing the alias of subquery 't1' (either the inner one or the join result), the bug disappears, too.* -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira