Date: Tue, 7 Aug 2012 00:29:03 +0000 (UTC)
From: "Navis (JIRA)"
To: hive-dev@hadoop.apache.org
Reply-To: dev@hive.apache.org
Message-ID: <1654874227.19411.1344299343404.JavaMail.jiratomcat@issues-vm>
Subject: [jira] [Updated] (HIVE-1173) Partition pruner cancels pruning if non-deterministic function present in filtering expression only in joins is present in query


     [ https://issues.apache.org/jira/browse/HIVE-1173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Navis updated HIVE-1173:
------------------------

    Affects Version/s: 0.10.0
               Status: Open  (was: Patch Available)

> Partition pruner cancels pruning if non-deterministic function present in filtering expression only in joins is present in query
> --------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-1173
>                 URL: https://issues.apache.org/jira/browse/HIVE-1173
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Processor
>    Affects Versions: 0.4.1, 0.4.0, 0.10.0
>            Reporter: Vladimir Klimontovich
>            Assignee: Navis
>
> Brief description:
> case 1) a non-deterministic function is present in the partition condition and the query contains a join => the partition pruner does not filter partitions based on the condition
> case 2) a non-deterministic function is present in the partition condition and the query contains no join => the partition pruner does filter partitions based on the condition
> It is illogical for pruning to depend on whether the query contains a join.
> Example:
> Consider the following sequence of Hive queries:
> 1) Create a non-deterministic function:
> create temporary function UDF2 as 'UDF2';
> {{
> import org.apache.hadoop.hive.ql.exec.UDF;
> import org.apache.hadoop.hive.ql.udf.UDFType;
> // Identity function, explicitly marked as non-deterministic.
> @UDFType(deterministic=false)
> public class UDF2 extends UDF {
>   public String evaluate(String val) {
>     return val;
>   }
> }
> }}
> 2) Create the tables:
> CREATE TABLE Main (
>   a STRING,
>   b INT
> )
> PARTITIONED BY(part STRING)
> ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
> LINES TERMINATED BY '10'
> STORED AS TEXTFILE;
> ALTER TABLE Main ADD PARTITION (part="part1") LOCATION "/hive-join-test/part1/";
> ALTER TABLE Main ADD PARTITION (part="part2") LOCATION "/hive-join-test/part2/";
> CREATE TABLE Joined (
>   a STRING,
>   f STRING
> )
> ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
> LINES TERMINATED BY '10'
> STORED AS TEXTFILE
> LOCATION '/hive-join-test/join/';
> 3) Run the first query (no join):
> select
>   m.a,
>   m.b
> from Main m
> where
>   part > UDF2('part0') AND part = 'part1';
> The pruner works for this query: mapred.input.dir=hdfs://localhost:9000/hive-join-test/part1
> 4) Run the second query (with a join):
> select
>   m.a,
>   j.a,
>   m.b
> from Main m
> join Joined j on
>   j.a=m.a
> where
>   part > UDF2('part0') AND part = 'part1';
> The pruner does not work: mapred.input.dir=hdfs://localhost:9000/hive-join-test/part1,hdfs://localhost:9000/hive-join-test/part2,hdfs://localhost:9000/hive-join-test/join
> 5) Also run the query with a MAPJOIN hint:
> select /*+MAPJOIN(j)*/
>   m.a,
>   j.a,
>   m.b
> from Main m
> join Joined j on
>   j.a=m.a
> where
>   part > UDF2('part0') AND part = 'part1';
> The result is the same, the pruner does not work: mapred.input.dir=hdfs://localhost:9000/hive-join-test/part1,hdfs://localhost:9000/hive-join-test/part2

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
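
For readers of the archive, the behavior the report expects in both cases can be illustrated with a toy model. This is only a sketch and not Hive's pruner code: the class PruneSketch, the type Conjunct, and the method prune are hypothetical names invented here. It assumes the WHERE clause is a conjunction (AND) of predicates over the partition column, so a partition can be dropped as soon as one deterministic conjunct rejects it, while non-deterministic conjuncts are left to be evaluated at run time.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Toy model only: none of these types exist in Hive.
class PruneSketch {

    // One conjunct of a WHERE clause over the partition column.
    static final class Conjunct {
        final boolean deterministic;   // false for calls such as UDF2(...)
        final Predicate<String> test;  // evaluated against the partition value

        Conjunct(boolean deterministic, Predicate<String> test) {
            this.deterministic = deterministic;
            this.test = test;
        }
    }

    // Keep a partition unless some deterministic conjunct rejects it.
    // Non-deterministic conjuncts are skipped (treated as "unknown"),
    // which is safe because all conjuncts are ANDed together.
    static List<String> prune(List<String> partitions, List<Conjunct> conjuncts) {
        List<String> kept = new ArrayList<>();
        for (String partition : partitions) {
            boolean rejected = false;
            for (Conjunct c : conjuncts) {
                if (c.deterministic && !c.test.test(partition)) {
                    rejected = true;
                    break;
                }
            }
            if (!rejected) {
                kept.add(partition);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Conjunct> where = Arrays.asList(
            new Conjunct(false, p -> p.compareTo("part0") > 0), // part > UDF2('part0')
            new Conjunct(true, p -> p.equals("part1")));        // part = 'part1'
        // Prints [part1]: part2 is rejected by the deterministic conjunct.
        System.out.println(prune(Arrays.asList("part1", "part2"), where));
    }
}
```

In this model the result depends only on the conjuncts and the partition list, so adding a join to the query would not change which partitions of Main are read, which is the behavior the report asks for.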