Date: Tue, 4 Mar 2014 00:17:27 +0000 (UTC)
From: "Gunther Hagleitner (JIRA)"
To: hive-dev@hadoop.apache.org
Reply-To: dev@hive.apache.org
Subject: [jira] [Commented] (HIVE-6492) limit partition number involved in a table scan

    [ https://issues.apache.org/jira/browse/HIVE-6492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13918793#comment-13918793 ]

Gunther Hagleitner commented on HIVE-6492:
------------------------------------------

[~selinazh] can you open a reviewboard request for this. I have a few more comments:

- Can you add a test for stats optimizer? I think since you're checking for explicit limit on fetch operator that would still bail (i.e.: select count(*) from foo with stats available and hive.compute.query.using.stats = true)
- Your patch only works in MR (since you're computing access at the physical level)
- We already have the pruned list of partitions available at the logical level

If you move your code to right after we call Optimizer.optimize in the SemanticAnalyzer you can make both cases work. Logic should be:

- If there is a fetch operator at this level let it pass (no mapreduce job will be launched)
- Otherwise go through parse context's top ops and use opToPartPruner to find out how many partitions are going to be accessed.

Does that make sense?

> limit partition number involved in a table scan
> -----------------------------------------------
>
>                 Key: HIVE-6492
>                 URL: https://issues.apache.org/jira/browse/HIVE-6492
>             Project: Hive
>          Issue Type: New Feature
>          Components: Query Processor
>    Affects Versions: 0.12.0
>            Reporter: Selina Zhang
>             Fix For: 0.13.0
>
>         Attachments: HIVE-6492.1.patch.txt, HIVE-6492.2.patch.txt, HIVE-6492.3.patch.txt
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> To protect the cluster, a new configure variable "hive.limit.query.max.table.partition" is added to hive configuration to
> limit the table partitions involved in a table scan.
> The default value will be set to -1 which means there is no limit by default.
> This variable will not affect "metadata only" query.
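
For illustration, the check suggested in the comment above might look roughly like the sketch below. It is a standalone model, not Hive code: PartitionLimitSketch, TableScanInfo and findViolation are hypothetical names, and the pruned partition counts are passed in directly instead of being read from the parse context's top ops and opToPartPruner. Only the variable name hive.limit.query.max.table.partition and its -1 default come from the issue description. Treating any negative value as "no limit" and rejecting only counts strictly above the limit are assumptions.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical, self-contained sketch; none of these types are Hive classes.
public class PartitionLimitSketch {

    // Stand-in for one table scan and the number of partitions left after pruning.
    public static final class TableScanInfo {
        final String tableName;
        final int prunedPartitionCount;

        public TableScanInfo(String tableName, int prunedPartitionCount) {
            this.tableName = tableName;
            this.prunedPartitionCount = prunedPartitionCount;
        }
    }

    /**
     * Returns the name of the first table whose scan touches more partitions
     * than allowed, or null if the query may proceed.
     *
     * maxPartitions models hive.limit.query.max.table.partition; a negative
     * value (the default is -1) is assumed to mean "no limit".
     */
    public static String findViolation(boolean hasFetchOperator,
                                       List<TableScanInfo> scans,
                                       int maxPartitions) {
        if (maxPartitions < 0) {
            return null; // limit disabled
        }
        if (hasFetchOperator) {
            return null; // fetch-only query: no mapreduce job will be launched
        }
        for (TableScanInfo scan : scans) {
            // Assumption: the limit applies per table scan, and only counts
            // strictly greater than the limit are rejected.
            if (scan.prunedPartitionCount > maxPartitions) {
                return scan.tableName;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<TableScanInfo> scans = Arrays.asList(
                new TableScanInfo("foo", 3),
                new TableScanInfo("bar", 12));
        String offender = findViolation(false, scans, 10);
        System.out.println(offender == null
                ? "query allowed"
                : "too many partitions scanned in table " + offender);
    }
}
```

Because the check only needs a per-scan partition count, it does not depend on the execution engine, which is the point of doing it at the logical level rather than the physical one.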