Mailing-List: contact dev-help@hive.apache.org; run by ezmlm
Precedence: bulk
Reply-To: dev@hive.apache.org
Date: Tue, 4 Mar 2014 00:17:27 +0000 (UTC)
From: "Gunther Hagleitner (JIRA)" <jira@apache.org>
To: hive-dev@hadoop.apache.org
Message-ID: <JIRA.12696991.1393271739788.6998.1393892247277@arcas>
In-Reply-To: <JIRA.12696991.1393271739788@arcas>
References: <JIRA.12696991.1393271739788@arcas>
Subject: [jira] [Commented] (HIVE-6492) limit partition number involved in a
 table scan
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit


    [ https://issues.apache.org/jira/browse/HIVE-6492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13918793#comment-13918793 ] 

Gunther Hagleitner commented on HIVE-6492:
------------------------------------------

[~selinazh] Can you open a Review Board request for this? I have a few more comments:

- Can you add a test for the stats optimizer? Since you're checking for an explicit limit on the fetch operator, I think a query answered from stats would still bail (e.g., select count(*) from foo with stats available and hive.compute.query.using.stats = true).
- Your patch only works in MR (since you're computing access at the physical level)
- We already have the pruned list of partitions available at the logical level

If you move your code to right after the call to Optimizer.optimize in the SemanticAnalyzer, you can make both cases work.

Logic should be:
- If there is a fetch operator at this level, let it pass (no MapReduce job will be launched).
- Otherwise, go through the parse context's top ops and use opToPartPruner to find out how many partitions are going to be accessed.
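
The logic above could be sketched roughly as follows. This is not Hive code: `TableScan`, `enforcePartitionLimit`, and the `hasFetchOperator` flag are hypothetical stand-ins for the parse context's top ops, the pruned partition list, and the fetch-operator check. The `maxPartitions` parameter stands in for the proposed hive.limit.query.max.table.partition setting, with -1 meaning no limit, and applying the limit per table scan is an assumption based on the issue description.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

// Illustrative sketch only; none of these types are real Hive classes.
class PartitionLimitSketch {

    // Hypothetical stand-in for a top-level table scan plus the
    // partitions that remain after logical partition pruning.
    static final class TableScan {
        final String table;
        final List<String> prunedPartitions;

        TableScan(String table, List<String> prunedPartitions) {
            this.table = table;
            this.prunedPartitions = prunedPartitions;
        }
    }

    // Throws if any scan would touch more partitions than allowed.
    // Intended to run right after logical optimization (assumption).
    static void enforcePartitionLimit(boolean hasFetchOperator,
                                      Collection<TableScan> topOps,
                                      int maxPartitions) {
        if (maxPartitions < 0) {
            return; // -1 (or any negative value here) means no limit
        }
        if (hasFetchOperator) {
            return; // fetch-only plan: no MapReduce job will be launched
        }
        for (TableScan scan : topOps) {
            int accessed = scan.prunedPartitions.size();
            if (accessed > maxPartitions) {
                throw new IllegalStateException(
                    "Scan of table " + scan.table + " would access " + accessed
                        + " partitions; the limit is " + maxPartitions);
            }
        }
    }

    public static void main(String[] args) {
        List<TableScan> topOps = Arrays.asList(
            new TableScan("foo", Arrays.asList("ds=1", "ds=2", "ds=3")));

        enforcePartitionLimit(false, topOps, 3); // within the limit
        enforcePartitionLimit(true, topOps, 1);  // fetch operator: passes
        try {
            enforcePartitionLimit(false, topOps, 2); // over the limit
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Because the check only needs the pruned partition list, it does not depend on which execution engine later runs the plan.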

Does that make sense?

> limit partition number involved in a table scan
> -----------------------------------------------
>
>                 Key: HIVE-6492
>                 URL: https://issues.apache.org/jira/browse/HIVE-6492
>             Project: Hive
>          Issue Type: New Feature
>          Components: Query Processor
>    Affects Versions: 0.12.0
>            Reporter: Selina Zhang
>             Fix For: 0.13.0
>
>         Attachments: HIVE-6492.1.patch.txt, HIVE-6492.2.patch.txt, HIVE-6492.3.patch.txt
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> To protect the cluster, a new configuration variable "hive.limit.query.max.table.partition" is added to the Hive configuration to
> limit the number of table partitions involved in a table scan.
> The default value will be set to -1, which means there is no limit by default.
> This variable will not affect "metadata only" queries.

