hive-dev mailing list archives

From "Gunther Hagleitner (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-6492) limit partition number involved in a table scan
Date Tue, 04 Mar 2014 00:17:27 GMT

    [ https://issues.apache.org/jira/browse/HIVE-6492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13918793#comment-13918793 ]

Gunther Hagleitner commented on HIVE-6492:
------------------------------------------

[~selinazh] Can you open a reviewboard request for this? I have a few more comments:

- Can you add a test for the stats optimizer? Since you're checking for an explicit limit
on the fetch operator, I think that case would still bail (e.g. select count(*) from foo
with stats available and hive.compute.query.using.stats = true)
- Your patch only works in MR (since you're computing access at the physical level)
- We already have the pruned list of partitions available at the logical level

If you move your code to right after we call Optimizer.optimize in the SemanticAnalyzer, you
can make both cases work.

Logic should be:
- If there is a fetch operator at this level, let it pass (no mapreduce job will be launched)
- Otherwise, go through the parse context's top ops and use opToPartPruner to find out how many
partitions are going to be accessed.
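
A rough sketch of the two rules above, written in Python rather than Hive's Java to keep it short and self-contained. PlanContext, TableScan, PartitionLimitExceeded and check_partition_limit are hypothetical stand-ins for illustration, not Hive classes or methods; the pruned partition lists are assumed to have been computed already (in Hive that would come from the partition pruner), and the per-scan comparison against the limit is an assumption based on the issue description.

```python
from dataclasses import dataclass
from typing import Dict, List

# Assumption: -1 means "no limit", as in the issue description.
NO_LIMIT = -1


@dataclass
class TableScan:
    """Hypothetical stand-in for a top-level table scan operator."""
    alias: str
    table: str


@dataclass
class PlanContext:
    """Hypothetical stand-in for the parse context after optimization."""
    top_ops: List[TableScan]
    # Pruned partition names per table scan, keyed by alias (assumed precomputed).
    pruned_partitions: Dict[str, List[str]]
    # True when the query is answered by a fetch operator (no job is launched).
    has_fetch_operator: bool = False


class PartitionLimitExceeded(Exception):
    """Raised when a table scan would touch more partitions than allowed."""


def check_partition_limit(ctx: PlanContext, max_partitions: int) -> None:
    # No limit configured: nothing to check.
    if max_partitions == NO_LIMIT:
        return
    # Rule 1: a fetch operator at this level passes unconditionally.
    if ctx.has_fetch_operator:
        return
    # Rule 2: count the pruned partitions for every top-level table scan.
    for scan in ctx.top_ops:
        count = len(ctx.pruned_partitions.get(scan.alias, []))
        if count > max_partitions:
            raise PartitionLimitExceeded(
                f"scan of {scan.table} ({scan.alias}) touches {count} "
                f"partitions, limit is {max_partitions}"
            )
```

Because the check reads only the logical plan, it does not depend on which execution engine runs the query.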

Does that make sense?

> limit partition number involved in a table scan
> -----------------------------------------------
>
>                 Key: HIVE-6492
>                 URL: https://issues.apache.org/jira/browse/HIVE-6492
>             Project: Hive
>          Issue Type: New Feature
>          Components: Query Processor
>    Affects Versions: 0.12.0
>            Reporter: Selina Zhang
>             Fix For: 0.13.0
>
>         Attachments: HIVE-6492.1.patch.txt, HIVE-6492.2.patch.txt, HIVE-6492.3.patch.txt
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> To protect the cluster, a new configuration variable "hive.limit.query.max.table.partition"
> is added to the Hive configuration to limit the number of table partitions involved in a
> table scan.
> The default value will be set to -1, which means there is no limit by default.
> This variable will not affect "metadata only" queries.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
