hive-issues mailing list archives

From Sergio Peña (JIRA) <j...@apache.org>
Subject [jira] [Commented] (HIVE-13884) Disallow queries fetching more than a configured number of partitions in PartitionPruner
Date Tue, 21 Jun 2016 18:38:57 GMT

    [ https://issues.apache.org/jira/browse/HIVE-13884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15342416#comment-15342416 ]

Sergio Peña commented on HIVE-13884:
------------------------------------

Thanks [~sershe].

[~mohitsabharwal] [~brocknoland] I ran a test with 10K partitions ({{select * from table12
where dt < 10000}}) with the variable enabled and disabled. There is not much difference.
I got a difference of 1 second. I ran the test 5 times for each setting, and also without
the patch applied. I think we are good to go for this.

I'll wait until HIVE-14055 is fixed as I would need to change this patch as well.

> Disallow queries fetching more than a configured number of partitions in PartitionPruner
> ----------------------------------------------------------------------------------------
>
>                 Key: HIVE-13884
>                 URL: https://issues.apache.org/jira/browse/HIVE-13884
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Mohit Sabharwal
>            Assignee: Sergio Peña
>         Attachments: HIVE-13884.1.patch, HIVE-13884.2.patch, HIVE-13884.3.patch, HIVE-13884.4.patch, HIVE-13884.5.patch, HIVE-13884.6.patch
>
>
> Currently the PartitionPruner requests either all partitions or partitions based on a filter
> expression. In either scenario, if the number of partitions accessed is large, there can be
> significant memory pressure at the HMS server end.
> We already have a config {{hive.limit.query.max.table.partition}} that enforces limits
> on the number of partitions that may be scanned per operator. But this check happens after the
> PartitionPruner has already fetched all partitions.
> We should add an option at the PartitionPruner level to disallow queries that attempt to
> access more partitions than a configurable limit.
> Note that {{hive.mapred.mode=strict}} disallows queries without a partition filter in
> PartitionPruner, but this check accepts any query with a pruning condition, even if the number
> of partitions fetched is large. In multi-tenant environments, admins could use more control
> over the number of partitions allowed, based on HMS memory capacity.
> One option is to have PartitionPruner first fetch the partition names (instead of partition
> specs) and throw an exception if the number of partitions exceeds the configured value. Otherwise,
> fetch the partition specs.
> It looks like the existing {{listPartitionNames}} call could be used if it were extended to take
> partition filter expressions, as the {{getPartitionsByExpr}} call does.
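
The names-first check proposed in the description can be sketched roughly as below. This is only an illustration under stated assumptions, not the code in the attached patches: PartitionCatalog, fetchPartitionSpecs, PartitionLimitExceededException, prune and inMemoryCatalog are hypothetical names, partition specs are modelled as plain strings, the filter is a simple predicate rather than a Hive expression, and treating a negative limit as "no limit" is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

final class PartitionLimitSketch {

    /** Hypothetical stand-in for a metastore client; NOT Hive's real API. */
    interface PartitionCatalog {
        // Cheap call: returns only the names of partitions matching the filter.
        List<String> listPartitionNames(Predicate<String> filter);

        // Expensive call: returns full partition specs for the given names.
        List<String> fetchPartitionSpecs(List<String> names);
    }

    /** Hypothetical exception raised when a query asks for too many partitions. */
    static final class PartitionLimitExceededException extends RuntimeException {
        PartitionLimitExceededException(int requested, int limit) {
            super("Query would fetch " + requested + " partitions; limit is " + limit);
        }
    }

    /**
     * Fetch names first, compare the count against the limit, and only then
     * fetch the specs. A negative limit disables the check (sketch assumption).
     */
    static List<String> prune(PartitionCatalog catalog, Predicate<String> filter, int maxPartitions) {
        List<String> names = catalog.listPartitionNames(filter);
        if (maxPartitions >= 0 && names.size() > maxPartitions) {
            throw new PartitionLimitExceededException(names.size(), maxPartitions);
        }
        return catalog.fetchPartitionSpecs(names);
    }

    /** Tiny in-memory catalog so the sketch can be exercised without a metastore. */
    static PartitionCatalog inMemoryCatalog(List<String> allNames) {
        return new PartitionCatalog() {
            @Override
            public List<String> listPartitionNames(Predicate<String> filter) {
                List<String> out = new ArrayList<>();
                for (String n : allNames) {
                    if (filter.test(n)) {
                        out.add(n);
                    }
                }
                return out;
            }

            @Override
            public List<String> fetchPartitionSpecs(List<String> names) {
                List<String> out = new ArrayList<>();
                for (String n : names) {
                    out.add("spec(" + n + ")");
                }
                return out;
            }
        };
    }
}
```

The point of the ordering is that the expensive spec fetch is skipped entirely when the name count already exceeds the limit, so the over-limit case costs only the name listing.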



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
