hive-dev mailing list archives

From "Gunther Hagleitner (JIRA)" <>
Subject [jira] [Commented] (HIVE-6492) limit partition number involved in a table scan
Date Fri, 28 Feb 2014 20:50:23 GMT


Gunther Hagleitner commented on HIVE-6492:

Thanks, Selina. Just trying to understand the requirements to see what's the best way to get
this in.

One question is whether you can deploy different configs in these scenarios. E.g., use a different
site file if someone is starting hive on the console vs. through tools. Or use an alias to add a --hiveconf
on the node where users start hive. You're trying to protect the cluster from large jobs -
in your case you seem to want to turn this on for certain interfaces and off for others, but
for other deployments that might not make much sense (the interface (ODBC/JDBC/CLI) doesn't
say whether it's a human, a tool, etc.).
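
The alias idea could look roughly like the following. This is a minimal sketch, not a tested deployment: the wrapper name hive_console and the value 100 are hypothetical, and hive.limit.query.max.table.partition is the variable proposed in this issue, so it would only take effect on a build that includes the patch.

```shell
# Hypothetical wrapper, installed only on nodes where people start hive by hand.
# --hiveconf sets a configuration property for that one session.
hive_console() {
  # 100 is an illustrative limit; -1 (the proposed default) means no limit.
  command hive --hiveconf hive.limit.query.max.table.partition=100 "$@"
}
```

Tools that connect through other entry points would not go through the wrapper and would keep whatever the site file says.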

But specifically:

1) What's "small"? Sounds like if a query doesn't submit a job you want to let it go
through? Or only if there's an explicit limit clause?
2) That's the same as 1 - if you just check for "no job started".
3) Aggregation on a partition key right now will scan the entire table in a massive map-red
job. Definitely something that should be fixed - but there's no optimization for that yet
afaik. Allowing this query seems to defeat the purpose of this flag, doesn't it? Seems
like again you just want to check for "no job started".

With that - it would make sense to update/extend the hive.mapred.mode variable to allow
queries that don't actually start a job (and allow jobs only with explicit partition pruning).
With that change plus different configs for different interfaces, you should get all that you want, and it
would be simpler. Correct?
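
For reference, this is roughly how the existing strict mode is switched on for a single invocation. It is a sketch under assumptions: the function name hive_strict, the table page_views, and the partition column dt are hypothetical, and the "no job started" exemption discussed here is only a proposal, so it is not shown.

```shell
# Hypothetical helper: run one hive invocation with strict mode enabled.
hive_strict() {
  command hive --hiveconf hive.mapred.mode=strict "$@"
}

# Example call (commented out; assumes a table page_views partitioned by dt):
# hive_strict -e "SELECT * FROM page_views WHERE dt = '2014-02-28' LIMIT 10"
```

In strict mode a query on a partitioned table that has no partition filter is rejected, which is the behavior the suggested extension would build on.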

> limit partition number involved in a table scan
> -----------------------------------------------
>                 Key: HIVE-6492
>                 URL:
>             Project: Hive
>          Issue Type: New Feature
>          Components: Query Processor
>    Affects Versions: 0.12.0
>            Reporter: Selina Zhang
>             Fix For: 0.13.0
>         Attachments: HIVE-6492.1.patch.txt
>   Original Estimate: 24h
>  Remaining Estimate: 24h
> To protect the cluster, a new configuration variable "hive.limit.query.max.table.partition"
> is added to hive configuration to limit the table partitions involved in a table scan.
> The default value will be set to -1 which means there is no limit by default. 
> This variable will not affect "metadata only" query.

This message was sent by Atlassian JIRA