hive-issues mailing list archives

From "Thejas M Nair (JIRA)" <>
Subject [jira] [Commented] (HIVE-12727) allow full table queries in strict mode
Date Thu, 21 Jan 2016 06:32:40 GMT


Thejas M Nair commented on HIVE-12727:

bq. we want Hive to be geared for production use cases and not for pocs/benchmarks/ease-of-use
I think we really need to give a lot of importance to ease of use. In this case, it is the
"path of least astonishment" subcategory of ease of use :)

Strict mode is trying to enforce two categories of checks -
 1. Prevent use of questionable semantics
 2. Prevent queries that can potentially consume too many cluster resources and that show
symptoms of being poorly written.

I think the current config does a poor job of the second check. It relies on heuristics
based on the operations in the query. Ideally, these checks should consider the actual
cost of the query. For example, if the result of a query is small, or if hive.optimize.sampling.orderby=true,
it should be perfectly OK to have an order-by without a limit. For tables that have a
small number of partitions, it should be OK to have no partition clause in the query.
I think the second category of checks is not mature enough to be enabled by default, while the
first category is.
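
To make the cost-based idea concrete, here is a rough sketch of what such checks could look like. It is not Hive code: the class name, method names, and threshold parameters are all hypothetical, and it assumes the planner can supply a row estimate and a partition count.

```java
// Hypothetical sketch only; none of these names exist in Hive.
final class CostAwareChecks {
    private CostAwareChecks() {}

    /**
     * An order-by without a limit is allowed when sampling order-by is on
     * (hive.optimize.sampling.orderby=true) or the estimated result is small.
     * maxResultRows is an assumed, user-chosen threshold.
     */
    static boolean orderByWithoutLimitAllowed(long estimatedResultRows,
                                              boolean samplingOrderBy,
                                              long maxResultRows) {
        return samplingOrderBy || estimatedResultRows <= maxResultRows;
    }

    /**
     * A query without a partition predicate is allowed when the table has
     * few partitions. maxPartitions is an assumed, user-chosen threshold.
     */
    static boolean missingPartitionFilterAllowed(int partitionCount,
                                                 int maxPartitions) {
        return partitionCount <= maxPartitions;
    }
}
```

The point of the sketch is that the decision depends on estimated cost rather than on which operators appear in the query.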

Also, note that the second category of checks is likely to break general BI tools, as they
won't be aware of these idiosyncrasies of Hive.

I propose that we split the checks clearly into the above two categories and enable only the first
kind by default.

How should we separate the checks into two? I think the semistrict/strict categorization is confusing.
It would be clearer to give names to the categories of checks.
Maybe support a comma-separated list of categories of checks to be enforced?

How about calling the parameter hive.strict.checks and supporting list values of "semantic"
and "largequerypattern" (or similar)?
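
A minimal sketch of how such a comma-separated value might be parsed is below. The category names come from the suggestion above; the class, enum, and method names are hypothetical and are not part of Hive's configuration code.

```java
import java.util.EnumSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch only; not Hive's actual configuration handling.
final class StrictChecks {
    private StrictChecks() {}

    /** Check categories, named after the values proposed in this comment. */
    enum Category { SEMANTIC, LARGEQUERYPATTERN }

    /**
     * Parses a value such as "semantic,largequerypattern" into a set of
     * categories. Blank entries are skipped; an unknown name makes
     * Category.valueOf throw IllegalArgumentException.
     */
    static Set<Category> parse(String value) {
        Set<Category> enabled = EnumSet.noneOf(Category.class);
        if (value == null) {
            return enabled;
        }
        for (String token : value.split(",")) {
            String name = token.trim();
            if (name.isEmpty()) {
                continue;
            }
            enabled.add(Category.valueOf(name.toUpperCase(Locale.ROOT)));
        }
        return enabled;
    }
}
```

With this sketch, `StrictChecks.parse("semantic")` yields a set containing only `SEMANTIC`, while an empty string yields an empty set, i.e. no checks enforced.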

The equivalent of current strict mode becomes - 

The new default used becomes -

Thoughts?

> allow full table queries in strict mode
> ---------------------------------------
>                 Key: HIVE-12727
>                 URL:
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Sergey Shelukhin
>            Assignee: Sergey Shelukhin
>            Priority: Blocker
>         Attachments: HIVE-12727.01.patch, HIVE-12727.patch
> Making strict mode the default recently appears to have broken many normal queries, such as some TPCDS benchmark queries, e.g. Q85:
> Response message: org.apache.hive.service.cli.HiveSQLException: Error while compiling statement: FAILED: SemanticException [Error 10041]: No partition predicate found for Alias "web_sales" Table "web_returns"
> We should remove this restriction from strict mode, or change the default back to non-strict. Perhaps make a 3-value parameter, nonstrict, semistrict, and strict, for backward compat for people who are relying on strict already.

This message was sent by Atlassian JIRA
