hive-issues mailing list archives

From "Sergey Shelukhin (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HIVE-15398) change metadata-only queries to still read the original table (in some cases?)
Date Fri, 09 Dec 2016 00:14:59 GMT

     [ https://issues.apache.org/jira/browse/HIVE-15398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sergey Shelukhin updated HIVE-15398:
------------------------------------
    Description: 
See HIVE-15397.
There are multiple complementary ways to handle this properly:
1) Enhance MetadataOnly to recognize when table emptiness matters, and only optimize safe query
patterns (or apply the approaches below only in the unsafe cases).
2) Create the original InputFormat (IF) during compilation, get a record reader, and check whether
it is empty. This seems like the only bulletproof method in terms of correctness, but it may break
due to differences in setup and access between tasks and compilation. It may also have security
implications, e.g. if compilation runs in HS2 and its permissions differ from those of the tasks.
3) Somehow inject a limit into the table scan (using a limit in the plan, or just hacking it into
the TS itself specifically for this feature), and keep the original InputFormat. That way, instead
of 0 or 1 null rows, it would return 0 or 1 rows from the original split while still avoiding
large scans, which is the goal.
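
A rough sketch of the idea behind options 2 and 3, for illustration only. This is a toy model in
plain Java, not Hive or Hadoop code: the class LimitedScanSketch, its methods, and the use of an
Iterable to stand in for a split and its record reader are all hypothetical names introduced here.
It only shows that an emptiness check or a limit of 1 touches at most one row of the original data.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Toy model only: none of these names are Hive or Hadoop APIs (all are assumptions).
final class LimitedScanSketch {

    private LimitedScanSketch() {
    }

    // Option 2 (sketch): open a reader over the original data and check
    // whether it yields any row at all. No row is actually consumed.
    static <R> boolean isEmpty(Iterable<R> originalSplit) {
        return !originalSplit.iterator().hasNext();
    }

    // Option 3 (sketch): keep reading from the original split, but stop
    // after at most `limit` rows so that a large input is never fully scanned.
    static <R> List<R> scanWithLimit(Iterable<R> originalSplit, int limit) {
        List<R> rows = new ArrayList<>();
        Iterator<R> reader = originalSplit.iterator();
        while (rows.size() < limit && reader.hasNext()) {
            rows.add(reader.next());
        }
        return rows;
    }
}
```

With a limit of 1, scanWithLimit returns an empty list for an empty split and a single real row
otherwise, which mirrors the "0 or 1 rows from the original split" behavior described in option 3.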


> change metadata-only queries to still read the original table (in some cases?)
> ------------------------------------------------------------------------------
>
>                 Key: HIVE-15398
>                 URL: https://issues.apache.org/jira/browse/HIVE-15398
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Sergey Shelukhin
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
