hive-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Adam Szita (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-9447) Metastore: inefficient Oracle query for removing unused column descriptors when add/drop table/partition
Date Thu, 19 Oct 2017 12:54:00 GMT

    [ https://issues.apache.org/jira/browse/HIVE-9447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16210972#comment-16210972
] 

Adam Szita commented on HIVE-9447:
----------------------------------

I'm happy to take this and fix and rebase the patch unless any objections.

> Metastore: inefficient Oracle query for removing unused column descriptors when add/drop
table/partition
> --------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-9447
>                 URL: https://issues.apache.org/jira/browse/HIVE-9447
>             Project: Hive
>          Issue Type: Improvement
>          Components: Metastore
>    Affects Versions: 0.14.0, 1.0.0, 1.2.0, 1.1.0
>            Reporter: Selina Zhang
>            Assignee: Selina Zhang
>         Attachments: HIVE-9447.1.patch, HIVE-9447.2.patch
>
>   Original Estimate: 3h
>  Remaining Estimate: 3h
>
> Metastore needs removing unused column descriptors when drop/add partitions or tables.
For query the unused column descriptor, the current implementation utilizes datanuleus' range
function, which basically equals LIMIT syntax. However, Oracle does not support LIMIT, the
query is converted as  
> {quote}
> SQL> SELECT * FROM (SELECT subq.*,ROWNUM rn FROM (SELECT
> 'org.apache.hadoop.hive.metastore.model.MStorageDescriptor' AS
> NUCLEUS_TYPE,A0.INPUT_FORMAT,A0.IS_COMPRESSED,A0.IS_STOREDASSUBDIRECTORIES,A0.LOCATION,
> A0.NUM_BUCKETS,A0.OUTPUT_FORMAT,A0.SD_ID FROM drhcat.SDS A0 
> WHERE A0.CD_ID = ? ) subq ) WHERE  rn <= 1;
> {quote}
> Given that CD_ID is not very selective, this query may have to access large amount of
rows (depends how many partitions the table has, millions of rows in our case). Metastore
may become unresponsive because of this. 
> Since Metastore only needs to know if the specific CD_ID is referenced in SDS table and
does not need access the whole row. We can use 
> {quote}
> select count(1) from SDS where SDS.CD_ID=?
> {quote}
> CD_ID is index column, the above query will do range scan for index, which is faster.

> For other DBs support LIMIT syntax such as MySQL, this problem does not exist. However,
the new query does not hurt.  



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

Mime
View raw message