hive-dev mailing list archives

From "Mithun Radhakrishnan (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-7195) Improve Metastore performance
Date Fri, 13 Jun 2014 01:23:02 GMT

    [ https://issues.apache.org/jira/browse/HIVE-7195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14030131#comment-14030131 ]

Mithun Radhakrishnan commented on HIVE-7195:
--------------------------------------------

[~sershe]: I'm sorry, I haven't found the time to port my patch to 13 and raise a JIRA. My
work was primarily in the PartitionPruner code. It ensures that {{listPartitions(db,
table, -1)}} isn't called (during plan optimization) if the query is metadata-only.
I can post the 12-patch in a JIRA, for whatever that's worth.

Incidentally, I've raised HIVE-7223 to discuss the idea of using {{PartitionSpecs}}. [~alangates]
suggested that we explore whether a PartitionSpec abstraction could also represent lighter
Partition-groups that share commonality (StorageDescs, etc.). Still thinking that through. (If only
Thrift supported polymorphism. :])
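
To make the "lighter Partition-groups" idea concrete, here is a rough sketch of what grouping
on shared storage settings might look like. Every name in it (PartitionGroupingSketch,
FlatPartition, PartitionGroup, the inputFormat field standing in for a StorageDesc) is
hypothetical and made up for illustration; none of it is the metastore's Thrift API or the
HIVE-7223 design.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: these are NOT Hive metastore or Thrift types.
final class PartitionGroupingSketch {

    /** A full partition record: every partition repeats its storage settings. */
    static final class FlatPartition {
        final String relativePath;  // e.g. "dt=2014-06-12"
        final String inputFormat;   // assumed stand-in for the shared storage descriptor

        FlatPartition(String relativePath, String inputFormat) {
            this.relativePath = relativePath;
            this.inputFormat = inputFormat;
        }
    }

    /** A lighter group: the shared storage settings are stored once for all members. */
    static final class PartitionGroup {
        final String sharedInputFormat;
        final List<String> relativePaths = new ArrayList<>();

        PartitionGroup(String sharedInputFormat) {
            this.sharedInputFormat = sharedInputFormat;
        }
    }

    /** Groups partitions by their storage settings, preserving first-seen order. */
    static List<PartitionGroup> group(List<FlatPartition> partitions) {
        Map<String, PartitionGroup> byFormat = new LinkedHashMap<>();
        for (FlatPartition p : partitions) {
            PartitionGroup g = byFormat.get(p.inputFormat);
            if (g == null) {
                g = new PartitionGroup(p.inputFormat);
                byFormat.put(p.inputFormat, g);
            }
            g.relativePaths.add(p.relativePath);
        }
        return new ArrayList<>(byFormat.values());
    }
}
```

With thousands of partitions that differ only in location, a representation like this would
send the common settings once per group rather than once per partition.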

> Improve Metastore performance
> -----------------------------
>
>                 Key: HIVE-7195
>                 URL: https://issues.apache.org/jira/browse/HIVE-7195
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Brock Noland
>            Priority: Critical
>
> Even with direct SQL, which significantly improves MS performance, some operations take
> a considerable amount of time when there are many partitions on a table. Specifically, I believe
> the issues are:
> * When a client gets all partitions, we do not send it an iterator; we create a collection
> of all the data and then pass the whole object over the network at once.
> * Operations which require looking up data on the NN can still be slow, since there is
> no cache of information and the lookups are done serially.
> * Perhaps a tangent, but our client timeout is quite dumb. The client will time out and
> the server has no idea the client is gone. We should use deadlines, i.e. pass the timeout
> to the server so it can calculate when the client has expired.
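
The deadline suggestion in the quoted description could be sketched roughly as follows. This
is only an illustration under assumptions: DeadlineSketch, Deadline, DeadlineExceededException
and listPartitionNames are hypothetical names, not metastore classes, and the clock is
injected so the behaviour can be exercised without real waiting. The server turns the timeout
it received from the client into an expiry time and checks it between units of work.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongSupplier;

// Hypothetical illustration only: not the Hive metastore API.
final class DeadlineSketch {

    /** Thrown when the caller's time budget has run out. */
    static final class DeadlineExceededException extends RuntimeException {
        DeadlineExceededException(String message) {
            super(message);
        }
    }

    /** Server-side view of the timeout the client sent along with its request. */
    static final class Deadline {
        private final LongSupplier clockMillis;
        private final long expiresAtMillis;

        Deadline(long timeoutMillis, LongSupplier clockMillis) {
            this.clockMillis = clockMillis;
            this.expiresAtMillis = clockMillis.getAsLong() + timeoutMillis;
        }

        /** Aborts the request once the client would already have timed out. */
        void check() {
            if (clockMillis.getAsLong() >= expiresAtMillis) {
                throw new DeadlineExceededException("client timeout elapsed; abandoning request");
            }
        }
    }

    /** Builds partition names one at a time, checking the deadline before each unit of work. */
    static List<String> listPartitionNames(int count, Deadline deadline, Runnable perItemWork) {
        List<String> names = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            deadline.check();
            perItemWork.run();  // stands in for a slow lookup, e.g. a call to the NN
            names.add("part=" + i);
        }
        return names;
    }
}
```

In a real server the clock would be something like System::currentTimeMillis, and the check
would sit wherever a long-running operation can safely stop.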



--
This message was sent by Atlassian JIRA
(v6.2#6252)
