hive-dev mailing list archives

From "Sergey Shelukhin (JIRA)" <>
Subject [jira] [Commented] (HIVE-7195) Improve Metastore performance
Date Wed, 11 Jun 2014 23:11:10 GMT


Sergey Shelukhin commented on HIVE-7195:

And yeah, the third thing is iterators. We don't really need to keep state on the server for that;
the client can send everything needed to restore the iterator. We can make it fully stateless
by, e.g., issuing the same queries with an added limit to get the next "page", or we can cache records
in the metastore (which might cause memory problems). Also, the iterator will presumably have to
operate within an externally called openTransaction; otherwise the set may not be consistent.
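
For illustration, here is a minimal sketch of the stateless "page" idea, assuming the client carries a token (the last partition name it saw) and the server re-issues the same ordered query with a limit. Everything in it is hypothetical: the `partitions` table, its columns, and the function names are invented for the example and are not the metastore schema or API. It also does not address the consistency concern above; a real version would presumably need to run inside a transaction.

```python
import sqlite3
from typing import Iterator, List, Optional, Tuple


def fetch_page(conn: sqlite3.Connection, table_name: str,
               after_name: Optional[str],
               page_size: int) -> Tuple[List[str], Optional[str]]:
    """Return one page of partition names plus the token for the next page.

    The server keeps no iterator state: the same query is issued each time,
    restricted to names after the client's token and capped with LIMIT.
    """
    rows = conn.execute(
        "SELECT part_name FROM partitions "
        "WHERE tbl_name = ? AND part_name > ? "
        "ORDER BY part_name LIMIT ?",
        (table_name, after_name or "", page_size),
    ).fetchall()
    names = [row[0] for row in rows]
    # A short page means the result set is exhausted.
    next_token = names[-1] if len(names) == page_size else None
    return names, next_token


def iter_partitions(conn: sqlite3.Connection, table_name: str,
                    page_size: int = 2) -> Iterator[str]:
    """Client-side iterator that pulls pages until no token is returned."""
    token = None
    while True:
        names, token = fetch_page(conn, table_name, token, page_size)
        yield from names
        if token is None:
            return


def make_demo_db() -> sqlite3.Connection:
    """Build an in-memory table with five made-up partitions."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE partitions (tbl_name TEXT, part_name TEXT)")
    conn.executemany(
        "INSERT INTO partitions VALUES (?, ?)",
        [("t", f"ds=2014-06-{day:02d}") for day in range(1, 6)],
    )
    return conn
```

Paging on the last seen name rather than on a numeric offset means a page boundary does not shift if earlier rows are added or dropped between calls, though rows changed mid-iteration can still make the overall set inconsistent.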

> Improve Metastore performance
> -----------------------------
>                 Key: HIVE-7195
>                 URL:
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Brock Noland
>            Priority: Critical
> Even with direct SQL, which significantly improves MS performance, some operations take
> a considerable amount of time when there are many partitions on a table. Specifically, I
> believe the issues are:
> * When a client gets all partitions, we do not send it an iterator; we create a collection
>   of all the data and then pass the whole object over the network.
> * Operations that require looking up data on the NN can still be slow, since there is
>   no cache of information and the lookups are done serially.
> * Perhaps a tangent, but our client timeout is quite dumb. The client will time out and
>   the server has no idea the client is gone. We should use deadlines, i.e. pass the timeout
>   to the server so it can calculate that the client has expired.
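
The deadline idea in the last point could look roughly like the sketch below. It assumes the client sends its timeout with the request and the server checks the resulting deadline between units of work. The `Deadline` class, `DeadlineExceeded`, and `list_partitions` are hypothetical names made up for the example, not existing Hive code.

```python
import time
from typing import Callable, Iterable, List


class DeadlineExceeded(Exception):
    """Raised when the client's timeout has already passed on the server."""


class Deadline:
    """Server-side deadline computed from the timeout the client sent."""

    def __init__(self, timeout_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._expires_at = clock() + timeout_seconds

    def remaining(self) -> float:
        """Seconds left before the client is assumed to have given up."""
        return self._expires_at - self._clock()

    def check(self) -> None:
        """Abort the request if nobody is waiting for the result anymore."""
        if self.remaining() <= 0:
            raise DeadlineExceeded("client timeout elapsed; abandoning request")


def list_partitions(names: Iterable[str], timeout_seconds: float,
                    clock: Callable[[], float] = time.monotonic,
                    per_item_work: Callable[[str], None] = lambda name: None,
                    ) -> List[str]:
    """Hypothetical handler that stops working once the deadline passes."""
    deadline = Deadline(timeout_seconds, clock)
    result = []
    for name in names:
        deadline.check()     # cheap check between units of work
        per_item_work(name)  # stand-in for the expensive per-partition step
        result.append(name)
    return result
```

The sketch passes a relative timeout rather than an absolute timestamp so that the server measures elapsed time on its own monotonic clock and does not depend on client and server clocks agreeing.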

