hive-issues mailing list archives

From "Vaibhav Gumashta (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HIVE-10503) Aggregate stats cache: follow up optimizations
Date Thu, 30 Apr 2015 01:23:06 GMT

     [ https://issues.apache.org/jira/browse/HIVE-10503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vaibhav Gumashta updated HIVE-10503:
------------------------------------
    Description: 
Some follow up work items:
1. Estimate cache nodes from memory size - currently the user needs to specify size based
on #nodes.
2. Make the AggregateStatsCache#add method asynchronous - adding to cache can happen in a
new thread.
3. Based on perf testing, explore an alternate data structure for the node list per cache
key.
4. Explore ideas to reduce locking granularity of the value list per cache key.
5. There is an O(n*n) loop while finding the match - that should go away.
6. Single call to DB to get aggregate for columns not in cache.
7. Organize metrics capturing in a better way.
8. Address concerns on TTL causing stale data in cache.

  was:
Some follow up work items:
1. Estimate cache nodes from memory size - currently the user needs to specify size based
on #nodes.
2. Make the AggregateStatsCache#add method asynchronous - adding to cache can happen in a
new thread.
3. Based on perf testing, explore an alternate data structure for the node list per cache
key.
4. Explore ideas to reduce locking granularity of the value list per cache key.
5. There is an O(n*n) loop while finding the match - that should go away.
6. Single call to DB to get aggregate for columns not in cache.
7. Organize metrics capturing in a better way.


> Aggregate stats cache: follow up optimizations
> ----------------------------------------------
>
>                 Key: HIVE-10503
>                 URL: https://issues.apache.org/jira/browse/HIVE-10503
>             Project: Hive
>          Issue Type: Improvement
>          Components: Metastore
>    Affects Versions: 1.2.0
>            Reporter: Vaibhav Gumashta
>            Assignee: Vaibhav Gumashta
>             Fix For: 1.3.0
>
>
> Some follow up work items:
> 1. Estimate cache nodes from memory size - currently the user needs to specify size based on #nodes.
> 2. Make the AggregateStatsCache#add method asynchronous - adding to cache can happen in a new thread.
> 3. Based on perf testing, explore an alternate data structure for the node list per cache key.
> 4. Explore ideas to reduce locking granularity of the value list per cache key.
> 5. There is an O(n*n) loop while finding the match - that should go away.
> 6. Single call to DB to get aggregate for columns not in cache.
> 7. Organize metrics capturing in a better way.
> 8. Address concerns on TTL causing stale data in cache.
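
For illustration only, item 2 above (adding to the cache from a separate thread) might look roughly like the sketch below. This is not Hive's AggregateStatsCache code: the class AsyncAddSketch, the method addAsync, and the String-to-Long map are hypothetical stand-ins, and a single worker thread is just one possible choice.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical stand-in for a stats cache; NOT Hive's AggregateStatsCache.
public class AsyncAddSketch {
    // Assumed cache shape: a plain key -> value map.
    private final Map<String, Long> cache = new ConcurrentHashMap<>();

    // One daemon worker thread performs all inserts, so callers do not block on them.
    private final ExecutorService writer = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "stats-cache-writer");
        t.setDaemon(true);
        return t;
    });

    // Queues the insert and returns immediately; the put runs on the worker thread.
    public Future<?> addAsync(String key, long value) {
        return writer.submit(() -> { cache.put(key, value); });
    }

    // Returns the cached value, or null if the key is absent (or not yet written).
    public Long get(String key) {
        return cache.get(key);
    }

    public void shutdown() {
        writer.shutdown();
    }

    public static void main(String[] args) throws Exception {
        AsyncAddSketch c = new AsyncAddSketch();
        Future<?> pending = c.addAsync("db.tbl.col", 42L);
        pending.get(); // the demo waits here; a real caller would usually not
        System.out.println(c.get("db.tbl.col"));
        c.shutdown();
    }
}
```

One consequence of this approach is that a read issued right after addAsync may miss the entry and fall through to the database, which is usually acceptable for a cache.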



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
