hive-dev mailing list archives

From "Sergey Shelukhin (JIRA)" <>
Subject [jira] [Commented] (HIVE-9270) LLAP: improve high-level cache from prototype
Date Tue, 06 Jan 2015 21:22:35 GMT


Sergey Shelukhin commented on HIVE-9270:

This task is postponed for now, since the cache is far from the lowest-hanging fruit here.

> LLAP: improve high-level cache from prototype
> ---------------------------------------------
>                 Key: HIVE-9270
>                 URL:
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: Sergey Shelukhin
> The cache in the prototype has a number of limitations.
> 1) Having 16-32-..MB chunks with many logical units of caching can result in undesirable
> priority phenomena. Priority tracking is needed for every such unit, with some form of
> priority-splitting. I have a design for that which never blocks readers...
> 2) Something like a buddy allocator can also be used instead of fixed-size blocks.
> 3) Needs tighter integration with file formats, since we abandoned the intermediate format
> and are planning to make the unit of caching much smaller (RG, not stripe). E.g. ORC can
> decompress data directly into a large buffer, then pass on logical boundaries to ChunkPool.
> 4) For the same reason (having so many cached objects), one might consider making the
> cache format-specific and/or hierarchical, since requesting 1000s of objects may be
> suboptimal. E.g. a TPCDS stripe has ~430 RGs; with just a few columns, that's a lot of
> objects to request. It is much easier if the RGs are all sequential and can be returned
> together when sargs didn't do a lot of filtering.
> 5) Minor issues, like not reusing allocated buffers after they are evicted and instead
> allocating again, etc.
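
For readers unfamiliar with point 2, here is a minimal sketch of what "something like a buddy allocator" could look like. It is not the LLAP design or any Hive code: the class name BuddyAllocatorSketch, its methods, and the choice to track free offsets in per-order sorted sets are all assumptions made for illustration. The idea it shows is that a request is rounded up to a power-of-two block, larger free blocks are split in half on demand, and a freed block is merged back with its "buddy" whenever that buddy is also free.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

/** Hypothetical illustration only; not Hive's or LLAP's actual allocator. */
final class BuddyAllocatorSketch {
    private final int minBlock;   // smallest block size in bytes (power of two)
    private final int maxOrder;   // arena size is minBlock << maxOrder
    private final int arenaSize;
    // freeLists.get(k) holds offsets of free blocks of size minBlock << k.
    private final List<TreeSet<Integer>> freeLists = new ArrayList<>();

    BuddyAllocatorSketch(int minBlock, int maxOrder) {
        if (minBlock <= 0 || Integer.bitCount(minBlock) != 1
                || maxOrder < 0 || maxOrder > 30
                || ((long) minBlock << maxOrder) > Integer.MAX_VALUE) {
            throw new IllegalArgumentException("bad arena parameters");
        }
        this.minBlock = minBlock;
        this.maxOrder = maxOrder;
        this.arenaSize = minBlock << maxOrder;
        for (int i = 0; i <= maxOrder; i++) {
            freeLists.add(new TreeSet<>());
        }
        freeLists.get(maxOrder).add(0); // the whole arena starts as one free block
    }

    /** Smallest order whose block size is at least {@code size}. */
    private int orderFor(int size) {
        int order = 0;
        while ((minBlock << order) < size) {
            order++;
        }
        return order;
    }

    /** Returns the offset of a block of at least {@code size} bytes, or -1 if none fits. */
    int allocate(int size) {
        if (size <= 0 || size > arenaSize) {
            return -1;
        }
        int want = orderFor(size);
        int order = want;
        while (order <= maxOrder && freeLists.get(order).isEmpty()) {
            order++; // look for a larger block to split
        }
        if (order > maxOrder) {
            return -1;
        }
        int offset = freeLists.get(order).pollFirst();
        while (order > want) {
            order--;
            // Keep the lower half, put the upper half back on the free list.
            freeLists.get(order).add(offset + (minBlock << order));
        }
        return offset;
    }

    /** Frees a block; {@code size} must be the size originally requested for it. */
    void free(int offset, int size) {
        int order = orderFor(size);
        while (order < maxOrder) {
            int buddy = offset ^ (minBlock << order); // buddy differs in exactly one bit
            if (!freeLists.get(order).remove(buddy)) {
                break; // buddy is in use (or split), so merging stops here
            }
            offset = Math.min(offset, buddy);
            order++;
        }
        freeLists.get(order).add(offset);
    }
}
```

Compared with fixed-size blocks, a scheme like this lets small and large cached units share one arena, at the cost of rounding every request up to a power of two. The sketch is single-threaded and ignores eviction and priorities entirely, so it says nothing about the non-blocking design mentioned in point 1.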

