hive-dev mailing list archives

From "Sergey Shelukhin (JIRA)" <>
Subject [jira] [Created] (HIVE-9270) LLAP: improve high-level cache from prototype
Date Tue, 06 Jan 2015 21:21:34 GMT
Sergey Shelukhin created HIVE-9270:

             Summary: LLAP: improve high-level cache from prototype
                 Key: HIVE-9270
             Project: Hive
          Issue Type: Sub-task
            Reporter: Sergey Shelukhin

The cache in the prototype has a number of limitations.
1) Having 16-32-..MB chunks with many logical units of caching can result in undesirable priority
phenomena. Priority tracking is needed for every such unit, with some form of priority-splitting.
I have a design for that which never blocks readers...
2) Something like a buddy allocator can also be used instead of fixed-size blocks.
3) Tighter integration with file formats is needed, since we abandoned the intermediate format and
are planning to make the unit of caching much smaller (RG, not stripe) - e.g. ORC can decompress
data directly into a large buffer, then pass the logical boundaries on to ChunkPool.
4) For the same reason, with so many cached objects, one might consider making the cache
format-specific and/or hierarchical, since requesting 1000s of objects may be suboptimal
(e.g. a TPCDS stripe has ~430 RGs; with just a few columns that's a lot of objects to request
- much easier if RGs are all sequential and can be returned together when sargs didn't do a
lot of filtering).
5) Minor issues, such as not reusing allocated buffers after they are evicted and instead
allocating again, etc.
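
To make item 2 concrete, here is a minimal sketch of a buddy allocator over a single arena,
as one possible alternative to fixed-size blocks. It is an illustration only, not the LLAP
design: the class name BuddyAllocatorSketch, its methods, and the use of plain int offsets are
all hypothetical, and it is single-threaded, so it says nothing about the non-blocking reader
design mentioned in item 1.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical sketch of a buddy allocator; offsets index into one power-of-two arena. */
public class BuddyAllocatorSketch {
    private final int minOrder; // log2 of the smallest block size
    private final int maxOrder; // log2 of the arena size
    // One list of free block offsets per order, from minOrder to maxOrder.
    private final List<Deque<Integer>> freeLists = new ArrayList<>();

    public BuddyAllocatorSketch(int minOrder, int maxOrder) {
        if (minOrder < 0 || maxOrder < minOrder || maxOrder > 30) {
            throw new IllegalArgumentException("bad orders");
        }
        this.minOrder = minOrder;
        this.maxOrder = maxOrder;
        for (int o = minOrder; o <= maxOrder; o++) {
            freeLists.add(new ArrayDeque<>());
        }
        freeList(maxOrder).push(0); // the whole arena starts as one free block
    }

    private Deque<Integer> freeList(int order) {
        return freeLists.get(order - minOrder);
    }

    /** Smallest order whose block size (1 << order) holds size; maxOrder + 1 if too large. */
    private int orderFor(int size) {
        int order = minOrder;
        while (order <= maxOrder && (1 << order) < size) {
            order++;
        }
        return order;
    }

    /** Returns the offset of a block of at least size bytes, or -1 if none is available. */
    public int allocate(int size) {
        if (size <= 0) {
            throw new IllegalArgumentException("size must be positive");
        }
        int order = orderFor(size);
        int o = order;
        while (o <= maxOrder && freeList(o).isEmpty()) {
            o++; // look for a larger free block to split
        }
        if (o > maxOrder) {
            return -1;
        }
        int offset = freeList(o).pop();
        while (o > order) {
            o--;
            freeList(o).push(offset + (1 << o)); // upper half stays free
        }
        return offset;
    }

    /** Frees a block; the caller passes the same size it originally requested. */
    public void free(int offset, int size) {
        int order = orderFor(size);
        while (order < maxOrder) {
            int buddy = offset ^ (1 << order); // buddy differs only in this bit
            if (!freeList(order).remove(Integer.valueOf(buddy))) {
                break; // buddy is in use (or split), so stop merging
            }
            offset = Math.min(offset, buddy);
            order++;
        }
        freeList(order).push(offset);
    }
}
```

With minOrder = 4 and maxOrder = 8 (16-byte minimum blocks in a 256-byte arena), two 16-byte
allocations land at offsets 0 and 16, and freeing both merges them back into a larger block.
The linear scan in free() is kept for brevity; a real implementation would likely track free
blocks with a bitmap or headers instead.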

This message was sent by Atlassian JIRA