hive-dev mailing list archives

From "Sergey Shelukhin (JIRA)" <>
Subject [jira] [Resolved] (HIVE-9269) LLAP: introduce low-level cache for ORC
Date Sat, 17 Jan 2015 02:07:34 GMT


Sergey Shelukhin resolved HIVE-9269.
    Resolution: Fixed

All parts and tests have been committed to the branch.

> LLAP: introduce low-level cache for ORC
> ---------------------------------------
>                 Key: HIVE-9269
>                 URL:
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: Sergey Shelukhin
>            Assignee: Sergey Shelukhin
>             Fix For: llap
> There are two distinct options for caching encoded data in row-columnar format: caching
logical chunks (e.g. for ORC, stripe x column or row group (rg) x column), or caching physical
chunks (e.g. for ORC, compression buffers, entire stripes, ...). For highly selective queries,
the former will probably result in better cache utilization and fewer undesirable priority
phenomena. It will also be easier to use for different formats.
> However, because logical chunks are variable-sized, such a cache is harder to implement. The
prototype has a cache of this form, but it has some serious shortcomings in its current state.
Additionally, a high-level cache would operate above the ACID logic in the file format and would
thus require cache invalidation, which is, as we know, one of the only hard things in CS.
> A low-level cache for the ORC case, however, is easier to implement because the uncompressed
size of compression buffers is nearly fixed; at the 256k default, these buffers are also
sufficiently granular. While it lacks the benefit of having ACID deltas already merged, as a
high-level cache would, it will work with ACID out of the box.
> This JIRA is to implement the low-level cache.
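
To illustrate why nearly fixed-size buffers simplify the design described above, here is a minimal sketch of a low-level cache keyed by physical position. It is not the code committed for HIVE-9269: the class name `LowLevelCache`, the `(file_id, offset)` key, and the LRU eviction policy are all assumptions made for the example. Only the 256k buffer size comes from the description.

```python
from collections import OrderedDict

BUFFER_SIZE = 256 * 1024  # 256k default compression buffer size from the description


class LowLevelCache:
    """Hypothetical LRU cache of uncompressed buffers keyed by (file_id, offset)."""

    def __init__(self, capacity_bytes):
        # Because every buffer fits in one fixed-size slot, capacity can be
        # tracked as a slot count instead of a variable-size allocator.
        self.max_buffers = capacity_bytes // BUFFER_SIZE
        self._buffers = OrderedDict()

    def get(self, file_id, offset):
        key = (file_id, offset)
        data = self._buffers.get(key)
        if data is not None:
            self._buffers.move_to_end(key)  # mark as most recently used
        return data

    def put(self, file_id, offset, data):
        if len(data) > BUFFER_SIZE:
            raise ValueError("buffer exceeds the fixed slot size")
        key = (file_id, offset)
        self._buffers[key] = data
        self._buffers.move_to_end(key)
        while len(self._buffers) > self.max_buffers:
            self._buffers.popitem(last=False)  # evict least recently used


cache = LowLevelCache(capacity_bytes=2 * BUFFER_SIZE)
cache.put("f1", 0, b"a" * 10)
cache.put("f1", BUFFER_SIZE, b"b" * 10)
cache.get("f1", 0)             # touch the first buffer
cache.put("f2", 0, b"c" * 10)  # evicts ("f1", BUFFER_SIZE), the least recently used
```

In this sketch the key identifies bytes in a file, not logical rows, so base and delta files would simply be cached as separate entries and merged by the reader above the cache. That is one way to read the "works with ACID out of the box" remark, under the assumption that files are immutable once written.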

This message was sent by Atlassian JIRA
