hive-issues mailing list archives

From "Sergey Shelukhin (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (HIVE-11500) implement file footer / splits cache in HBase metastore
Date Fri, 14 Aug 2015 20:09:46 GMT

    [ https://issues.apache.org/jira/browse/HIVE-11500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14697664#comment-14697664 ]

Sergey Shelukhin edited comment on HIVE-11500 at 8/14/15 8:09 PM:
------------------------------------------------------------------

Actually, the main reason all these calls exist for partitions is that they take plain arguments
instead of following a request-response pattern, which makes it impossible to change the signature
in a backward-compatible manner. I will happily refactor the newly added calls to be generic
(req/resp should allow for that), or deprecate them in favor of generic calls and remove them
later, if the need arises.
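
As a rough illustration of the point above, here is a minimal Java sketch of a request-response style call. Every name in it (GetFileMetadataRequest, GetFileMetadataResponse, FileMetadataService, includeMissing) is hypothetical and is not taken from the actual Hive metastore API. It only shows why wrapping the inputs and outputs in objects lets a call grow new optional fields while its signature stays the same.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// All names here are hypothetical; this is not the actual Hive metastore API.

// Request object: new optional fields can be added later without touching
// the method signature that existing clients call.
final class GetFileMetadataRequest {
    final List<Long> fileIds;
    // Imagined later addition; the default keeps older callers working.
    boolean includeMissing = false;

    GetFileMetadataRequest(List<Long> fileIds) {
        this.fileIds = fileIds;
    }
}

// Response object: likewise free to grow new fields over time.
final class GetFileMetadataResponse {
    final Map<Long, String> footersByFileId = new HashMap<>();
    final List<Long> missingFileIds = new ArrayList<>();
}

// The call signature stays fixed at one request in, one response out.
final class FileMetadataService {
    private final Map<Long, String> store = new HashMap<>();

    void put(long fileId, String footer) {
        store.put(fileId, footer);
    }

    GetFileMetadataResponse getFileMetadata(GetFileMetadataRequest request) {
        GetFileMetadataResponse response = new GetFileMetadataResponse();
        for (Long fileId : request.fileIds) {
            String footer = store.get(fileId);
            if (footer != null) {
                response.footersByFileId.put(fileId, footer);
            } else if (request.includeMissing) {
                // Only reported when the caller opts in via the newer field.
                response.missingFileIds.add(fileId);
            }
        }
        return response;
    }
}
```

With an args-style call, adding the includeMissing option would have meant a new method or a changed parameter list. Here an older caller that never sets the field still gets the original behavior.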



was (Author: sershe):
Actually, the main reason all these calls exist for partitions is that they take plain arguments
instead of following a request-response pattern, which makes it impossible to change the signature
in a backward-compatible manner. I will happily refactor these calls to be generic, or deprecate
them in favor of generic calls and remove them later, if the need arises.

> implement file footer / splits cache in HBase metastore
> -------------------------------------------------------
>
>                 Key: HIVE-11500
>                 URL: https://issues.apache.org/jira/browse/HIVE-11500
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Metastore
>            Reporter: Sergey Shelukhin
>            Assignee: Sergey Shelukhin
>         Attachments: HBase metastore split cache.pdf
>
>
> We need to cache file metadata (e.g. ORC file footers) for split generation (which, on
FSes that support fileId, will be valid permanently and only needs to be removed lazily when
the ORC file is erased or compacted), and potentially even some information about splits (e.g.
grouping based on location that would be good for some short time), in the HBase metastore.
> -It should be queryable by table. Partition predicate pushdown should be supported. If
bucket pruning is added, that too.- Given that we cannot cache file lists (we have to check
the FS for new/changed files anyway), and the difficulty of passing data about partitions, etc.
to split generation compared to paths, we will probably just filter by paths and fileIds.
It might be different for splits.
> In later phases, it would be nice to save the (first category above) results of expensive
work done by jobs, e.g. data size after decompression/decoding per column, etc. to avoid surprises
when ORC encoding is very good, or very bad. Perhaps it can even be lazily generated. Here's
a pony: 🐴
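
The caching idea in the description can be sketched roughly as follows. This is a minimal in-memory Java illustration under stated assumptions, not the proposed HBase-backed design: the FooterCache class and its methods are hypothetical names, footers are treated as opaque byte arrays, and the caller is assumed to list the filesystem itself, since the description notes that file lists cannot be cached.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; class and method names are illustrative only.
final class FooterCache {
    // Entries are keyed by fileId and never expire on their own.
    private final Map<Long, byte[]> footersByFileId = new HashMap<>();

    void put(long fileId, byte[] footer) {
        footersByFileId.put(fileId, footer);
    }

    // Returns cached footers for the given fileIds. The caller obtains the
    // ids from a fresh filesystem listing; ids with no entry are skipped.
    Map<Long, byte[]> getAll(List<Long> fileIds) {
        Map<Long, byte[]> found = new HashMap<>();
        for (Long fileId : fileIds) {
            byte[] footer = footersByFileId.get(fileId);
            if (footer != null) {
                found.put(fileId, footer);
            }
        }
        return found;
    }

    // Lazy cleanup: drops entries whose files no longer exist (erased or
    // compacted) and returns how many entries were removed.
    int evictMissing(Set<Long> liveFileIds) {
        int before = footersByFileId.size();
        footersByFileId.keySet().retainAll(liveFileIds);
        return before - footersByFileId.size();
    }

    int size() {
        return footersByFileId.size();
    }
}
```

Lookups filter purely by fileId, and stale entries are harmless until a cleanup pass runs, because a file that has been erased or compacted never shows up in a fresh listing.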



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
