hive-issues mailing list archives

From "Prasanth Jayachandran (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-13345) LLAP: metadata cache takes too much space, esp. with bloom filters, due to Java/protobuf overhead
Date Mon, 28 Mar 2016 20:24:25 GMT

    [ https://issues.apache.org/jira/browse/HIVE-13345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15214816#comment-15214816 ]

Prasanth Jayachandran commented on HIVE-13345:
----------------------------------------------

IMO we should store the serialized representation of the metadata. The deserialized representation
(proto objects) is supposed to be short-lived. We already have POJO equivalents for all the protobuf
objects (BloomFilter, ColumnStatistics, StripeInformation, etc.), which are created from the proto
objects. If we cache a deserialized representation at all, we should cache those POJOs rather than
the proto objects.
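A minimal sketch of the "store serialized, deserialize on access" option follows. It assumes a generic decoder function. `SerializedMetadataCache` and its methods are hypothetical names, not Hive, LLAP or ORC APIs. The decoder stands in for whatever turns the bytes back into an object, for example a protobuf `parseFrom` call followed by conversion to the matching POJO.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/**
 * Hypothetical cache that retains only the serialized bytes of stripe metadata
 * and decodes them into a short-lived object on every access.
 */
final class SerializedMetadataCache<K, V> {
    // One byte[] per entry: no per-field Java object overhead is retained.
    private final Map<K, byte[]> serialized = new ConcurrentHashMap<>();
    // Caller-supplied decoder (assumption), e.g. proto parsing plus POJO conversion.
    private final Function<byte[], V> decoder;

    SerializedMetadataCache(Function<byte[], V> decoder) {
        this.decoder = decoder;
    }

    /** Stores a defensive copy of the serialized metadata under the given key. */
    void put(K key, byte[] bytes) {
        serialized.put(key, bytes.clone());
    }

    /** Returns a freshly decoded object, or null if the key is not cached. */
    V get(K key) {
        byte[] bytes = serialized.get(key);
        return bytes == null ? null : decoder.apply(bytes);
    }

    /** Sum of the cached payload sizes; map and key overhead is not counted. */
    long payloadBytes() {
        long total = 0;
        for (byte[] b : serialized.values()) {
            total += b.length;
        }
        return total;
    }
}
```

Each `get` pays the decode cost again, so this design trades CPU for memory. Callers would hold the decoded object only for the duration of a read, which matches the idea that the deserialized form should be short-lived.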

> LLAP: metadata cache takes too much space, esp. with bloom filters, due to Java/protobuf overhead
> -------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-13345
>                 URL: https://issues.apache.org/jira/browse/HIVE-13345
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Sergey Shelukhin
>
> We currently cache Java objects, which have high overhead: average stripe metadata takes 200-500 KB on real files, and with bloom filters it blows up by more than 5x, to up to 5 MB per stripe, because the filters are stored as lists of Longs. That is undesirable.
> We should either create better objects for ORC (which might be good in general) or store serialized metadata and deserialize it when needed.
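The "better objects" option could start by replacing the boxed `List<Long>` with a primitive `long[]`. On a 64-bit HotSpot JVM, each boxed `Long` is typically a separate object of around 24 bytes, plus a reference in the list's backing array. A `long[]` uses 8 bytes per word. Exact figures depend on the JVM and its flags.

The sketch below is hypothetical. `CompactBloomBits` is not an ORC class. It assumes bit `i` lives in word `i / 64` at position `i % 64`, as in `java.util.BitSet`. Whether ORC's bloom filter uses the same layout is not stated in this thread.

```java
import java.util.List;

/**
 * Hypothetical compact holder for bloom filter bits: a primitive long[]
 * instead of the List<Long> produced for a repeated 64-bit protobuf field.
 */
final class CompactBloomBits {
    private final long[] words;

    private CompactBloomBits(long[] words) {
        this.words = words;
    }

    /** Copies the boxed words (e.g. from a proto getter) into a primitive array once. */
    static CompactBloomBits fromBoxed(List<Long> boxed) {
        long[] words = new long[boxed.size()];
        for (int i = 0; i < words.length; i++) {
            words[i] = boxed.get(i); // unboxing happens once, at construction
        }
        return new CompactBloomBits(words);
    }

    /** Tests bit {@code index}; Java masks a long shift distance to its low 6 bits. */
    boolean get(int index) {
        return (words[index >>> 6] & (1L << index)) != 0;
    }

    /** Bytes used by the bit payload itself: 8 per 64-bit word. */
    long payloadBytes() {
        return 8L * words.length;
    }
}
```

After conversion, the boxed list from the proto object can be discarded. Only the primitive array stays in the cache.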



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
