hive-issues mailing list archives

From "Sushanth Sowmyan (JIRA)" <>
Subject [jira] [Commented] (HIVE-9451) Add max size of column dictionaries to ORC metadata
Date Wed, 06 May 2015 21:15:00 GMT


Sushanth Sowmyan commented on HIVE-9451:

After discussion with Owen, marking as tentative for 1.2 - i.e. this will not hold up the
RC process for 1.2.0, but if it makes it before we release, it'll be part of 1.2.0.

This will still be honoured for inclusion in a 1.2.1 when we do it.

> Add max size of column dictionaries to ORC metadata
> ---------------------------------------------------
>                 Key: HIVE-9451
>                 URL:
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Owen O'Malley
>            Assignee: Owen O'Malley
>              Labels: ORC
>             Fix For: 1.2.0
>         Attachments: HIVE-9451.patch, HIVE-9451.patch
> To predict the amount of memory required to read an ORC file we need to know the size
> of the dictionaries for the columns that we are reading. I propose adding the number of bytes
> for each column's dictionary to the stripe's column statistics. The file's column statistics
> would have the maximum dictionary size for each column.
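
As a rough illustration of the proposal quoted above, the sketch below rolls per-stripe dictionary sizes up into a per-column file-level maximum and uses that to bound the dictionary memory for a set of columns. It is not the ORC reader API and not the attached patch: the input layout (one dict per stripe, column id to bytes), the function names, and the assumption that a reader holds the dictionaries of only one stripe at a time are all hypothetical.

```python
from typing import Dict, Iterable, List

# Hypothetical input shape: one dict per stripe, mapping column id to the
# number of bytes in that column's dictionary for that stripe.
StripeDictSizes = Dict[int, int]


def file_max_dictionary_sizes(stripes: List[StripeDictSizes]) -> Dict[int, int]:
    """Per column, the largest dictionary size seen in any stripe."""
    maxima: Dict[int, int] = {}
    for stripe in stripes:
        for column, size in stripe.items():
            maxima[column] = max(maxima.get(column, 0), size)
    return maxima


def estimate_dictionary_memory(maxima: Dict[int, int],
                               columns_to_read: Iterable[int]) -> int:
    """Upper bound on dictionary bytes, assuming one stripe is read at a time."""
    # Columns without a dictionary contribute nothing to the bound.
    return sum(maxima.get(column, 0) for column in columns_to_read)


# Made-up sizes for two stripes and two dictionary-encoded columns.
stripes = [{1: 4096, 2: 120}, {1: 1024, 2: 900}]
maxima = file_max_dictionary_sizes(stripes)
estimate = estimate_dictionary_memory(maxima, [1, 2])
```

Taking the maximum rather than the sum across stripes matches the proposal's wording for the file-level statistic; whether it is a tight bound depends on how many stripes a reader actually buffers, which the issue text does not specify.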

This message was sent by Atlassian JIRA
