hive-issues mailing list archives

From "Sergey Shelukhin (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (HIVE-9660) store end offset of compressed data for RG in RowIndex in ORC
Date Mon, 02 May 2016 17:37:13 GMT

    [ https://issues.apache.org/jira/browse/HIVE-9660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15267043#comment-15267043
] 

Sergey Shelukhin edited comment on HIVE-9660 at 5/2/16 5:36 PM:
----------------------------------------------------------------

This is exactly what this patch does, except the coordination will move into each of the RL
writers instead of the central place.
So I don't really understand the difference in approach.

Note that the run-length blocks finish before the CBs (i.e., the RL block finishes first, then the CB
containing it), so the callbacks are actually reversed.
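
To illustrate the ordering described above, here is a minimal sketch, not code from the patch. It assumes a hypothetical central tracker with two callbacks: one fired when an RL run closes a row group, and one fired later when the compressed buffer (CB) containing that run is finished. All class and method names are invented for illustration.

```python
class EndOffsetTracker:
    """Hypothetical coordinator; names are illustrative, not from the ORC writer."""

    def __init__(self):
        # Row groups whose final RL run has finished but whose CB is still open.
        self.pending = []
        # Row group index -> end offset of the compressed data holding it.
        self.end_offsets = {}

    def on_run_finished(self, row_group):
        # The RL callback arrives first: the end offset is not known yet.
        self.pending.append(row_group)

    def on_compressed_buffer_finished(self, end_offset):
        # The CB callback arrives second: resolve every row group waiting on it.
        for row_group in self.pending:
            self.end_offsets[row_group] = end_offset
        self.pending.clear()


tracker = EndOffsetTracker()
tracker.on_run_finished(0)
tracker.on_compressed_buffer_finished(1200)
tracker.on_run_finished(1)
tracker.on_run_finished(2)
tracker.on_compressed_buffer_finished(2600)
```

In this sketch, row groups 1 and 2 end in the same CB, so both receive the same end offset.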

For uncompressed data, the main concern is that with exact boundaries there will be too many calls.


was (Author: sershe):
This is exactly what this patch does, except the coordination will move into each of the RL
writers instead of the central place.
So I don't really understand the difference in approach.

> store end offset of compressed data for RG in RowIndex in ORC
> -------------------------------------------------------------
>
>                 Key: HIVE-9660
>                 URL: https://issues.apache.org/jira/browse/HIVE-9660
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Sergey Shelukhin
>            Assignee: Sergey Shelukhin
>         Attachments: HIVE-9660.01.patch, HIVE-9660.02.patch, HIVE-9660.03.patch, HIVE-9660.04.patch,
>                      HIVE-9660.05.patch, HIVE-9660.06.patch, HIVE-9660.07.patch, HIVE-9660.07.patch,
>                      HIVE-9660.08.patch, HIVE-9660.09.patch, HIVE-9660.10.patch, HIVE-9660.10.patch,
>                      HIVE-9660.11.patch, HIVE-9660.patch, HIVE-9660.patch
>
>
> Right now the end offset is estimated, which in some cases results in tons of extra data being read.
> We can add a separate array to RowIndex (positions_v2?) that stores the number of compressed buffers
> for each RG, or the end offset, or something similar, to remove this estimation magic.
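
The difference between an estimated and a stored end offset can be sketched as follows. This is a simplified model under stated assumptions, not the reader's actual logic: the estimate is taken to be the next row group's start plus a worst-case slop of two compressed buffers, and `end_offset` stands in for the proposed extra index field. `HEADER_SIZE`, `RowGroupEntry`, and the function names are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional

HEADER_SIZE = 3  # assumed per-buffer header size, for illustration only


@dataclass
class RowGroupEntry:
    start_offset: int                 # position already present in the index
    end_offset: Optional[int] = None  # hypothetical new field from the proposal


def estimated_end(entries: List[RowGroupEntry], i: int,
                  stream_length: int, buffer_size: int) -> int:
    """Guess the end of row group i when no end offset is stored."""
    if i == len(entries) - 1:
        return stream_length
    # Assumed worst case: the row group may spill into two more buffers.
    slop = 2 * (HEADER_SIZE + buffer_size)
    return min(stream_length, entries[i + 1].start_offset + slop)


def exact_end(entries: List[RowGroupEntry], i: int,
              stream_length: int, buffer_size: int) -> int:
    """Use the stored end offset if present, else fall back to the estimate."""
    end = entries[i].end_offset
    if end is not None:
        return end
    return estimated_end(entries, i, stream_length, buffer_size)


entries = [RowGroupEntry(0, 1200), RowGroupEntry(1000, 2100), RowGroupEntry(2000, 2600)]
stream_length, buffer_size = 2600, 262144

guess = estimated_end(entries, 0, stream_length, buffer_size)
known = exact_end(entries, 0, stream_length, buffer_size)
extra_bytes = guess - known  # bytes read only because the end was estimated
```

With a large buffer size relative to the stream, the estimate for the first row group in this model degenerates to the whole stream, which is the kind of over-read the issue describes.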



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
