Return-Path:
X-Original-To: apmail-hive-issues-archive@minotaur.apache.org
Delivered-To: apmail-hive-issues-archive@minotaur.apache.org
Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 17778190DA for ; Mon, 25 Apr 2016 16:24:13 +0000 (UTC)
Received: (qmail 37985 invoked by uid 500); 25 Apr 2016 16:24:13 -0000
Delivered-To: apmail-hive-issues-archive@hive.apache.org
Received: (qmail 37962 invoked by uid 500); 25 Apr 2016 16:24:12 -0000
Mailing-List: contact issues-help@hive.apache.org; run by ezmlm
Precedence: bulk
List-Help:
List-Unsubscribe:
List-Post:
List-Id:
Reply-To: dev@hive.apache.org
Delivered-To: mailing list issues@hive.apache.org
Received: (qmail 37941 invoked by uid 99); 25 Apr 2016 16:24:12 -0000
Received: from arcas.apache.org (HELO arcas) (140.211.11.28) by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 25 Apr 2016 16:24:12 +0000
Received: from arcas.apache.org (localhost [127.0.0.1]) by arcas (Postfix) with ESMTP id C9A922C044E for ; Mon, 25 Apr 2016 16:24:12 +0000 (UTC)
Date: Mon, 25 Apr 2016 16:24:12 +0000 (UTC)
From: "Owen O'Malley (JIRA)"
To: issues@hive.apache.org
Message-ID:
In-Reply-To:
References:
Subject: [jira] [Commented] (HIVE-9660) store end offset of compressed data for RG in RowIndex in ORC
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394

    [ https://issues.apache.org/jira/browse/HIVE-9660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15256542#comment-15256542 ]

Owen O'Malley commented on HIVE-9660:
-------------------------------------

I don't think we need to bump up the writer version for this change, because the reader can tell if the protobuf has the field or not. WriterVersions are typically reserved for bugs in the writer where the reader needs to work around bugs.

Can you give a top level view on how you are approaching adding the lengths?

> store end offset of compressed data for RG in RowIndex in ORC
> -------------------------------------------------------------
>
> Key: HIVE-9660
> URL: https://issues.apache.org/jira/browse/HIVE-9660
> Project: Hive
> Issue Type: Bug
> Reporter: Sergey Shelukhin
> Assignee: Sergey Shelukhin
> Attachments: HIVE-9660.01.patch, HIVE-9660.02.patch, HIVE-9660.03.patch, HIVE-9660.04.patch, HIVE-9660.05.patch, HIVE-9660.06.patch, HIVE-9660.07.patch, HIVE-9660.07.patch, HIVE-9660.08.patch, HIVE-9660.09.patch, HIVE-9660.10.patch, HIVE-9660.10.patch, HIVE-9660.patch, HIVE-9660.patch
>
>
> Right now the end offset is estimated, which in some cases results in tons of extra data being read.
> We can add a separate array to RowIndex (positions_v2?) that stores number of compressed buffers for each RG, or end offset, or something, to remove this estimation magic

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
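
The compatibility argument in the comment above (the reader checks whether the new field is present instead of relying on a writer version bump) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the patch attached to the issue: `RowGroupRange`, `Entry`, `readEnd`, and the `slop` parameter are hypothetical names, `OptionalLong` stands in for an optional protobuf field, and the fallback is a generic over-estimate rather than the estimation logic ORC actually uses.

```java
import java.util.OptionalLong;

/** Hypothetical sketch; these are not the real ORC or Hive classes. */
final class RowGroupRange {
    private RowGroupRange() {}

    /** Stand-in for one row-group (RG) entry of a RowIndex. */
    static final class Entry {
        final long startOffset;
        // Empty when the file was written before the end-offset field existed.
        final OptionalLong endOffset;

        Entry(long startOffset, OptionalLong endOffset) {
            this.startOffset = startOffset;
            this.endOffset = endOffset;
        }
    }

    /**
     * Returns the offset at which a read for this row group may stop.
     * If the entry carries an exact end offset, that value is used.
     * Otherwise the reader falls back to an assumed over-estimate:
     * the start of the next row group plus a slop, clamped to the stream length.
     */
    static long readEnd(Entry entry, long nextStart, long slop, long streamLength) {
        if (entry.endOffset.isPresent()) {
            return entry.endOffset.getAsLong();
        }
        return Math.min(streamLength, nextStart + slop);
    }
}
```

Under these assumptions an old file (no end offset) and a new file (end offset present) are both readable by the same code path, which is why no version flag is needed to tell them apart.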