Date: Thu, 2 Nov 2017 22:27:00 +0000 (UTC)
From: "Sergey Shelukhin (JIRA)"
To: issues@hive.apache.org
Reply-To: dev@hive.apache.org
Subject: [jira] [Commented] (HIVE-17458) VectorizedOrcAcidRowBatchReader doesn't handle 'original' files

    [ https://issues.apache.org/jira/browse/HIVE-17458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16236704#comment-16236704 ]

Sergey Shelukhin commented on HIVE-17458:
-----------------------------------------

Left some comments. My two main questions are:
1) The patch mentions that non-split-update ACID cannot be read in Hive 3. Wouldn't that mean all the legacy ACID data cannot be read? Reader compatibility should still be possible.
2) If there are only originals and no deltas, does it still activate the row ID machinery? It looks like that should be unnecessary.

> VectorizedOrcAcidRowBatchReader doesn't handle 'original' files
> ---------------------------------------------------------------
>
>                 Key: HIVE-17458
>                 URL: https://issues.apache.org/jira/browse/HIVE-17458
>             Project: Hive
>          Issue Type: Improvement
>    Affects Versions: 2.2.0
>            Reporter: Eugene Koifman
>            Assignee: Eugene Koifman
>            Priority: Critical
>         Attachments: HIVE-17458.01.patch, HIVE-17458.02.patch, HIVE-17458.03.patch, HIVE-17458.04.patch, HIVE-17458.05.patch, HIVE-17458.06.patch, HIVE-17458.07.patch, HIVE-17458.07.patch, HIVE-17458.08.patch, HIVE-17458.09.patch, HIVE-17458.10.patch, HIVE-17458.11.patch, HIVE-17458.12.patch, HIVE-17458.12.patch, HIVE-17458.13.patch, HIVE-17458.14.patch, HIVE-17458.15.patch
>
>
> VectorizedOrcAcidRowBatchReader will not be used for original files. This will likely look like a perf regression when converting a table from non-acid to acid until it runs through a major compaction.
> With Load Data support, if large files are added via Load Data, the read ops will not vectorize until major compaction.
> There is no reason why this should be the case. Just like OrcRawRecordMerger, VectorizedOrcAcidRowBatchReader can look at the other files in the logical tranche/bucket and calculate the offset for the RowBatch of the split. (Presumably getRecordReader().getRowNumber() works the same in vector mode).
> In this case we don't even need OrcSplit.isOriginal() - the reader can infer it from file path... which in particular simplifies OrcInputFormat.determineSplitStrategies()

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
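
The offset calculation proposed in the issue description can be sketched roughly as follows. This is not Hive code: the class OriginalFileOffsets, both method names, and the row counts are hypothetical and exist only for illustration. The sketch assumes that row IDs within a bucket are assigned by concatenating its original files in a fixed order, that rowInFile stands in for whatever getRecordReader().getRowNumber() returns, and that a file counts as 'original' when no component of its path starts with base_, delta_ or delete_delta_.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative sketch only; none of these names are Hive APIs. */
public class OriginalFileOffsets {

    /**
     * Returns the synthetic row ID of the first row of targetFile, i.e. the
     * sum of the row counts of all files that precede it in the bucket.
     * The map's iteration order is taken as the (assumed) file order.
     */
    public static long rowIdOffset(Map<String, Long> rowCountsInBucketOrder,
                                   String targetFile) {
        long offset = 0;
        for (Map.Entry<String, Long> e : rowCountsInBucketOrder.entrySet()) {
            if (e.getKey().equals(targetFile)) {
                return offset;
            }
            offset += e.getValue();
        }
        throw new IllegalArgumentException("File not in bucket: " + targetFile);
    }

    /**
     * Path-based guess at whether a file is 'original': true when no path
     * component looks like an ACID base or delta directory (assumption).
     */
    public static boolean looksOriginal(String path) {
        for (String part : path.split("/")) {
            if (part.startsWith("base_")
                    || part.startsWith("delta_")
                    || part.startsWith("delete_delta_")) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Made-up row counts for three original files of one bucket.
        Map<String, Long> bucket = new LinkedHashMap<>();
        bucket.put("000000_0", 100L);
        bucket.put("000000_0_copy_1", 150L);
        bucket.put("000000_0_copy_2", 75L);

        long fileOffset = rowIdOffset(bucket, "000000_0_copy_2"); // 100 + 150
        long rowInFile = 40; // stand-in for the reader's row number in the file
        System.out.println(fileOffset + rowInFile); // prints 290

        System.out.println(looksOriginal("warehouse/t/000000_0"));                 // true
        System.out.println(looksOriginal("warehouse/t/delta_0000005_0000005/bucket_00000")); // false
    }
}
```

With such an offset in hand, each row in a batch would get offset + row-number-in-file as its row ID, which is why the split itself would not need to carry an isOriginal flag under these assumptions.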