From dev-return-1970-archive-asf-public=cust-asf.ponee.io@orc.apache.org  Mon Mar 26 06:59:19 2018
Return-Path: <dev-return-1970-archive-asf-public=cust-asf.ponee.io@orc.apache.org>
X-Original-To: archive-asf-public@cust-asf.ponee.io
Delivered-To: archive-asf-public@cust-asf.ponee.io
Received: from mail.apache.org (hermes.apache.org [140.211.11.3])
	by mx-eu-01.ponee.io (Postfix) with SMTP id 759E2180671
	for <archive-asf-public@cust-asf.ponee.io>; Mon, 26 Mar 2018 06:59:18 +0200 (CEST)
Received: (qmail 16989 invoked by uid 500); 26 Mar 2018 04:59:16 -0000
Mailing-List: contact dev-help@orc.apache.org; run by ezmlm
Precedence: bulk
List-Help: <mailto:dev-help@orc.apache.org>
List-Unsubscribe: <mailto:dev-unsubscribe@orc.apache.org>
List-Post: <mailto:dev@orc.apache.org>
List-Id: <dev.orc.apache.org>
Reply-To: dev@orc.apache.org
Delivered-To: mailing list dev@orc.apache.org
Received: (qmail 16966 invoked by uid 99); 26 Mar 2018 04:59:16 -0000
Received: from mail-relay.apache.org (HELO mailrelay1-lw-us.apache.org) (207.244.88.152)
    by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 26 Mar 2018 04:59:16 +0000
Received: from [10.42.80.96] (outbound.hortonworks.com [192.175.27.2])
	by mailrelay1-lw-us.apache.org (ASF Mail Server at mailrelay1-lw-us.apache.org) with ESMTPSA id 33D8A1D7;
	Mon, 26 Mar 2018 04:59:14 +0000 (UTC)
User-Agent: Microsoft-MacOutlook/10.b.0.180311
Date: Sun, 25 Mar 2018 21:59:06 -0700
Subject: Re: ORC double encoding optimization proposal
From: Gopal Vijayaraghavan <gopalv@apache.org>
To: Xiening Dai <xndai.git@live.com>,
	"dev@orc.apache.org" <dev@orc.apache.org>,
	"user@orc.apache.org" <user@orc.apache.org>
Message-ID: <A434E2C3-5386-4A4A-A17B-F0EE979047E5@hortonworks.com>
Thread-Topic: ORC double encoding optimization proposal
References: <17B91B6B0D9BBC44A1682DABC201C53552055763@SHSMSX104.ccr.corp.intel.com>
 <D220CF55-A229-4A61-AFD3-A799E3997E90@hortonworks.com>
 <CY1PR05MB24282204DAEF8BEFBFCB1C068DAD0@CY1PR05MB2428.namprd05.prod.outlook.com>
In-Reply-To: <CY1PR05MB24282204DAEF8BEFBFCB1C068DAD0@CY1PR05MB2428.namprd05.prod.outlook.com>
Mime-version: 1.0
Content-type: text/plain;
	charset="UTF-8"
Content-transfer-encoding: 7bit

Hi,


> Since Split creates two separated streams, reading one data batch will need an additional seek in order to reconstruct the column data

If you are seeing a seek like that, we've messed up something else higher up in the pipeline & that can be fixed.

ORC columnar reads only do random IO at the column level, not the stream level (except for non-column streams like the bloom filters) - adjacent streams are read together as a single IO op.

DiskRangeList produces a merged read plan before firing off any read, so the actual IO layer will (or should) never see a seek between adjacent streams.

There's a possibility that someone adds an extra byte or something to a stream which they never read, which might be a problem.
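
To make the merge step concrete, here is a rough sketch of coalescing adjacent byte ranges into a single read. This is not the actual DiskRangeList code; the (start, end) tuple representation and the merge_adjacent name are made up for illustration, and it only merges ranges that touch or overlap.

```python
from typing import List, Tuple

# Hypothetical representation: a half-open byte range (start, end) in the file.
Range = Tuple[int, int]


def merge_adjacent(ranges: List[Range]) -> List[Range]:
    """Coalesce byte ranges that touch or overlap into single reads."""
    merged: List[Range] = []
    for start, end in sorted(ranges):
        if merged and start <= merged[-1][1]:
            # This range begins at or before the end of the previous one:
            # extend the previous read instead of planning a new one.
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            # There is a gap, so a separate read (and a seek) is needed.
            merged.append((start, end))
    return merged


# Two streams of one column laid out back to back: a single IO op.
adjacent = merge_adjacent([(100, 250), (250, 400)])

# One unread byte between the streams: the plan splits into two reads.
gapped = merge_adjacent([(100, 250), (251, 400)])

print(adjacent)
print(gapped)
```

With back-to-back streams the plan collapses to one range, while a single unread byte in between leaves two ranges, which is the kind of extra seek described above.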

In early 2016 Rajesh & I went through each read IOP and tuned ORC for S3, which performs very poorly if you add irrelevant seeks.

If you do find a similar case in Apache ORC (not Hive-orc), I'll file a ticket corresponding to this one:

https://issues.apache.org/jira/browse/HIVE-13161

That was actually about reading 2 columns with an entirely NULL column in the middle, not exactly about splitting streams.

The next giant leap in IO performance for seeks is expected from a new HDFS API, which allows the scatter-gather to be pushed down further into the IO layer.

https://issues.apache.org/jira/browse/HADOOP-11867

This is mainly intended for reading ORC files from Erasure Coded streams, where the IO layer can reorganize and align the reads along the Erasure Coding boundaries (not so much about actual IOPs), instead of assuming normal read-ahead for the block reader.

Cheers,
Gopal