orc-dev mailing list archives

From Xiening Dai <xndai....@live.com>
Subject Re: ORC double encoding optimization proposal
Date Wed, 28 Mar 2018 08:01:10 GMT
So we could modify my #2 proposal to be sensitive to rle and compression chunks. At the end of the row group, we wait until the rle and compression chunks close and then interleave the streams. That means that for a column with three streams and two row groups, we could do something like:


I think you mean the #1 proposal, right? This modification will increase the implementation complexity, and I am not sure how much we will gain by not closing the compression and rle chunks early. You probably have some data from when you first designed row groups and the index.

Regarding double encoding, we actually have a third option, which is to use what we already have today: PlainV2. According to Xu Cheng’s test, PlainV2 is on par with Split in terms of size when zstd is used as the compressor. Flip is fast, but its size has been a concern. At this point, I don’t see a clear winner.


On Mar 28, 2018, at 4:32 AM, Owen O'Malley <owen.omalley@gmail.com> wrote:

Going back to the point of double split encoding, it would make sense to try a variant where we combine the sign and the mantissa. That should remove the sign stream at the relatively small cost of making the mantissa stream signed.
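
As a rough sketch of that variant (this is not the ORC writer code; the class and field names below are made up for illustration), folding the sign bit into the mantissa so that only a signed mantissa stream and an exponent stream remain could look like this in Java:

    // Illustrative only: split a double into (signed mantissa, exponent),
    // folding the IEEE-754 sign bit into the mantissa value.
    public final class SignedMantissaSplit {

        public static final class Parts {
            public final long signedMantissa; // negative when the sign bit was set
            public final int exponent;        // raw biased exponent (11 bits)
            Parts(long signedMantissa, int exponent) {
                this.signedMantissa = signedMantissa;
                this.exponent = exponent;
            }
        }

        public static Parts split(double value) {
            long bits = Double.doubleToRawLongBits(value);
            long mantissa = bits & 0x000FFFFFFFFFFFFFL;   // low 52 bits
            int exponent = (int) ((bits >>> 52) & 0x7FF); // next 11 bits
            boolean negative = bits < 0;                  // top bit is the sign
            // Map negatives to -(mantissa + 1) so the mapping stays reversible
            // even when the mantissa is zero (e.g. -1.0 or -0.0); the mantissa
            // stream then needs a signed (e.g. zigzag) integer encoding.
            return new Parts(negative ? -(mantissa + 1L) : mantissa, exponent);
        }

        public static double join(Parts p) {
            boolean negative = p.signedMantissa < 0;
            long mantissa = negative ? -(p.signedMantissa + 1L) : p.signedMantissa;
            long bits = (negative ? 1L << 63 : 0L)
                      | ((long) p.exponent << 52)
                      | mantissa;
            return Double.longBitsToDouble(bits);
        }
    }

That would take the column from three value streams to two, at the cost of roughly one extra bit per value in the now-signed mantissa encoding.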

Thinking more about the layout options...

Another consideration is that we'd be better off not splitting compression chunks between ranges, and yet I'm worried about the overhead of closing all of the compression chunks and rle runs early.

So we could modify my #2 proposal to be sensitive to rle and compression chunks. At the end of the row group, we wait until the rle and compression chunks close and then interleave the streams. That means that for a column with three streams and two row groups, we could do something like:

stream1.1, stream2.1, stream3.1, stream1.2, stream2.2, stream3.2

Stream x.y contains a whole number of compression chunks, and the majority of the data for row group y is in the streams *.y. This significantly improves the current state of affairs, because now we know that if we read the streams *.1, we'll have the entire first row group and can start decompression and processing while we read the other "stripelets".
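
To make the ordering concrete, here is a minimal sketch in Java of the interleaving step, assuming each stream has already been buffered per row group as byte arrays ending on whole compression chunk / rle run boundaries (the names and types are illustrative, not the actual ORC writer API):

    import java.io.ByteArrayOutputStream;
    import java.io.IOException;

    // Illustrative only: emit buffered per-row-group stream chunks in the
    // order stream1.1, stream2.1, stream3.1, stream1.2, stream2.2, stream3.2.
    public final class InterleavedLayoutSketch {

        // chunks[s][g] holds the bytes of stream s that belong to row group g.
        public static byte[] interleave(byte[][][] chunks, int rowGroups) throws IOException {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            for (int g = 0; g < rowGroups; g++) {
                for (byte[][] stream : chunks) {
                    out.write(stream[g]); // one "stripelet" worth of this stream
                }
            }
            return out.toByteArray();
        }
    }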

By not forcing early closure of the rle runs and compression chunks, we preserve the compression and still gain the ability to do async IO in the reader.

.. Owen


On Sun, Mar 25, 2018 at 11:47 PM, Gopal Vijayaraghavan <gopalv@apache.org> wrote:

>    2. Under a seek or predicate pushdown scenario, there’s no need to load the entire stream.

Yes, that is a valid scenario where the reader reads partial streams & causes random IO.

The current double encoding is actually 2 streams today & will continue to use 2 streams
for the FLIP implementation.

The SPLIT implementation will go from the current 2 streams to 4 streams (i.e. 1+1 -> 1+3 streams) & the total data IO will drop by ~2x or so. More so if one of the streams can be suppressed (like in my IoT data-set, where the sign bit is always +ve for my electric meter data).
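
For the suppression case, the writer only needs a cheap scan over the batch before deciding whether to emit the sign stream at all; a minimal sketch (illustrative names, not the actual ORC writer API):

    // Illustrative only: the sign stream can be dropped when no value in the
    // batch carries a negative sign bit.
    public final class SignStreamSuppressionSketch {
        public static boolean canSuppressSignStream(double[] batch) {
            for (double v : batch) {
                // doubleToRawLongBits keeps the sign of -0.0 as well, so any
                // negative sign bit forces the stream to be kept.
                if (Double.doubleToRawLongBits(v) < 0) {
                    return false;
                }
            }
            return true;
        }
    }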

The trade-offs seem to be working out on regular HDDs with locality & for LLAP SSD caches - if your use cases are different, I'd like to hear more about them.

The only significant random IO delays expected seem to be entirely within the HDFS API network hops (which offer 0% locality when data is erasure coded or stored in the cloud), which I hope to fix in the Hadoop-3.x branch with a new API.

Cheers,
Gopal



