orc-dev mailing list archives

From "Owen O'Malley" <owen.omal...@gmail.com>
Subject Re: ORC double encoding optimization proposal
Date Tue, 27 Mar 2018 20:32:24 GMT
Going back to the point of the double split encoding, it would make sense to
try a variant where we combine the sign and the mantissa. That would
remove the sign stream at the relatively small cost of making the mantissa
stream signed.
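To make the idea concrete, here is a small sketch (not ORC's actual implementation; the function names are hypothetical) of splitting an IEEE-754 double into its sign, exponent, and mantissa fields, and then folding the sign into the mantissa so a single signed stream can carry both:

```python
import struct

def split_double(value):
    """Split an IEEE-754 double into its sign, exponent, and mantissa fields."""
    bits = struct.unpack('<Q', struct.pack('<d', value))[0]
    sign = bits >> 63                     # 1 bit
    exponent = (bits >> 52) & 0x7FF       # 11 bits
    mantissa = bits & ((1 << 52) - 1)     # 52 bits
    return sign, exponent, mantissa

def combine_sign_mantissa(sign, mantissa):
    """Fold the sign bit into the mantissa, producing one signed value.

    A signed RLE encoding (e.g. zigzag) can then carry this value,
    eliminating the need for a separate sign stream.
    """
    return -mantissa if sign else mantissa
```

With this variant, a column of doubles would need only an exponent stream and a signed mantissa stream instead of three streams.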

Thinking more about the layout options...

Another consideration is that we'd be better off not splitting compression
chunks across ranges, and yet I'm worried about the overhead of closing
all of the compression chunks and RLE runs early.

So we could modify my #2 proposal to be sensitive to RLE and compression
chunks: at the end of each row group, we wait until the RLE runs and
compression chunks close before interleaving the streams. That means that
for a column with three streams and two row groups, we could do something
like:

stream1.1, stream2.1, stream3.1, stream1.2, stream2.2, stream3.2

Stream X.Y contains a whole number of compression chunks, and the majority
of the data for row group Y is in the streams *.Y. This significantly
improves the current state of affairs, because now we know that if we read
the streams *.1, we'll have the entire first row group and can start
decompression and processing while we read the other "stripelets".

By not forcing early closure of the RLE runs and compression chunks, we
preserve the compression ratio and yet gain the ability to do async IO in
the reader.
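The interleaved layout above can be sketched as follows (an illustration only; the data structures are hypothetical and not ORC's real writer code). Each stream is a list of whole compression chunks tagged with the row group in which the chunk started, and the writer emits all of row group 1's chunks before row group 2's:

```python
def interleave_streams(streams):
    """Interleave per-stream chunk lists by row group:
    stream1.1, stream2.1, stream3.1, stream1.2, stream2.2, stream3.2

    Each element of `streams` is a list of (row_group, chunk) pairs.
    Chunks are never split: a chunk that spans a row-group boundary
    belongs to the row group in which it started, which is why the
    majority (not all) of row group Y's data lands in streams *.Y.
    """
    num_groups = max(rg for s in streams for rg, _ in s) + 1
    layout = []
    for rg in range(num_groups):
        for s in streams:
            layout.extend(chunk for g, chunk in s if g == rg)
    return layout
```

A reader that has fetched the first row group's slice of every stream can start decompressing immediately while issuing async reads for the remaining stripelets.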

.. Owen


On Sun, Mar 25, 2018 at 11:47 PM, Gopal Vijayaraghavan <gopalv@apache.org>
wrote:

>
> >    2. Under seek or predicate pushdown scenario, there’s no need to load
> the entire stream.
>
> Yes, that is a valid scenario where the reader reads partial-streams &
> causes random IO.
>
> The current double encoding is actually 2 streams today & will continue to
> use 2 streams for the FLIP implementation.
>
> The SPLIT implementation will go from the current 2 streams to 4 streams
> (i.e. 1+1 -> 1+3 streams) & the total data IO will drop by ~2x or so. More
> so if one of the streams can be suppressed (like in my IoT data-set, where
> the sign-bit is always positive for my electric meter data).
>
> The trade-offs seem to be working out on regular HDDs with locality & for
> LLAP SSD caches - if your use-cases are different, I'd like to hear more
> about it.
>
> The only significant random IO delays expected seem to be entirely within
> the HDFS API network hops (which offers 0% locality when data is erasure
> coded or for cloud-storage), which I hope to fix in the Hadoop-3.x branch
> with a new API.
>
> Cheers,
> Gopal
>
>
>
