orc-dev mailing list archives

From Dain Sundstrom <d...@iq80.com>
Subject Re: ORC double encoding optimization proposal
Date Mon, 26 Mar 2018 18:47:47 GMT
Doubling the seeks would be a big deal for many of our installations. In practice it means that we would need to fully buffer the two streams (assuming they are placed one after the other) to avoid the extra seeks. If this could be done in one stream, I think it would be a big improvement. Is it possible to place the data row oriented, or to use sub-chunks in the stream (e.g. N items from a, then N items from b), so you can still stream?
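
A rough sketch of the interleaving I have in mind (illustrative only; this is not ORC's writer API, and the chunk boundaries and names are made up):

    import java.io.ByteArrayOutputStream;

    /**
     * Sketch: interleave two pre-encoded component streams ("a" and "b")
     * into one physical stream in alternating sub-chunks of N values, so a
     * sequential reader can reassemble both without seeking between
     * separate streams.
     */
    public class InterleavedChunkSketch {
        public static byte[] interleave(byte[][] aChunks, byte[][] bChunks) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            int chunkCount = Math.max(aChunks.length, bChunks.length);
            for (int i = 0; i < chunkCount; i++) {
                if (i < aChunks.length) {
                    byte[] a = aChunks[i];
                    out.write(a, 0, a.length);   // N items from a
                }
                if (i < bChunks.length) {
                    byte[] b = bChunks[i];
                    out.write(b, 0, b.length);   // then N items from b
                }
            }
            return out.toByteArray();
        }
    }

The reader would then walk the combined stream front-to-back, consuming a sub-chunk of a, then a sub-chunk of b, and so on, keeping the access pattern sequential even though two logical streams are stored.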

-dain

> On Mar 26, 2018, at 2:24 AM, Xiening Dai <xndai.git@live.com> wrote:
> 
> Where does the 2x IO drop come from? Based on Cheng Xu’s data, Split + Zstd has ~15% improvement over PlainV2 + Zstd in terms of file size. If I understand correctly, the total number of IO reads is almost the same, but Split will need an additional seek for each read.
> 
> The random IOPS would eventually determine the throughput of an HDD. The IO queue can build up quickly when there are too many seeks and then drastically affect read/write performance. That’s the major concern, and it’s not related to locality.
> 
> 
>> On Mar 26, 2018, at 2:47 PM, Gopal Vijayaraghavan <gopalv@apache.org> wrote:
>> 
>> 
>>> 2. Under a seek or predicate pushdown scenario, there’s no need to load the entire stream.
>> 
>> Yes, that is a valid scenario where the reader reads partial-streams & causes random IO.
>> 
>> The current double encoding is actually 2 streams today & will continue to use 2 streams for the FLIP implementation.
>> 
>> The SPLIT implementation will go from the current 2 streams to 4 streams (i.e. 1+1 -> 1+3 streams) & the total data IO will drop by ~2x or so. More so if one of the streams can be suppressed (like in my IoT data-set, where the sign-bit is always +ve for my electric meter data).
>> 
>> The trade-offs seem to be working out on regular HDDs with locality & for LLAP SSD caches - if your use-cases are different, I'd like to hear more about it.
>> 
>> The only significant random IO delays expected seem to be entirely within the HDFS API network hops (which offer 0% locality when data is erasure coded or for cloud-storage), which I hope to fix in the Hadoop-3.x branch with a new API.
>> 
>> Cheers,
>> Gopal
>> 
>> 
> 
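
For reference, a minimal sketch (illustrative only; the actual SPLIT encoding in the proposal may lay out its streams differently) of the per-component decomposition being discussed, using the standard IEEE-754 bit layout of a double:

    /**
     * Sketch: break each double into sign, exponent, and mantissa
     * components that could be written to separate streams. If every sign
     * bit is 0 (all values non-negative, as in the electric meter example),
     * that stream compresses to almost nothing or can be suppressed.
     */
    public class DoubleSplitSketch {
        public static void split(double[] values, byte[] signs, int[] exponents, long[] mantissas) {
            for (int i = 0; i < values.length; i++) {
                long bits = Double.doubleToRawLongBits(values[i]);
                signs[i] = (byte) (bits >>> 63);               // 1 sign bit
                exponents[i] = (int) ((bits >>> 52) & 0x7FF);  // 11 exponent bits
                mantissas[i] = bits & 0x000FFFFFFFFFFFFFL;     // 52 mantissa bits
            }
        }
    }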

