orc-dev mailing list archives

From Gopal Vijayaraghavan <gop...@apache.org>
Subject Re: Thoughts on Acid reader
Date Thu, 14 Sep 2017 23:52:16 GMT
>  For performance reasons, you prefer the second option that I rejected
>  where users give a file and the system finds the deletes from there.  I can
>  buy that.

That's at least simpler to understand and debug: the logs from ORC alone are enough to find
consistency issues.

Beyond a base file and the current transaction state, the rest of the details are internal
to the implementation.

This is nearly exactly what the LLAP ACID cache patch does today: it looks up the cache by
the base file and applies local transaction state per query (i.e. the valid txns list, which
hides the committed deletes from an older query).
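
A minimal sketch of that idea, not the actual LLAP patch: the cache is keyed by base file only, and each query passes its own set of valid transactions to decide which delete events apply. Every name here (BaseFileCache, DeleteEvent, the file path, the txn ids) is hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeleteEvent:
    row_index: int  # position of the deleted row within the base file
    txn_id: int     # transaction that issued the delete


class BaseFileCache:
    """Hypothetical cache keyed by base file; holds no per-query state."""

    def __init__(self):
        self._rows = {}
        self._deletes = {}

    def put(self, base_file, rows, deletes):
        self._rows[base_file] = list(rows)
        self._deletes[base_file] = list(deletes)

    def read(self, base_file, valid_txns):
        # Only deletes from transactions visible to this query hide rows.
        hidden = {
            d.row_index
            for d in self._deletes[base_file]
            if d.txn_id in valid_txns
        }
        return [
            row
            for i, row in enumerate(self._rows[base_file])
            if i not in hidden
        ]


cache = BaseFileCache()
base = "base_0000005/bucket_00000"  # illustrative path only
cache.put(base, ["a", "b", "c"], [DeleteEvent(row_index=1, txn_id=7)])

# An older query whose snapshot predates txn 7 still sees the deleted row.
older = cache.read(base, valid_txns=set(range(1, 6)))
# A newer query whose snapshot includes txn 7 does not.
newer = cache.read(base, valid_txns=set(range(1, 8)))
```

The cached entry is shared by both queries; only the valid txns set differs, which is why the reader needs nothing beyond the base file and the transaction state.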

>  I don’t follow your last comment about ROW__ID being projected out to the
> user.  ORC isn’t currently hiding that field from the reader is it?

In general, a BI tool of some kind over ACID probably cares about the data and not the metadata
about which rows belong to which transaction.

Hiding ROW__ID makes the consumer side of the reader identical between ACID and non-ACID,
unless it is being read by a "SELECT FOR UPDATE" reader.
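
As a rough illustration of that point (a sketch under assumed names, not ORC's reader API): if the reader drops ROW__ID by default, a consumer gets the same row shape from ACID and non-ACID data, and only an update-style reader opts in to the metadata. The project function, the for_update flag, and the sample values are all hypothetical.

```python
ROW_ID = "ROW__ID"


def project(rows, for_update=False):
    """Return rows as the consumer sees them.

    ROW__ID is dropped unless the caller is an update-style reader.
    """
    if for_update:
        return [dict(row) for row in rows]
    return [
        {name: value for name, value in row.items() if name != ROW_ID}
        for row in rows
    ]


# Illustrative rows; the ROW__ID tuple contents are made up.
acid_rows = [{ROW_ID: (7, 0, 0), "name": "a"}, {ROW_ID: (7, 0, 1), "name": "b"}]
plain_rows = [{"name": "a"}, {"name": "b"}]

acid_view = project(acid_rows)
plain_view = project(plain_rows)
update_view = project(acid_rows, for_update=True)
```

Here acid_view and plain_view are identical, while update_view keeps the ROW__ID column.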

Cheers,
Gopal


