orc-dev mailing list archives

From "Owen O'Malley" <owen.omal...@gmail.com>
Subject Re: Thoughts on Acid reader
Date Fri, 15 Sep 2017 16:38:12 GMT
Yeah, I'd suggest adding to:

OrcFile.ReaderOptions:
   exposeAcidRowId(boolean); -- so that the returned schema includes the
ACID row id

Reader.Options:
   setValidTransactions(TransactionList); -- apply transaction filtering

Then it will read a single file (or range using
Reader.Options.range(long,long)).
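
As a rough sketch of how those two options might look to a caller: everything below is a hypothetical stand-in, not the real ORC classes. The nested types, the placement of the ROW__ID column, and the set-based TransactionList are assumptions made for illustration, and neither delete handling nor range(long,long) is modelled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch only: these nested types imitate the options proposed
// above and are NOT the real org.apache.orc classes.
final class AcidReaderSketch {

    /** Stand-in for TransactionList: the transaction ids this read may see. */
    static final class TransactionList {
        private final Set<Long> validIds;

        TransactionList(Long... ids) {
            this.validIds = new HashSet<>(Arrays.asList(ids));
        }

        boolean isValid(long txnId) {
            return validIds.contains(txnId);
        }
    }

    /** Stand-in for OrcFile.ReaderOptions with the proposed flag. */
    static final class ReaderOptions {
        private boolean exposeAcidRowId = false;

        ReaderOptions exposeAcidRowId(boolean value) {
            this.exposeAcidRowId = value;
            return this;
        }
    }

    /** Stand-in for Reader.Options with the proposed transaction filter. */
    static final class Options {
        private TransactionList validTxns; // null means "no filtering"

        Options setValidTransactions(TransactionList txns) {
            this.validTxns = txns;
            return this;
        }
    }

    /** One stored row: the transaction that wrote it plus the user data. */
    static final class Row {
        final long txnId;
        final String data;

        Row(long txnId, String data) {
            this.txnId = txnId;
            this.data = data;
        }
    }

    /** Schema seen by the caller: the row id column appears only on request. */
    static List<String> schema(ReaderOptions opts, List<String> userColumns) {
        List<String> result = new ArrayList<>();
        if (opts.exposeAcidRowId) {
            result.add("ROW__ID");
        }
        result.addAll(userColumns);
        return result;
    }

    /** Returns the data of rows written by transactions in the valid list. */
    static List<String> read(List<Row> fileRows, Options opts) {
        List<String> visible = new ArrayList<>();
        for (Row row : fileRows) {
            if (opts.validTxns == null || opts.validTxns.isValid(row.txnId)) {
                visible.add(row.data);
            }
        }
        return visible;
    }

    public static void main(String[] args) {
        List<Row> rows = Arrays.asList(new Row(1, "a"), new Row(2, "b"), new Row(3, "c"));
        Options opts = new Options().setValidTransactions(new TransactionList(1L, 3L));
        System.out.println(schema(new ReaderOptions().exposeAcidRowId(true), Arrays.asList("data")));
        System.out.println(read(rows, opts));
    }
}
```

The point of splitting it this way is that the schema choice lives with the file-level options while the transaction filter lives with the per-read options, so a non-ACID consumer that sets neither sees exactly the plain data.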

.. Owen


On Thu, Sep 14, 2017 at 4:52 PM, Gopal Vijayaraghavan <gopalv@apache.org>
wrote:

> >  For performance reasons, you prefer the second option that I rejected
> >  where users give a file and the system finds the deletes from there.
> >  I can buy that.
>
> That's at least simpler to understand and debug: the logs from ORC alone
> are enough to find consistency issues.
>
> The rest of the details are implicit to the implementation, beyond a base
> file and the current transaction state.
>
> This is almost exactly what the LLAP ACID cache patch does today: it
> looks up the cache by base file and applies local transaction state per
> query (i.e. the valid txns list, which hides the committed deletes from an
> older query).
>
> >  I don’t follow your last comment about ROW__ID being projected out to
> >  the user.  ORC isn’t currently hiding that field from the reader is it?
>
> In general, a BI tool of some kind over ACID probably cares about the data
> and not the metadata about which rows belong to which transaction.
>
> Hiding ROW__ID makes the consumer side of the reader identical between
> ACID and non-ACID, unless it is being read by a "SELECT FOR UPDATE" reader.
>
> Cheers,
> Gopal
>
>
>