orc-dev mailing list archives

From Alan Gates <alanfga...@gmail.com>
Subject Re: Orc Acid?
Date Tue, 29 Jan 2019 17:40:00 GMT
To answer the original question, it's split between the two.  The storage
requires a new column that records transaction id, row id, and some other
information.  To read ACID data, integration with the Hive metastore is
required so that the reader understands which records are valid and which
are not.  Writers also need to access the metastore to open and commit
transactions for any new records they write.
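
As a rough illustration of that split (a minimal sketch, not ORC or Hive
code): the file stores a transaction id and row id with each record, and the
reader filters records against the set of transactions the metastore reports
as committed. The names AcidRow, visible_rows, and committed_txns are
hypothetical, and the real layout carries more fields than shown here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AcidRow:
    # Hypothetical per-record metadata stored alongside the data in the file.
    transaction_id: int  # transaction that wrote this record
    row_id: int          # position of the record within that transaction
    payload: str         # the user-visible data


def visible_rows(rows, committed_txns):
    """Keep only records written by transactions known to be committed.

    committed_txns stands in for what a reader would learn from the
    metastore; it is an assumption of this sketch, not a real API.
    """
    return [r for r in rows if r.transaction_id in committed_txns]


rows = [
    AcidRow(transaction_id=1, row_id=0, payload="a"),
    AcidRow(transaction_id=2, row_id=0, payload="b"),  # txn 2 not committed
    AcidRow(transaction_id=3, row_id=0, payload="c"),
]
committed = {1, 3}  # hypothetical answer from the metastore

for row in visible_rows(rows, committed):
    print(row.transaction_id, row.row_id, row.payload)
```

Without the committed set, a reader has no way to tell that the record from
transaction 2 should be skipped, which is why the file format alone is not
enough.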

Shant's comment that the work is mostly in Hive at this point is true.  I
started work on porting the storage piece into ORC in
https://issues.apache.org/jira/projects/ORC/issues/ORC-255 and you can see
the progress I made at https://github.com/alanfgates/orc/tree/orc255
The patch is a year out of date, so it probably needs some help.  In
particular, it needs to be in sync with what Hive is doing.  And I was only
focusing on the vector batch interface, not the row-by-row one, which may or
may not be what interests you.  I suspect Hive will continue to want to go under the
covers and access things directly in ORC, but some kind of interface or
contract needs to be worked out to keep ORC readers and the Hive reader in
sync.

Alan.

On Mon, Jan 28, 2019 at 8:37 PM Shant Hovsepian <shant@arcadiadata.com>
wrote:

> ORC ACID is more of a Hive feature than an ORC feature.
>
> Regretfully, it's not defined in an engine-agnostic way. It would be great
> to make the ACID layout part of the file format definition, or a generic
> container definition, or an extension to the Hive table format, so it would
> be easier to use across tools. It's especially troubling that ACID is on by
> default in HDP 3.X for Hive 3.1. That makes it very hard to read
> Hive-generated ORC files unless the table is created as an external table
> instead of a managed table.
>
> -Shant
>
> On Mon, Jan 28, 2019 at 11:06 PM Jacques Nadeau <jacques@apache.org>
> wrote:
>
> > How much of the Acid functionality of Orc is actually in the Orc project?
> > The website seems to suggest it is core to Orc, but from a quick glance at
> > the code it seems like most of it is really elsewhere?
> >
> > Thanks
> > Jacques
> >
>
