orc-user mailing list archives

From "Owen O'Malley" <owen.omal...@gmail.com>
Subject Re: [Java] PhysicalWriter to DataOutputStream implementation?
Date Fri, 22 Jan 2021 18:20:55 GMT
Ok, a couple of things:

   - The PhysicalWriter was intended so that LLAP could implement a
   write-through cache, where the new file is put into the cache as well as
   written to long-term storage.
   - The Hadoop FileSystem API, which is what ORC currently uses, is
   extensible and has a lot of bindings other than HDFS. For your use case,
   you probably want to use "file:///my-dir/my.orc"
   - Somewhere in the unit tests there is an implementation of Hadoop
   FileSystem that uses ByteBuffers in memory.
   - Finally, over the years there has been an ask for using ORC core
   without having Hadoop on the class path. Let me take a pass at that today
   to see if I can make that work. See
   https://issues.apache.org/jira/browse/ORC-508 .
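
To illustrate the second point, here is a minimal sketch of writing an ORC file to the local filesystem through the standard ORC core API, using a "file:///" path so that Hadoop's LocalFileSystem is used instead of HDFS. The schema, path, and row values are placeholders; the example assumes orc-core and its Hadoop dependencies are on the classpath.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hive.ql.exec.vector.LongColumnVector;
import org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch;
import org.apache.orc.OrcFile;
import org.apache.orc.TypeDescription;
import org.apache.orc.Writer;

public class LocalOrcExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Hypothetical schema and path, just for illustration.
    TypeDescription schema = TypeDescription.fromString("struct<x:int>");
    // A "file:///" URI resolves to Hadoop's LocalFileSystem,
    // so no HDFS cluster is involved.
    Writer writer = OrcFile.createWriter(
        new Path("file:///tmp/my.orc"),
        OrcFile.writerOptions(conf).setSchema(schema));
    VectorizedRowBatch batch = schema.createRowBatch();
    LongColumnVector x = (LongColumnVector) batch.cols[0];
    for (int r = 0; r < 10; ++r) {
      x.vector[batch.size++] = r;
    }
    writer.addRowBatch(batch);
    writer.close();
  }
}
```

The same pattern works for any Hadoop FileSystem binding (s3a, abfs, etc.); only the URI scheme in the Path changes.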

.. Owen

On Tue, Jan 19, 2021 at 7:20 PM Andrey Elenskiy <andrey.elenskiy@arista.com> wrote:

> Hello, currently there's only a single implementation of PhysicalWriter
> that I was able to find -- PhysicalFSWriter, which only gives the option
> to write to HDFS.
> I'd like to reuse the ORC file format for my own purposes without the
> destination being HDFS, but just some byte buffer where I can decide myself
> where the bytes end up being saved.
> I've started implementing PhysicalWriter, but it seems like a lot of it
> just ends up being copied over from PhysicalFSWriter, which is redundant.
> So, I'm wondering if maybe something already exists to achieve my goal of
> just writing resulting columns to DataOutputStream (maybe there's some
> unofficial Java library or I'm missing some obvious official API).
> Thanks,
> Andrey
