Thanks to both of you. I've actually gone ahead and implemented the Hadoop FileSystem API, following this utility: https://github.com/apache/orc/blob/master/java/core/src/java/org/apache/orc/util/StreamWrapperFileSystem.java
I think it would be awesome to eventually have ORC separated from the Hadoop classes, since I have to pull in those jars as a dependency, and of course there are multiple layers of indirection here.
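As far as I can tell, StreamWrapperFileSystem is read-only: it exposes one already-open input stream under one path so that the ORC reader can "open" it by name. A write-side adaptation follows the same pattern, handing back a caller-owned output stream for exactly one path. The sketch below models only that pattern with plain java.io. `SingleStreamSink` and its `create(String)` method are hypothetical names; this is not Hadoop's FileSystem API, and a real implementation would extend `org.apache.hadoop.fs.FileSystem` and return an `FSDataOutputStream`.

```java
import java.io.ByteArrayOutputStream;
import java.io.FileNotFoundException;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical write-side analogue of StreamWrapperFileSystem: exposes one
 * caller-owned OutputStream under one fixed path. Not Hadoop's API.
 */
public class SingleStreamSink {
    private final String path;
    private final OutputStream target;
    private boolean created = false;

    public SingleStreamSink(String path, OutputStream target) {
        this.path = path;
        this.target = target;
    }

    /** Returns a stream for the one known path; any other path is rejected. */
    public OutputStream create(String requested) throws IOException {
        if (!path.equals(requested)) {
            throw new FileNotFoundException("Only " + path + " is available, not " + requested);
        }
        if (created) {
            throw new IOException(path + " was already created");
        }
        created = true;
        // Shield the caller's stream from close() so the caller decides its lifetime.
        return new FilterOutputStream(target) {
            @Override
            public void write(byte[] b, int off, int len) throws IOException {
                out.write(b, off, len); // skip FilterOutputStream's byte-at-a-time default
            }

            @Override
            public void close() throws IOException {
                flush(); // flush only; the underlying stream stays open
            }
        };
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        SingleStreamSink sink = new SingleStreamSink("/my.orc", buffer);
        try (OutputStream out = sink.create("/my.orc")) {
            out.write("ORC".getBytes(StandardCharsets.US_ASCII));
        }
        System.out.println(buffer.size()); // 3
    }
}
```

Because the wrapper's `close()` only flushes, whoever supplied the destination stream (a buffer, a socket, an object-store upload) keeps control over when it is finalized.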

On Fri, Jan 22, 2021 at 10:21 AM Owen O'Malley <owen.omalley@gmail.com> wrote:
Ok, a couple of things:
  • The PhysicalWriter was intended so that LLAP could implement a write-through cache, where the new file is put into the cache as well as written to long-term storage.
  • The Hadoop FileSystem API, which is what ORC currently uses, is extensible and has a lot of bindings other than HDFS. For your use case, you probably want to use "file:///my-dir/my.orc".
  • Somewhere in the unit tests there is an implementation of Hadoop FileSystem that uses ByteBuffers in memory.
  • Finally, over the years there has been an ask for using ORC core without having Hadoop on the class path. Let me take a pass at that today to see if I can make that work. See https://issues.apache.org/jira/browse/ORC-508 .
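To make the write-through idea in the first point concrete: every byte sent to long-term storage is also kept in a cache, so a later reader can be served without going back to the slower store. The java.io sketch below shows only that idea. `WriteThroughStream` and its byte-array "cache" are hypothetical illustrations; they are not LLAP's cache or ORC's PhysicalWriter interface.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.Arrays;

/** Hypothetical tee stream: writes go to durable storage and to an in-memory cache. */
public class WriteThroughStream extends OutputStream {
    private final OutputStream storage;
    private final ByteArrayOutputStream cache = new ByteArrayOutputStream();

    public WriteThroughStream(OutputStream storage) {
        this.storage = storage;
    }

    @Override
    public void write(int b) throws IOException {
        storage.write(b); // write to long-term storage first
        cache.write(b);   // then keep a copy for fast reads
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        storage.write(b, off, len);
        cache.write(b, off, len);
    }

    @Override
    public void flush() throws IOException {
        storage.flush();
    }

    @Override
    public void close() throws IOException {
        storage.close(); // the cache stays readable after close
    }

    /** Bytes written so far, served from the cache without touching storage. */
    public byte[] cachedBytes() {
        return cache.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream disk = new ByteArrayOutputStream(); // stand-in for real storage
        try (WriteThroughStream out = new WriteThroughStream(disk)) {
            out.write(new byte[] {1, 2, 3});
            System.out.println(Arrays.equals(out.cachedBytes(), disk.toByteArray())); // true
        }
    }
}
```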
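The in-memory FileSystem mentioned in the unit tests boils down to a simple idea: writes accumulate in a buffer and are published under their path on close, and reads hand back the stored bytes as a ByteBuffer. The plain-Java sketch below captures just that. `InMemoryStore`, `create` and `open` are hypothetical names for illustration; this is not the class from ORC's unit tests and does not implement `org.apache.hadoop.fs.FileSystem`.

```java
import java.io.ByteArrayOutputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory file store: each path maps to a read-only ByteBuffer. */
public class InMemoryStore {
    private final Map<String, ByteBuffer> files = new HashMap<>();

    /** Opens a new file; its contents become visible only after close(). */
    public OutputStream create(String path) {
        return new ByteArrayOutputStream() {
            @Override
            public void close() {
                files.put(path, ByteBuffer.wrap(toByteArray()).asReadOnlyBuffer());
            }
        };
    }

    /** Returns an independent read-only view positioned at the start of the file. */
    public ByteBuffer open(String path) throws FileNotFoundException {
        ByteBuffer data = files.get(path);
        if (data == null) {
            throw new FileNotFoundException(path);
        }
        return data.duplicate();
    }

    public static void main(String[] args) throws IOException {
        InMemoryStore store = new InMemoryStore();
        try (OutputStream out = store.create("/tmp/my.orc")) {
            out.write(new byte[] {'O', 'R', 'C'});
        }
        System.out.println(store.open("/tmp/my.orc").remaining()); // 3
    }
}
```

Publishing on close mirrors how a file written through a real FileSystem is generally only complete once its output stream is closed.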
.. Owen

On Tue, Jan 19, 2021 at 7:20 PM Andrey Elenskiy <andrey.elenskiy@arista.com> wrote:
Hello, currently there's only a single implementation of PhysicalWriter that I was able to find -- PhysicalFSWriter, which only gives the option to write to HDFS.

I'd like to reuse the ORC file format for my own purposes without the destination being HDFS, writing instead to some byte buffer so that I can decide for myself where the bytes end up being saved.

I've started implementing PhysicalWriter, but a lot of it just ends up being copied over from PhysicalFSWriter, which seems redundant. So I'm wondering whether something already exists to achieve my goal of simply writing the resulting columns to a DataOutputStream (maybe there's some unofficial Java library, or I'm missing some obvious official API).

Thanks,
Andrey