orc-user mailing list archives

From Andrey Elenskiy <andrey.elens...@arista.com>
Subject Re: [Java] PhysicalWriter to DataOutputStream implementation?
Date Fri, 22 Jan 2021 18:32:06 GMT
Thanks to both of you. I actually went ahead and implemented the
FileSystem API, following this util:
https://github.com/apache/orc/blob/master/java/core/src/java/org/apache/orc/util/StreamWrapperFileSystem.java
I think it would be great to have ORC separated from the Hadoop classes
eventually, since I have to pull in those jars as dependencies and there
are multiple layers of indirection involved.

On Fri, Jan 22, 2021 at 10:21 AM Owen O'Malley <owen.omalley@gmail.com>
wrote:

> Ok, a couple of things:
>
>    - The PhysicalWriter was intended so that LLAP could implement a write
>    through cache where the new file was put into the cache as well as written
>    to long term storage.
>    - The Hadoop FileSystem API, which is what ORC currently uses, is
>    extensible and has a lot of bindings other than HDFS. For your use case,
>    you probably want to use "file:///my-dir/my.orc"
>    - Somewhere in the unit tests there is an implementation of Hadoop
>    FileSystem that uses ByteBuffers in memory.
>    - Finally, over the years there has been an ask for using ORC core
>    without having Hadoop on the class path. Let me take a pass at that today
>    to see if I can make that work. See
>    https://issues.apache.org/jira/browse/ORC-508 .
>
> .. Owen
>
> On Tue, Jan 19, 2021 at 7:20 PM Andrey Elenskiy <
> andrey.elenskiy@arista.com> wrote:
>
>> Hello, currently there's only a single implementation of PhysicalWriter
>> that I was able to find -- PhysicalFSWriter, which only gives the option
>> to write to HDFS.
>>
>> I'd like to reuse the ORC file format for my own purposes without the
>> destination being HDFS, but just some byte buffer where I can decide myself
>> where the bytes end up being saved.
>>
>> I've started implementing PhysicalWriter, but it seems like a lot of it
>> just ends up being copied over from PhysicalFSWriter which seems redundant.
>> So, I'm wondering if maybe something already exists to achieve my goal of
>> just writing resulting columns to DataOutputStream (maybe there's some
>> unofficial Java library or I'm missing some obvious official API).
>>
>> Thanks,
>> Andrey
>>
>
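
For readers of the archive: the write-through idea described above for PhysicalWriter (the same bytes go to a cache and to long-term storage) can be sketched with plain java.io, without ORC or Hadoop on the class path. This is only an illustration of the concept, not how ORC or LLAP implement it. TeeOutputStream and demo() are hypothetical names introduced here, and both sinks are in-memory buffers standing in for a real cache and a real store.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

// Hypothetical helper, not part of ORC or Hadoop: forwards every byte
// to two sinks, e.g. an in-memory cache and long-term storage.
class TeeOutputStream extends OutputStream {
  private final OutputStream primary;
  private final OutputStream secondary;

  TeeOutputStream(OutputStream primary, OutputStream secondary) {
    this.primary = primary;
    this.secondary = secondary;
  }

  @Override
  public void write(int b) throws IOException {
    primary.write(b);
    secondary.write(b);
  }

  @Override
  public void write(byte[] buf, int off, int len) throws IOException {
    primary.write(buf, off, len);
    secondary.write(buf, off, len);
  }

  @Override
  public void flush() throws IOException {
    primary.flush();
    secondary.flush();
  }

  @Override
  public void close() throws IOException {
    // Close the second sink even if closing the first one fails.
    try {
      primary.close();
    } finally {
      secondary.close();
    }
  }

  // Demo: write one payload through the tee and return both copies.
  // The two ByteArrayOutputStreams are stand-ins (assumptions) for a
  // cache and a long-term store.
  static byte[][] demo() {
    ByteArrayOutputStream cache = new ByteArrayOutputStream();
    ByteArrayOutputStream storage = new ByteArrayOutputStream();
    try (DataOutputStream out =
        new DataOutputStream(new TeeOutputStream(cache, storage))) {
      out.writeBytes("ORC"); // 3 bytes
      out.writeInt(42);      // 4 bytes
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return new byte[][] {cache.toByteArray(), storage.toByteArray()};
  }
}
```

Calling TeeOutputStream.demo() returns two byte arrays with identical contents, one per sink. Any OutputStream can take the place of either buffer, which is the sense in which the caller decides where the bytes end up.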
