orc-dev mailing list archives

From Ismaël Mejía (JIRA) <j...@apache.org>
Subject [jira] [Created] (ORC-508) Add a reader/writer that do not depend on Hadoop FileSystem
Date Mon, 27 May 2019 11:49:00 GMT
Ismaël Mejía created ORC-508:
--------------------------------

             Summary: Add a reader/writer that do not depend on Hadoop FileSystem
                 Key: ORC-508
                 URL: https://issues.apache.org/jira/browse/ORC-508
             Project: ORC
          Issue Type: Improvement
          Components: Java
            Reporter: Ismaël Mejía


It seems that the default implementation classes of ORC currently depend on Hadoop FileSystem
objects to write. This is not ideal for APIs that do not rely on Hadoop. For context, I was
looking at adding support for Apache Beam, but Beam's API supports multiple filesystems through
a more generic abstraction that relies on Java's Channels and Streams APIs, which delegate
directly to distributed filesystems, e.g. Google Cloud Storage, Amazon S3, etc. It would be really
nice to have such support in the core implementation, and maybe to split the Hadoop-dependent
implementation into its own module in the future.
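
To illustrate the kind of abstraction meant here, below is a minimal sketch of reading the tail of a
file (where an ORC file keeps its footer and postscript) through a plain `java.nio` channel
instead of a Hadoop FS object. The class `ChannelTailReader` and its `readTail` method are
hypothetical names made up for this example; they are not part of ORC or Beam, and a real reader
would need considerably more than this.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.SeekableByteChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical helper (not an ORC API): reads the end of a file via a generic NIO channel. */
public final class ChannelTailReader {
    private ChannelTailReader() {}

    /** Returns the last maxBytes bytes of the channel, or fewer if the channel is shorter. */
    public static byte[] readTail(SeekableByteChannel channel, int maxBytes) throws IOException {
        long size = channel.size();
        int length = (int) Math.min(size, (long) maxBytes);
        ByteBuffer buffer = ByteBuffer.allocate(length);
        channel.position(size - length);
        // read() may fill the buffer only partially, so loop until it is full.
        while (buffer.hasRemaining()) {
            if (channel.read(buffer) < 0) {
                throw new IOException("Unexpected end of channel");
            }
        }
        return buffer.array();
    }

    public static void main(String[] args) throws IOException {
        // A local temp file stands in for any storage that can expose a SeekableByteChannel.
        Path file = Files.createTempFile("tail-demo", ".bin");
        try {
            Files.write(file, new byte[] {1, 2, 3, 4, 5});
            try (SeekableByteChannel channel =
                     Files.newByteChannel(file, StandardOpenOption.READ)) {
                byte[] tail = readTail(channel, 2);
                System.out.println(tail.length + " bytes: " + tail[0] + ", " + tail[1]);
                // prints "2 bytes: 4, 5"
            }
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

Because the method only needs `size()`, `position()` and `read()`, any storage backend that can
hand out a `SeekableByteChannel` could be plugged in without pulling in Hadoop classes.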

After a look at some parts of the `orc-core`



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
