avro-dev mailing list archives

From "Doug Cutting (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (AVRO-867) Allow tools to read files via hadoop FileSystem class
Date Thu, 28 Jul 2011 21:04:09 GMT

    [ https://issues.apache.org/jira/browse/AVRO-867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13072530#comment-13072530 ]

Doug Cutting commented on AVRO-867:
-----------------------------------

If DataFileReader were to incorporate this, then the core Avro pom might depend on Hadoop.
Some have complained about this before, since Hadoop depends on Avro, creating a circular
dependency.  (In practice this is not an issue as long as both provide some backwards
compatibility: Avro can build against an older, published version of Hadoop and vice versa.)

Perhaps this could be implemented using reflection, e.g., something like:

Class.forName("org.apache.hadoop.fs.FileSystem")
    .getMethod("open", Class.forName("org.apache.hadoop.fs.Path")).invoke(fs, path)

That way it'd work if Hadoop is on the classpath, but would not require a dependency on Hadoop.
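A minimal sketch of that fallback pattern (the Hadoop class and method names are real, but the wrapper class `HadoopOptionalOpen` is illustrative, not anything Avro ships): load FileSystem reflectively when it is on the classpath, and fall back to plain java.io for local files when it is not.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;

public class HadoopOptionalOpen {
    // Open a URI via Hadoop's FileSystem if Hadoop is on the classpath,
    // otherwise fall back to java.io for local files.
    public static InputStream open(URI uri) throws IOException {
        try {
            Class<?> fsClass = Class.forName("org.apache.hadoop.fs.FileSystem");
            Class<?> confClass = Class.forName("org.apache.hadoop.conf.Configuration");
            Class<?> pathClass = Class.forName("org.apache.hadoop.fs.Path");
            Object conf = confClass.getDeclaredConstructor().newInstance();
            // static FileSystem.get(URI, Configuration)
            Object fs = fsClass.getMethod("get", URI.class, confClass)
                               .invoke(null, uri, conf);
            Object path = pathClass.getConstructor(URI.class).newInstance(uri);
            // FSDataInputStream extends InputStream, so the cast is safe.
            return (InputStream) fsClass.getMethod("open", pathClass).invoke(fs, path);
        } catch (ClassNotFoundException e) {
            // Hadoop absent: handle local files only.
            return new FileInputStream(uri.getPath());
        } catch (ReflectiveOperationException e) {
            throw new IOException("reflective FileSystem.open failed", e);
        }
    }
}
```

With Hadoop absent, `open(file.toURI())` takes the FileInputStream branch, so plain local reads keep working.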

As a middle ground, Hadoop could be required for compilation but only used at runtime when
an HDFS URI is passed in.

Alternately, we might add a UriResolver interface and a base implementation that just works
for local files.  Then Avro's mapred module could add an implementation that supports HDFS
too.  The default factory might first look for an org.apache.avro.mapred.FileSystemResolver
class, and, if that doesn't exist, use the base implementation.

> Allow tools to read files via hadoop FileSystem class
> -----------------------------------------------------
>
>                 Key: AVRO-867
>                 URL: https://issues.apache.org/jira/browse/AVRO-867
>             Project: Avro
>          Issue Type: New Feature
>          Components: java
>            Reporter: Joe Crobak
>            Assignee: Joe Crobak
>
> It would be great if I could use the various tools to read/parse files that are in HDFS,
> S3, etc. via the [FileSystem|http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/fs/FileSystem.html]
> API. We could retain backwards compatibility by assuming that unqualified URLs are "file://"
> but allow reading of files from fully qualified URLs such as hdfs://. The required APIs are
> already part of the avro-tools uber jar to support the TetherTool.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
