avro-dev mailing list archives

From "Vincenz Priesnitz (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (AVRO-867) Allow tools to read files via hadoop FileSystem class
Date Tue, 16 Apr 2013 09:21:16 GMT

     [ https://issues.apache.org/jira/browse/AVRO-867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vincenz Priesnitz updated AVRO-867:
-----------------------------------

    Affects Version/s: 1.7.5
         Release Note: avro-tools can now access any Hadoop-supported filesystem when started via hadoop jar.
               Status: Patch Available  (was: Open)

Attached is a patch that changes the Utils class to use the Hadoop FileSystem class.
Any filesystem supported by Hadoop can now be used for input or output files in more of the tools.


Without any configuration, the tools behave as before:
{noformat}
# reads from local file system by default
# supports relative paths
java -jar avro-tools-1.7.5.jar tojson ~/myDir/myData.avro
{noformat}

If invoked via hadoop jar, the tools support more filesystems. Different filesystems can be
used in a single call. Furthermore, any default filesystem that might be specified in core-site.xml
is respected.
{noformat}
# combines an ftp file and a local file and writes result file combinedData.avro
# directly on the default hdfs server.
hadoop jar avro-tools-1.7.5.jar concat ftp://myFtpServer/data1.avro file:///home/user/data2.avro combinedData.avro
{noformat}
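
As a rough illustration of the behaviour described above (this is not the code from the attached patch): an argument that already carries a scheme such as ftp:// or file:// keeps it, while an unqualified argument is resolved against the default filesystem. The sketch uses only java.net.URI. The class PathQualifier, the method qualify, the host namenode:8020 and the working directory /user/alice are hypothetical names chosen for the example. Hadoop performs the real resolution inside its own Path and FileSystem classes.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class PathQualifier {

    // Hypothetical helper, not part of Avro or Hadoop.
    // Arguments with an explicit scheme are returned unchanged; anything else
    // is resolved against the default filesystem and, if relative, the
    // working directory.
    static URI qualify(String arg, URI defaultFs, String workingDir) {
        URI uri = URI.create(arg);
        if (uri.getScheme() != null) {
            return uri; // already fully qualified, e.g. ftp:// or file://
        }
        String path = uri.getPath();
        if (!path.startsWith("/")) {
            path = workingDir + "/" + path; // relative paths are still supported
        }
        try {
            return new URI(defaultFs.getScheme(), defaultFs.getAuthority(), path, null, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Cannot qualify " + arg, e);
        }
    }

    public static void main(String[] args) {
        // Assumed default filesystem, as it might be configured in core-site.xml.
        URI defaultFs = URI.create("hdfs://namenode:8020");
        String workingDir = "/user/alice";

        String[] inputs = {
            "ftp://myFtpServer/data1.avro",
            "file:///home/user/data2.avro",
            "combinedData.avro"
        };
        for (String input : inputs) {
            System.out.println(input + " -> " + qualify(input, defaultFs, workingDir));
        }
    }
}
```

With a local default filesystem (file:///), the same helper leaves unqualified paths on the local disk, which matches the unchanged behaviour of java -jar.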

Remote files can now be inspected quickly, e.g.:
{noformat}
hadoop jar avro-tools-1.7.5.jar getschema Data_on_hdfs.avro
hadoop jar avro-tools-1.7.5.jar tojson ftp://server-address/Data_on_ftp.avro
{noformat}

The following tools now use Utils for accessing files: concat, fragtojson, fromjson, fromtext,
getmeta, getschema, jsontofrag, recodec, tojson, totext.
                
> Allow tools to read files via hadoop FileSystem class
> -----------------------------------------------------
>
>                 Key: AVRO-867
>                 URL: https://issues.apache.org/jira/browse/AVRO-867
>             Project: Avro
>          Issue Type: New Feature
>          Components: java
>    Affects Versions: 1.7.5
>            Reporter: Joe Crobak
>            Assignee: Joe Crobak
>
> It would be great if I could use the various tools to read/parse files that are in HDFS,
> S3, etc via the [FileSystem|http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/fs/FileSystem.html]
> api. We could retain backwards compatibility by assuming that unqualified urls are "file://"
> but allow reading of files from fully qualified urls such as hdfs://. The required apis are
> already part of the avro-tools uber jar to support the TetherTool.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
