avro-dev mailing list archives

From "Niels Basjes (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (AVRO-1862) AvroOutputFormat saves compressed avrò files without respecting codec's default extension
Date Mon, 15 May 2017 12:43:04 GMT

    [ https://issues.apache.org/jira/browse/AVRO-1862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16010427#comment-16010427 ]

Niels Basjes commented on AVRO-1862:
------------------------------------

If I create a tar archive and then compress it with gzip, I get a name like {{example.tar.gz}}.

If I gunzip that file, I get {{example.tar}}, which is a tar archive.
Avro files are Avro files.
You cannot 'unavro' an {{example.gz.avro}} file and get an {{example.gz}} file.

The other way around is also wrong: a name like {{example.avro.gz}} would lead to the
expectation that it is a gzipped file and that ungzipping it yields an {{example.avro}}.

Based on the explanation, I interpret the reason behind this change as a workaround for a need
in scripting to avoid certain situations.
Alternative solution for the described use case: run Camus in a script that, after running
the task, simply renames the output file {{example.avro}} to {{example.camus.avro}}.
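
A minimal sketch of such a post-processing step, assuming the job output has already landed in a local directory (for output on HDFS the rename would have to go through the Hadoop tooling instead). The function name {{add_marker}} and the default marker {{camus}} are only illustrative and not part of Avro or Camus:

```python
from pathlib import Path


def add_marker(output_dir, marker="camus"):
    """Rename every <name>.avro in output_dir to <name>.<marker>.avro.

    Hypothetical helper: assumes a local directory, not HDFS.
    Returns the list of new paths.
    """
    renamed = []
    # sorted() materialises the listing before any file is renamed.
    for path in sorted(Path(output_dir).glob("*.avro")):
        # Skip files that already carry the marker, so reruns are harmless.
        if path.name.endswith(f".{marker}.avro"):
            continue
        target = path.with_name(f"{path.stem}.{marker}.avro")
        path.rename(target)
        renamed.append(target)
    return renamed
```

The wrapper script would call {{add_marker}} on the output directory once the task has finished. The files keep the {{.avro}} extension and stay ordinary Avro files, so nothing in Avro itself has to change.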

I see this as a problem that does not belong to the avro code base.
So based on what I see here I think this should not be committed.



> AvroOutputFormat saves compressed avrò files without respecting codec's default extension
> -----------------------------------------------------------------------------------------
>
>                 Key: AVRO-1862
>                 URL: https://issues.apache.org/jira/browse/AVRO-1862
>             Project: Avro
>          Issue Type: Improvement
>          Components: java
>    Affects Versions: 1.8.1
>            Reporter: Piotr Wikieł
>            Priority: Minor
>              Labels: patch
>             Fix For: 1.8.3
>
>         Attachments: AVRO-1862-1.patch, AVRO-1862.patch
>
>
> A common pattern in naming compressed files is to give them an extension derived from the compression codec, for example: {{.gz}}, {{.zip}}, {{.bz2}}.
> {{AvroOutputFormat}} currently does not respect this convention.
> I've adapted some code from Hadoop's {{TextOutputFormat}} in a backward-compatible manner, adding the following {{JobConf}} property:
> {{avro.mapred.output.extension.from-codec}} ({{boolean}}, default: {{false}}) - when set to {{true}}, the extension will be changed according to the above rule.
> EDIT: Please take a look at the first comment for an update. {{.gz.avro}}, {{.snappy.avro}} will be the file's extension when the above property is set to {{true}}.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
