crunch-user mailing list archives

From Josh Wills <josh.wi...@gmail.com>
Subject Re: Compress and output formats
Date Sun, 13 Sep 2015 06:15:10 GMT
On Sat, Sep 12, 2015 at 2:35 PM, Everett Anderson <everett@nuna.com> wrote:

> Hi,
>
> I've got two basic questions about org.apache.crunch.io.Compress
> <https://crunch.apache.org/apidocs/0.12.0/index.html?overview-summary.html>.
>
> 1) It seems like it should only be used to wrap Targets that are
> themselves binary file output formats, but org.apache.crunch.io.To only
> has text, avro, and sequence, none of which seem appropriate. How do people
> tend to use this? Is there a Hadoop FileOutputFormat that they give to
> To.formattedFile?
>

I don't understand the question. The Compress methods can be used with any
sort of output format that extends FileOutputFormat; it doesn't matter
whether it's text/sequence/avro or a custom thing.

>
> 2) The implementation of Compress.gzip is
>
>   public static <T extends Target> T gzip(T target) {
>     return (T) compress(target, GzipCodec.class)
>         .outputConf(AvroJob.OUTPUT_CODEC, DataFileConstants.DEFLATE_CODEC);
>   }
>
> Does this mean it can only work with Avro?
>

No, it's just that Avro has its own built-in support for gzip/snappy
serialization and it requires some extra conf to enable it. Any other
output format will just ignore that configuration parameter.
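
To make that concrete, here is a toy model in plain Java. It is not Crunch
or Hadoop code: the class name, the two describe* methods, and the idea of
modeling a job configuration as a Map are all made up for illustration. The
property names are the classic Hadoop and Avro ones, but whether Crunch sets
exactly these keys is an assumption. In Crunch itself the call would look
something like Compress.gzip(To.textFile(path)).

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Toy model (not Crunch or Hadoop code) of why an Avro-specific setting
 * is harmless for other output formats.
 */
public class CompressConfSketch {

    // Property names used here only as plain map keys (assumed, see lead-in).
    static final String COMPRESS = "mapred.output.compress";
    static final String COMPRESS_CODEC = "mapred.output.compression.codec";
    static final String AVRO_CODEC = "avro.output.codec"; // AvroJob.OUTPUT_CODEC

    /** Mimics Compress.gzip: generic compression keys plus the Avro-specific one. */
    static Map<String, String> gzipConf() {
        Map<String, String> conf = new LinkedHashMap<>();
        conf.put(COMPRESS, "true");
        conf.put(COMPRESS_CODEC, "org.apache.hadoop.io.compress.GzipCodec");
        conf.put(AVRO_CODEC, "deflate"); // DataFileConstants.DEFLATE_CODEC
        return conf;
    }

    /** Hypothetical text-like format: reads only the generic keys. */
    static String describeTextOutput(Map<String, String> conf) {
        boolean on = Boolean.parseBoolean(conf.getOrDefault(COMPRESS, "false"));
        return on
                ? "text compressed with " + conf.get(COMPRESS_CODEC)
                : "text uncompressed";
    }

    /** Hypothetical Avro-like format: reads its own codec key. */
    static String describeAvroOutput(Map<String, String> conf) {
        return "avro container with codec " + conf.getOrDefault(AVRO_CODEC, "null");
    }

    public static void main(String[] args) {
        Map<String, String> conf = gzipConf();
        // The text-like format never looks at AVRO_CODEC, so the extra key is inert.
        System.out.println(describeTextOutput(conf));
        System.out.println(describeAvroOutput(conf));
    }
}
```

The same configuration is handed to both formats; each one picks out only
the keys it knows about, so the Avro key costs nothing when the target is
not Avro.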


> Thanks!
>
