flink-user mailing list archives

From: Robert Metzger <rmetz...@apache.org>
Subject: Re: Compress DataSink Output
Date: Fri, 19 Aug 2016 13:15:00 GMT
Hi Wes,

Flink's own OutputFormats don't support compression, but we have some tools
to use Hadoop's OutputFormats with Flink [1], and those support
compression:
https://hadoop.apache.org/docs/stable/api/org/apache/hadoop/mapreduce/lib/output/FileOutputFormat.html

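Roughly, a sketch with the DataSet API could look like the one below. The
class name, input data, and output path are just placeholders, and it
assumes the flink-hadoop-compatibility dependency is on the classpath:

import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.hadoop.mapreduce.HadoopOutputFormat;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.compress.GzipCodec;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class GzipOutputSketch {

    public static void main(String[] args) throws Exception {
        final ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        // Placeholder input; in practice this is your own DataSet<String>.
        DataSet<String> lines = env.fromElements("first record", "second record");

        // Hadoop's TextOutputFormat writes key/value pairs, so wrap each
        // line as (Text, NullWritable); only the key ends up in the file.
        DataSet<Tuple2<Text, NullWritable>> hadoopData = lines.map(
                new MapFunction<String, Tuple2<Text, NullWritable>>() {
                    @Override
                    public Tuple2<Text, NullWritable> map(String value) {
                        return new Tuple2<>(new Text(value), NullWritable.get());
                    }
                });

        // Wrap Hadoop's TextOutputFormat in Flink's HadoopOutputFormat
        // and enable gzip compression on the Hadoop job configuration.
        Job job = Job.getInstance();
        HadoopOutputFormat<Text, NullWritable> hadoopOF =
                new HadoopOutputFormat<>(new TextOutputFormat<Text, NullWritable>(), job);
        FileOutputFormat.setOutputPath(job, new Path("hdfs:///path/to/output"));
        FileOutputFormat.setCompressOutput(job, true);
        FileOutputFormat.setOutputCompressorClass(job, GzipCodec.class);

        hadoopData.output(hadoopOF);
        env.execute("gzip output sketch");
    }
}

Each parallel task should then write its own gzip-compressed part file,
which matches the per-partition compression you described.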
Let me know if you need more information.

Regards,
Robert

[1]:
https://ci.apache.org/projects/flink/flink-docs-master/apis/batch/hadoop_compatibility.html


On Thu, Aug 18, 2016 at 2:13 AM, Wesley Kerr <wesley.n.kerr@gmail.com>
wrote:

> Hello -
>
> Forgive me if this has been asked before, but I'm trying to determine the
> best way to add compression to DataSink Outputs (starting with
> TextOutputFormat).  Realistically I would like each partition file (based
> on parallelism) to be compressed independently with gzip, but am open to
> other solutions.
>
> My first thought was to extend TextOutputFormat with a new class that
> compresses after closing and before returning, but I'm not sure that would
> work across all possible file systems (S3, Local, and HDFS).
>
> Any thoughts?
>
> Thanks!
>
> Wes
>
>
>
