From: Robert Metzger
To: user@flink.apache.org
Date: Fri, 19 Aug 2016 15:15:00 +0200
Subject: Re: Compress DataSink Output

Hi Wes,

Flink's own OutputFormats don't support compression, but we have some tools to use Hadoop's OutputFormats with Flink [1], and those support compression: https://hadoop.apache.org/docs/stable/api/org/apache/hadoop/mapreduce/lib/output/FileOutputFormat.html

Let me know if you need more information.

Regards,
Robert

[1]: https://ci.apache.org/projects/flink/flink-docs-master/apis/batch/hadoop_compatibility.html
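
The Hadoop route described above might look roughly like the following. This is a minimal sketch, not a tested recipe: it assumes flink-java, Flink's Hadoop compatibility classes, and a Hadoop client are on the classpath, and the class name `CompressedSinkSketch`, the default output path, and the sample records are made up for illustration. Compression is switched on through Hadoop's `FileOutputFormat` settings on the `Job`, and the Hadoop format is then wrapped so that Flink can use it as a sink.

```java
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.hadoop.mapreduce.HadoopOutputFormat;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.compress.GzipCodec;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class CompressedSinkSketch {

    public static void main(String[] args) throws Exception {
        // Hypothetical output location; pass a different URI as the first argument.
        String outputPath = args.length > 0 ? args[0] : "file:///tmp/flink-gzip-out";

        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        // Made-up sample data standing in for the real DataSet.
        DataSet<Tuple2<String, Integer>> counts = env.fromElements(
                new Tuple2<String, Integer>("flink", 1),
                new Tuple2<String, Integer>("hadoop", 2));

        // Convert to (key, value) tuples of Hadoop types, as in the compatibility docs.
        DataSet<Tuple2<Text, IntWritable>> hadoopRecords = counts.map(
                new MapFunction<Tuple2<String, Integer>, Tuple2<Text, IntWritable>>() {
                    @Override
                    public Tuple2<Text, IntWritable> map(Tuple2<String, Integer> t) {
                        return new Tuple2<Text, IntWritable>(new Text(t.f0), new IntWritable(t.f1));
                    }
                });

        // Compression is configured on the Hadoop Job, not on the Flink side.
        Job job = Job.getInstance();
        FileOutputFormat.setOutputPath(job, new Path(outputPath));
        FileOutputFormat.setCompressOutput(job, true);
        FileOutputFormat.setOutputCompressorClass(job, GzipCodec.class);

        // Wrap Hadoop's TextOutputFormat so Flink can write through it.
        HadoopOutputFormat<Text, IntWritable> gzipSink =
                new HadoopOutputFormat<Text, IntWritable>(
                        new TextOutputFormat<Text, IntWritable>(), job);

        hadoopRecords.output(gzipSink);
        env.execute("gzip-compressed text sink");
    }
}
```

With this setup each parallel sink task should write its own part file under the output path, with Hadoop's gzip codec adding a `.gz` extension.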



On Thu, Aug 18, 2016 at 2:13 AM, Wesley Kerr <wesley.n.kerr@gmail.com> wrote:

> Hello -
>
> Forgive me if this has been asked before, but I'm trying to determine the
> best way to add compression to DataSink Outputs (starting with
> TextOutputFormat). Realistically I would like each partition file (based
> on parallelism) to be compressed independently with gzip, but am open to
> other solutions.
>
> My first thought was to extend TextOutputFormat with a new class that
> compresses after closing and before returning, but I'm not sure that would
> work across all possible file systems (S3, Local, and HDFS).
>
> Any thoughts?
>
> Thanks!
>
> Wes
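
On the file-system concern in the quoted question: compressing while the records are written, rather than after the file is closed, only needs an `OutputStream` from whichever file system is in use. The sketch below uses nothing but the JDK to show the idea. `GzipPartitionWriter` and its methods are hypothetical names, not Flink or Hadoop API, and an in-memory stream stands in for the partition's real output stream.

```java
import java.io.BufferedWriter;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipPartitionWriter {

    /** Writes one record per line, gzip-compressing on the fly, then closes the stream. */
    public static void writeLines(OutputStream rawPartitionStream, List<String> records)
            throws IOException {
        try (Writer writer = new BufferedWriter(new OutputStreamWriter(
                new GZIPOutputStream(rawPartitionStream), StandardCharsets.UTF_8))) {
            for (String record : records) {
                writer.write(record);
                writer.write('\n');
            }
        } // closing the writer also writes the gzip trailer and closes the raw stream
    }

    /** Decompresses gzip bytes back into text, to check the round trip. */
    public static String readAll(byte[] gzipBytes) throws IOException {
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(gzipBytes))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[4096];
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        // In-memory stand-in for one partition's output file.
        ByteArrayOutputStream partition = new ByteArrayOutputStream();
        writeLines(partition, Arrays.asList("first record", "second record"));
        System.out.print(readAll(partition.toByteArray()));
    }
}
```

Because each partition gets its own gzip stream, every part file is a complete gzip file on its own, which matches the per-partition compression asked for.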