flink-dev mailing list archives

From Robert Metzger <rmetz...@apache.org>
Subject Re: Gzip support
Date Mon, 04 May 2015 11:47:08 GMT
Great. Please file a JIRA and open a pull request for the feature!

On Mon, May 4, 2015 at 10:37 AM, Kruse, Sebastian <Sebastian.Kruse@hpi.de>
wrote:

> Right, I saw the .deflate file support and the unsplittable flag, and built
> upon that code. I just tried to generalize it and expose it as a hook, so
> that users can handle unforeseen cases themselves, such as exotic new
> compression formats or custom preambles.
> I can create a ticket and a pull request by the end of this week, so that
> you can have a look at it.
>
> Cheers,
> Sebastian
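The hook Sebastian describes can be pictured as a registry that maps file extensions to stream decorators, so that users can plug in formats Flink does not know about. The following is a minimal Java sketch under that assumption. `DecoratorRegistry`, `StreamDecorator`, `register` and `open` are hypothetical names chosen for illustration; they are not Flink's API. The gzip decorator uses the JDK's `java.util.zip.GZIPInputStream`.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class DecoratorRegistry {

    /** Hypothetical hook: wraps a raw file stream in another stream. */
    @FunctionalInterface
    public interface StreamDecorator {
        InputStream decorate(InputStream in) throws IOException;
    }

    // Decorators keyed by lower-case file extension without the dot, e.g. "gz".
    private final Map<String, StreamDecorator> byExtension = new HashMap<>();

    public void register(String extension, StreamDecorator decorator) {
        byExtension.put(extension.toLowerCase(Locale.ROOT), decorator);
    }

    /** Wraps raw with the decorator registered for the file's extension, if any. */
    public InputStream open(String fileName, InputStream raw) throws IOException {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0) {
            return raw; // no extension: read the bytes unchanged
        }
        String ext = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        StreamDecorator decorator = byExtension.get(ext);
        return decorator == null ? raw : decorator.decorate(raw);
    }

    public static void main(String[] args) throws IOException {
        DecoratorRegistry registry = new DecoratorRegistry();
        // A user-supplied decorator; here gzip via the JDK constructor.
        registry.register("gz", GZIPInputStream::new);

        ByteArrayOutputStream compressed = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(compressed)) {
            gz.write("hello".getBytes(StandardCharsets.UTF_8));
        }
        InputStream in = registry.open("part-0.gz",
                new ByteArrayInputStream(compressed.toByteArray()));
        System.out.println(new String(in.readAllBytes(), StandardCharsets.UTF_8)); // hello
    }
}
```

The parsing code of an existing input format would only ever see the stream returned by `open`. That is what lets the same format read plain and compressed files.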
> ________________________________________
> From: Robert Metzger [metrobert@gmail.com]
> Sent: Thursday, April 30, 2015 21:01
> To: dev@flink.apache.org
> Subject: Re: Gzip support
>
> There is already support for deflate-compressed files, and I introduced
> logic to handle unsplittable formats.
>
> > On 30.04.2015, at 19:39, Stephan Ewen <sewen@apache.org> wrote:
> >
> > I think that would be very worthwhile :-) Happy to hear that you want to
> > contribute that!
> >
> > Decorating the input stream sounds like a great approach and would also
> > work for other compression formats.
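To illustrate Stephan's point that decoration is format-agnostic, the following sketch wraps an arbitrary input stream and switches to decompression only when the data starts with the gzip magic bytes `0x1f 0x8b` defined in RFC 1952. This is a minimal example under assumptions, not Flink code. Detecting the format by content rather than by file name is a technique chosen for this illustration, and `GzipSniffer` is a hypothetical name.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipSniffer {

    // Every gzip member starts with ID1 = 0x1f, ID2 = 0x8b (RFC 1952).
    private static final int ID1 = 0x1f;
    private static final int ID2 = 0x8b;

    /** Returns a decompressing stream for gzip input, otherwise the bytes unchanged. */
    public static InputStream decorate(InputStream raw) throws IOException {
        BufferedInputStream in = new BufferedInputStream(raw);
        in.mark(2);        // remember the position before peeking at two bytes
        int b1 = in.read();
        int b2 = in.read();
        in.reset();        // rewind, so the peeked bytes are not lost
        if (b1 == ID1 && b2 == ID2) {
            return new GZIPInputStream(in);
        }
        return in;
    }

    public static void main(String[] args) throws IOException {
        byte[] plain = "line 1\nline 2\n".getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream compressed = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(compressed)) {
            gz.write(plain);
        }
        // The same downstream reading code handles both inputs.
        for (byte[] input : new byte[][] {compressed.toByteArray(), plain}) {
            InputStream in = decorate(new ByteArrayInputStream(input));
            System.out.print(new String(in.readAllBytes(), StandardCharsets.UTF_8));
        }
    }
}
```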
> >
> > The other thing that needs to be taken into account is that GZIP files are
> > not splittable in the same way as uncompressed files. You may have to
> > invent something clever there, or simply restrict the format to have one
> > input split per file (rather than per block).
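Stephan's simpler option, one input split per file, can be sketched as follows. A gzip stream can only be decoded from its first byte, so a compressed file is handed to a single reader, while other files are cut into block-sized byte ranges. This is a minimal Java illustration and not Flink's split logic. `SplitPlanner` and `Split` are hypothetical names, and recognizing compressed files by a `.gz` or `.deflate` extension is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class SplitPlanner {

    /** Hypothetical split descriptor: the byte range [start, start + length) of one file. */
    public static final class Split {
        public final String path;
        public final long start;
        public final long length;

        Split(String path, long start, long length) {
            this.path = path;
            this.start = start;
            this.length = length;
        }
    }

    /** Assumption: compression is recognized by the file extension. */
    static boolean isUnsplittable(String path) {
        String p = path.toLowerCase(Locale.ROOT);
        return p.endsWith(".gz") || p.endsWith(".deflate");
    }

    public static List<Split> plan(String path, long fileLength, long blockSize) {
        if (blockSize <= 0) {
            throw new IllegalArgumentException("blockSize must be positive");
        }
        List<Split> splits = new ArrayList<>();
        if (isUnsplittable(path) || fileLength <= blockSize) {
            // Compressed files, and files no larger than one block, become one split.
            splits.add(new Split(path, 0, fileLength));
            return splits;
        }
        // Uncompressed files: one split per block; the last one may be shorter.
        for (long start = 0; start < fileLength; start += blockSize) {
            splits.add(new Split(path, start, Math.min(blockSize, fileLength - start)));
        }
        return splits;
    }
}
```

For example, `plan("a.gz", 1000, 100)` yields a single split covering all 1000 bytes. `plan("a.txt", 250, 100)` yields three splits, the last covering bytes 200 to 249. The trade-off is parallelism: one large gzip file is read by one task.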
> >
> > On Thu, Apr 30, 2015 at 5:41 PM, Kruse, Sebastian <Sebastian.Kruse@hpi.de>
> > wrote:
> >
> >> Hi everyone,
> >>
> >> I recently came across a use case where I needed to read gzip files
> >> and handle byte order marks transparently. I know that gzip files can be
> >> read with Hadoop input formats, but that did not work for me, since I
> >> wanted to reuse my existing custom Flink input formats.
> >>
> >> It turned out that both requirements (and more) can be handled by
> >> allowing the input formats to decorate the input stream. Do you think it
> >> is worthwhile to include these changes in Flink? I could take care of it.
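Handling byte order marks "transparently" through decoration could look like the following sketch. It wraps the stream and drops a leading UTF-8 BOM (`EF BB BF`) if one is present, and pushes the peeked bytes back otherwise. This is a minimal illustration under assumptions: `BomStripper` is a hypothetical name, and only the UTF-8 BOM is handled, not the UTF-16 or UTF-32 variants.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;

public class BomStripper {

    // UTF-8 encoding of U+FEFF, the byte order mark.
    private static final int[] UTF8_BOM = {0xEF, 0xBB, 0xBF};

    /** Returns a stream positioned after a leading UTF-8 BOM, if there is one. */
    public static InputStream decorate(InputStream raw) throws IOException {
        PushbackInputStream in = new PushbackInputStream(raw, UTF8_BOM.length);
        byte[] head = new byte[UTF8_BOM.length];
        int n = 0;
        // Read up to three bytes; a single read call may return fewer.
        while (n < head.length) {
            int r = in.read(head, n, head.length - n);
            if (r < 0) {
                break; // end of stream before three bytes
            }
            n += r;
        }
        boolean hasBom = n == UTF8_BOM.length
                && (head[0] & 0xFF) == UTF8_BOM[0]
                && (head[1] & 0xFF) == UTF8_BOM[1]
                && (head[2] & 0xFF) == UTF8_BOM[2];
        if (!hasBom && n > 0) {
            in.unread(head, 0, n); // not a BOM: give the bytes back to the reader
        }
        return in;
    }
}
```

Decorators like this compose. For a compressed file with a BOM, `BomStripper.decorate(new GZIPInputStream(raw))` decompresses first and then removes the mark from the decompressed text.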
> >>
> >> Cheers,
> >> Sebastian
> >>
>
