flink-dev mailing list archives

From "Kruse, Sebastian" <Sebastian.Kr...@hpi.de>
Subject RE: Gzip support
Date Mon, 04 May 2015 08:37:43 GMT
Right, I saw the .deflate file support and the unsplittable flag and built upon that code.
I just tried to generalize it and expose it as a hook, so that users can handle unforeseen
cases themselves, such as new or exotic compression formats or custom preambles.
I can create a ticket and a pull request by the end of this week, so that you can have a look at it.
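
To make the idea of a stream-decorating hook concrete, here is a rough sketch in plain Java. It is not the code from the planned pull request and does not use Flink's API: the InputStreamDecorator interface, its andThen method, and the GZIP and SKIP_UTF8_BOM constants are all hypothetical names chosen for illustration. It only shows how gzip decompression and skipping a UTF-8 byte order mark could be layered on a raw file stream before a format reads from it (needs Java 9 or later).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class StreamDecoratorSketch {

    /** Hypothetical hook (not a Flink interface): wraps the raw file stream. */
    interface InputStreamDecorator {
        InputStream decorate(InputStream raw, String path) throws IOException;

        /** Chains two decorators: this one is applied first, then {@code next}. */
        default InputStreamDecorator andThen(InputStreamDecorator next) {
            return (raw, path) -> next.decorate(decorate(raw, path), path);
        }
    }

    /** Decompresses files ending in ".gz"; passes every other file through unchanged. */
    static final InputStreamDecorator GZIP = (raw, path) ->
            path.endsWith(".gz") ? new GZIPInputStream(raw) : raw;

    /** Drops a leading UTF-8 byte order mark (EF BB BF) if one is present. */
    static final InputStreamDecorator SKIP_UTF8_BOM = (raw, path) -> {
        PushbackInputStream in = new PushbackInputStream(raw, 3);
        byte[] head = new byte[3];
        int n = in.readNBytes(head, 0, 3);
        boolean bom = n == 3
                && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB
                && (head[2] & 0xFF) == 0xBF;
        if (!bom && n > 0) {
            in.unread(head, 0, n); // no BOM: push the bytes back for the reader
        }
        return in;
    };

    /** Test helper: gzip-compresses a string in memory. */
    static byte[] gzip(String text) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(buffer)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return buffer.toByteArray();
    }

    /** Stands in for an input format: reads the whole decorated stream as UTF-8. */
    static String readAll(InputStreamDecorator decorator, byte[] fileBytes, String path)
            throws IOException {
        try (InputStream in = decorator.decorate(new ByteArrayInputStream(fileBytes), path)) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        InputStreamDecorator gzipThenBom = GZIP.andThen(SKIP_UTF8_BOM);
        System.out.println(readAll(gzipThenBom, gzip("\uFEFFhello flink"), "data.gz"));
    }
}
```

The point of chaining is that the format itself stays unaware of compression or preambles; it just reads from whatever stream the hook hands back.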

Cheers,
Sebastian
________________________________________
From: Robert Metzger [metrobert@gmail.com]
Sent: Thursday, April 30, 2015 21:01
To: dev@flink.apache.org
Subject: Re: Gzip support

There is already support for deflate-compressed files and I introduced logic to handle unsplittable
formats.
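
For readers following the thread, the handling of unsplittable formats boils down to planning one input split per file instead of one per block. The sketch below illustrates that rule in plain Java; it is not Flink's actual split logic, and planSplits, isUnsplittable, and the list of file extensions are hypothetical names and assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitPlanningSketch {

    /** Assumption for this sketch: the file extension decides splittability. */
    static boolean isUnsplittable(String path) {
        return path.endsWith(".gz") || path.endsWith(".deflate");
    }

    /**
     * Returns the byte ranges to read as {start, length} pairs.
     * An unsplittable file always yields exactly one range covering the whole file;
     * a splittable file yields one range per block.
     */
    static List<long[]> planSplits(long fileLength, long blockSize, boolean unsplittable) {
        if (blockSize <= 0) {
            throw new IllegalArgumentException("blockSize must be positive");
        }
        List<long[]> splits = new ArrayList<>();
        if (unsplittable) {
            // A compressed stream can only be decoded from its beginning.
            splits.add(new long[] {0L, fileLength});
            return splits;
        }
        for (long start = 0; start < fileLength; start += blockSize) {
            long length = Math.min(blockSize, fileLength - start);
            splits.add(new long[] {start, length});
        }
        return splits;
    }

    public static void main(String[] args) {
        long fileLength = 300;
        long blockSize = 128;
        for (String path : new String[] {"logs.gz", "logs.txt"}) {
            List<long[]> splits = planSplits(fileLength, blockSize, isUnsplittable(path));
            System.out.println(path + ": " + splits.size() + " split(s)");
        }
    }
}
```

The trade-off is that a single large compressed file is read by one task only, so parallelism then comes from having many files rather than many blocks.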


Sent from my iPhone

> On 30.04.2015, at 19:39, Stephan Ewen <sewen@apache.org> wrote:
>
> I think that would be very worthwhile :-) Happy to hear that you want to
> contribute that!
>
> Decorating the input stream sounds like a great approach and would also
> work for other compression formats.
>
> The other thing that needs to be taken into account is that GZIP files are
> not splittable in the same way as uncompressed files. You may have to
> invent something clever there, or simply restrict the format to have one
> input split per file (rather than block).
>
> On Thu, Apr 30, 2015 at 5:41 PM, Kruse, Sebastian <Sebastian.Kruse@hpi.de>
> wrote:
>
>> Hi everyone,
>>
>> I just recently came across a use-case where I needed to read gzip files
>> and handle byte order marks transparently. I know that gzip can be read
>> with Hadoop input formats but that did not work for me since I wanted to
>> reuse my existing custom Flink input formats.
>>
>> It turned out that both requirements (and more) can be dealt with by
>> allowing the input formats to decorate the input stream. Do you think it is
>> worthwhile to include these changes in Flink? I could take care of it.
>>
>> Cheers,
>> Sebastian
>>
