flink-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-1981) Add GZip support
Date Tue, 02 Jun 2015 19:39:51 GMT

    [ https://issues.apache.org/jira/browse/FLINK-1981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14569625#comment-14569625 ]

ASF GitHub Bot commented on FLINK-1981:

Github user sekruse commented on a diff in the pull request:

    --- Diff: flink-core/src/main/java/org/apache/flink/api/common/io/FileInputFormat.java
    @@ -628,9 +692,10 @@ public void open(FileInputSplit fileSplit) throws IOException {
     	 * @see org.apache.flink.api.common.io.InputStreamFSInputWrapper
     	protected FSDataInputStream decorateInputStream(FSDataInputStream inputStream, FileInputSplit fileSplit) throws Throwable {
    -		// Wrap stream in a extracting (decompressing) stream if file ends with .deflate.
    -		if (fileSplit.getPath().getName().endsWith(DEFLATE_SUFFIX)) {
    -			return new InflaterInputStreamFSInputWrapper(stream);
    +		// Wrap stream in a extracting (decompressing) stream if file ends with a known compression file extension.
    +		InflaterInputStreamFactory<?> inflaterInputStreamFactory = getInflaterInputStreamFactory(fileSplit.getPath());
    +		if (inflaterInputStreamFactory != null) {
    +			return new InputStreamFSInputWrapper(inflaterInputStreamFactory.create(stream));
    --- End diff --
    The stream might also not be compressed at all. It would of course be nice to react
    appropriately to a missing codec, but how would we know whether the current input split
    belongs to an uncompressed file or to a compressed file with an unknown codec?
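
    The ambiguity can be seen in a minimal, self-contained sketch of extension-based
    codec lookup. This is not the code from the pull request: the class
    ExtensionCodecSketch, the StreamFactory interface, and the methods factoryFor and
    decorate are hypothetical stand-ins, and only java.util.zip classes from the JDK are
    used. An uncompressed file and a file with an unregistered codec both produce a null
    lookup result, so the extension alone cannot tell them apart.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;
import java.util.zip.InflaterInputStream;

public class ExtensionCodecSketch {

    /** Hypothetical stand-in for a factory that wraps a raw stream in a decompressing one. */
    public interface StreamFactory {
        InputStream create(InputStream in) throws IOException;
    }

    // Hypothetical registry: lower-case file extension -> decompressing stream factory.
    private static final Map<String, StreamFactory> FACTORIES = new HashMap<>();
    static {
        FACTORIES.put("deflate", InflaterInputStream::new);
        FACTORIES.put("gz", GZIPInputStream::new);
    }

    /** Returns the factory registered for the file's extension, or null if there is none. */
    public static StreamFactory factoryFor(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return null; // no extension at all
        }
        String extension = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        return FACTORIES.get(extension);
    }

    /** Wraps the stream if a factory is known; otherwise passes it through unchanged. */
    public static InputStream decorate(String fileName, InputStream raw) throws IOException {
        StreamFactory factory = factoryFor(fileName);
        return factory == null ? raw : factory.create(raw);
    }

    // Reads a stream to the end; avoids InputStream.readAllBytes so it also runs on Java 8.
    private static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int n;
        while ((n = in.read(chunk)) != -1) {
            out.write(chunk, 0, n);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Build a small GZip payload in memory.
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(buffer)) {
            gzip.write("hello".getBytes(StandardCharsets.UTF_8));
        }
        byte[] compressed = buffer.toByteArray();

        // Known extension: the stream is decompressed.
        InputStream in = decorate("part-0.gz", new ByteArrayInputStream(compressed));
        System.out.println(new String(readAll(in), StandardCharsets.UTF_8));

        // Both lookups return null: plain file and unknown codec look the same.
        System.out.println(factoryFor("part-0.csv") == null);
        System.out.println(factoryFor("part-0.xz") == null);
    }
}
```

    Telling the two null cases apart would need extra information beyond the lookup
    shown here, for example a separate list of extensions known to denote compression.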

> Add GZip support
> ----------------
>                 Key: FLINK-1981
>                 URL: https://issues.apache.org/jira/browse/FLINK-1981
>             Project: Flink
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Sebastian Kruse
>            Assignee: Sebastian Kruse
>            Priority: Minor
> GZip, as a commonly used compression format, should be supported in the same way as the already supported deflate files. This allows GZip files to be used with any subclass of FileInputFormat.

