flink-issues mailing list archives

From StefanRRichter <...@git.apache.org>
Subject [GitHub] flink pull request #4019: [FLINK-6776] [runtime] Use skip instead of seek fo...
Date Tue, 13 Jun 2017 09:42:15 GMT
Github user StefanRRichter commented on a diff in the pull request:

    --- Diff: flink-runtime/src/main/java/org/apache/flink/runtime/fs/hdfs/HadoopDataInputStream.java
    @@ -31,11 +31,15 @@
     public final class HadoopDataInputStream extends FSDataInputStream {
    +	/** Minimum amount of bytes to skip forward before we issue a seek instead of discarding read */
    +	private static final int MIN_SKIP_BYTES = 1024 * 1024;
    --- End diff --
    Right now, this is purely a "magic" number. The optimum depends on the DFS and the
underlying file system. For now, the value is chosen to be "big enough" to improve small forward
seeks and "small enough" to avoid being slower than a real seek. The lower bound should
be the page size; a true per-system optimum would be the number of bytes that can be read
within one seek time. Unfortunately, seek time is not constant, and devices as well as the DFS
may also use read buffers and read-ahead. In the long run this value could become configurable,
but for now I have simply chosen a conservative, relatively small value that should safely
improve small skips in metadata, which is where seeks hurt the most.
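
    To make the trade-off concrete, here is a minimal, self-contained sketch of the
skip-versus-seek decision described above. It is not the code from this pull request:
the class SkipOrSeekSketch and the helpers shouldSkipInsteadOfSeek, skipFully and
breakEvenBytes are hypothetical names chosen for illustration. Only MIN_SKIP_BYTES and
its value come from the diff.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

/** Illustrative sketch only; all names except MIN_SKIP_BYTES are hypothetical. */
final class SkipOrSeekSketch {

    /** Threshold taken from the diff: forward gaps up to 1 MiB are skipped, not sought. */
    static final int MIN_SKIP_BYTES = 1024 * 1024;

    /**
     * Returns true for a small forward gap, where reading and discarding is assumed
     * to be cheaper than a real seek. Backward moves, zero-length moves and large
     * forward gaps return false.
     */
    static boolean shouldSkipInsteadOfSeek(long currentPos, long desiredPos) {
        long delta = desiredPos - currentPos;
        return delta > 0L && delta <= MIN_SKIP_BYTES;
    }

    /**
     * Discards exactly n bytes. InputStream.skip may skip fewer bytes than
     * requested, so we loop and fall back to a single-byte read to detect EOF.
     */
    static void skipFully(InputStream in, long n) throws IOException {
        while (n > 0L) {
            long skipped = in.skip(n);
            if (skipped <= 0L) {
                if (in.read() < 0) {
                    throw new EOFException("Stream ended with " + n + " bytes left to skip");
                }
                skipped = 1L;
            }
            n -= skipped;
        }
    }

    /**
     * Rough break-even estimate: the number of bytes that can be read in the time
     * one seek takes. Both inputs are assumptions the caller would have to measure.
     */
    static long breakEvenBytes(long bytesPerSecond, long seekTimeMillis) {
        return bytesPerSecond * seekTimeMillis / 1000L;
    }
}
```

    With purely illustrative inputs of 100,000,000 bytes per second of throughput and a
10 ms seek time (assumed numbers, not measurements), breakEvenBytes returns 1,000,000,
which is the same order of magnitude as the 1 MiB constant. Since neither input is constant
in practice, this is an estimate and not a way to derive the value.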
