flink-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-6776) Use skip instead of seek for small forward repositioning in DFS streams
Date Tue, 13 Jun 2017 09:54:00 GMT

    [ https://issues.apache.org/jira/browse/FLINK-6776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16047668#comment-16047668 ]

ASF GitHub Bot commented on FLINK-6776:

Github user StefanRRichter commented on a diff in the pull request:

    --- Diff: flink-runtime/src/main/java/org/apache/flink/runtime/fs/hdfs/HadoopDataInputStream.java
    @@ -89,4 +99,14 @@ public long skip(long n) throws IOException {
     	public org.apache.hadoop.fs.FSDataInputStream getHadoopInputStream() {
     		return fsDataInputStream;
    +	public void forceSeek(long seekPos) throws IOException {
    --- End diff --
    I agree that docs wouldn't hurt. This class as a whole was rather undocumented, but it
is also internal, and users will only interact through `FSDataInputStream`, which does not expose
those methods. I can write something anyway :)

> Use skip instead of seek for small forward repositioning in DFS streams
> -----------------------------------------------------------------------
>                 Key: FLINK-6776
>                 URL: https://issues.apache.org/jira/browse/FLINK-6776
>             Project: Flink
>          Issue Type: Improvement
>          Components: State Backends, Checkpointing
>            Reporter: Stefan Richter
>            Assignee: Stefan Richter
>            Priority: Minor
> Reading checkpoint metadata and finding key-groups in restores sometimes requires seeking
in input streams. Currently, we always use a seek, even for small position changes. As
small true seeks are far more expensive than small reads/skips, we should just skip over small
forward gaps instead of performing the seek.
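
The idea described above can be sketched roughly as follows. This is only an illustration, not the code from the pull request: the `SeekableStream` interface, the `InMemoryStream` stand-in, the `reposition` helper, and the threshold parameter are all hypothetical names introduced here, and a sensible threshold for a real DFS would have to be measured.

```java
import java.io.IOException;

/** Hypothetical sketch: skip over small forward gaps, seek otherwise. */
public class SkipOrSeekSketch {

    /** Minimal stand-in for a seekable DFS stream; not Flink's or Hadoop's API. */
    interface SeekableStream {
        long getPos() throws IOException;
        void seek(long pos) throws IOException; // "true" seek, assumed expensive
        long skip(long n) throws IOException;   // may skip fewer than n bytes
    }

    /** In-memory implementation that counts calls, used only for illustration. */
    static final class InMemoryStream implements SeekableStream {
        private final byte[] data;
        private int pos;
        int seekCount;
        int skipCount;

        InMemoryStream(byte[] data) {
            this.data = data;
        }

        @Override
        public long getPos() {
            return pos;
        }

        @Override
        public void seek(long newPos) throws IOException {
            if (newPos < 0 || newPos > data.length) {
                throw new IOException("seek out of range: " + newPos);
            }
            pos = (int) newPos;
            seekCount++;
        }

        @Override
        public long skip(long n) {
            long skipped = Math.max(0, Math.min(n, data.length - pos));
            pos += (int) skipped;
            skipCount++;
            return skipped;
        }
    }

    /**
     * Moves the stream to target. Forward gaps of at most maxSkipBytes are
     * skipped; backward moves and larger gaps use a real seek.
     */
    static void reposition(SeekableStream in, long target, long maxSkipBytes)
            throws IOException {
        long delta = target - in.getPos();
        if (delta == 0) {
            return; // already at the requested position
        }
        if (delta > 0 && delta <= maxSkipBytes) {
            // skip() is allowed to skip fewer bytes than requested, so loop.
            while (delta > 0) {
                long skipped = in.skip(delta);
                if (skipped <= 0) {
                    in.seek(target); // no progress: fall back to a real seek
                    return;
                }
                delta -= skipped;
            }
        } else {
            in.seek(target);
        }
    }

    public static void main(String[] args) throws IOException {
        InMemoryStream in = new InMemoryStream(new byte[4096]);
        reposition(in, 100, 1024);  // small forward gap: skipped
        reposition(in, 3000, 1024); // large forward gap: seek
        reposition(in, 50, 1024);   // backward move: seek
        System.out.println("skips=" + in.skipCount + ", seeks=" + in.seekCount);
    }
}
```

The loop around `skip` is there because `InputStream`-style skip methods may return before the full gap has been consumed. Falling back to a seek when no progress is made keeps the helper from spinning at end of stream.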

This message was sent by Atlassian JIRA
