flink-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-1287) Improve File Input Split assignment
Date Fri, 12 Dec 2014 13:43:13 GMT

    [ https://issues.apache.org/jira/browse/FLINK-1287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14244150#comment-14244150 ]

ASF GitHub Bot commented on FLINK-1287:
---------------------------------------

Github user uce commented on a diff in the pull request:

    https://github.com/apache/incubator-flink/pull/258#discussion_r21744330
  
    --- Diff: flink-core/src/main/java/org/apache/flink/api/common/io/LocatableInputSplitAssigner.java ---
    @@ -184,15 +209,159 @@ private static final boolean isLocal(String flinkHost, String[] hosts) {
     				return true;
     			}
     		}
    -		
    +
     		return false;
     	}
    -	
    +
     	public int getNumberOfLocalAssignments() {
     		return localAssignments;
     	}
    -	
    +
     	public int getNumberOfRemoteAssignments() {
     		return remoteAssignments;
     	}
    +
    +    /**
    +     * Wraps a LocatableInputSplit and adds a count for the number of observed hosts
    +     * that can access the split locally.
    +     */
    +	public static class LocatableInputSplitWithCount {
    +
    +		private final LocatableInputSplit split;
    +		private int localCount;
    +
    +		public LocatableInputSplitWithCount(LocatableInputSplit split) {
    +			this.split = split;
    +			this.localCount = 0;
    +		}
    +
    +		public void incrementLocalCount() {
    +			this.localCount++;
    +		}
    +
    +		public int getLocalCount() {
    +			return this.localCount;
    +		}
    +
    +		public LocatableInputSplit getSplit() {
    +			return this.split;
    +		}
    +
    +	}
    +
    +	/**
    +	 * Holds a list of LocatableInputSplits and returns the split with the lowest local count.
    +	 * The rational is that splits which are local on few hosts should be preferred over others which
    +     * have more degrees of freedom for local assignment.
    --- End diff --
    
    Indentation is off: the last Javadoc line above is indented with spaces, while the lines before it use a tab.
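
    For illustration only (this is not code from the pull request): a minimal, self-contained sketch of the "lowest local count first" idea described in the Javadoc above. The names CountedSplit, countLocalHosts and pollLowestLocalCount are hypothetical, and plain host-name strings stand in for Flink's LocatableInputSplit; the actual assigner may differ in how it tracks hosts and breaks ties.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class LowestLocalCountSketch {

    /** Hypothetical stand-in for a split plus the number of workers that can read it locally. */
    static final class CountedSplit {
        final String name;
        final int localCount;

        CountedSplit(String name, int localCount) {
            this.name = name;
            this.localCount = localCount;
        }
    }

    /** Counts how many of the hosts storing a split are also worker hosts. */
    static int countLocalHosts(String[] splitHosts, Set<String> workers) {
        int count = 0;
        for (String host : splitHosts) {
            if (workers.contains(host)) {
                count++;
            }
        }
        return count;
    }

    /**
     * Removes and returns the split with the lowest local count, or null if the list is empty.
     * Ties go to the split that appears first in the list (an assumption of this sketch).
     */
    static CountedSplit pollLowestLocalCount(List<CountedSplit> splits) {
        if (splits.isEmpty()) {
            return null;
        }
        int best = 0;
        for (int i = 1; i < splits.size(); i++) {
            if (splits.get(i).localCount < splits.get(best).localCount) {
                best = i;
            }
        }
        return splits.remove(best);
    }

    public static void main(String[] args) {
        Set<String> workers = new HashSet<>(Arrays.asList("a", "b", "c"));

        List<CountedSplit> pending = new ArrayList<>();
        pending.add(new CountedSplit("s1", countLocalHosts(new String[] {"a", "b"}, workers)));
        pending.add(new CountedSplit("s2", countLocalHosts(new String[] {"c"}, workers)));
        pending.add(new CountedSplit("s3", countLocalHosts(new String[] {"a", "b", "c"}, workers)));

        // Splits with the fewest local options are handed out first.
        CountedSplit next;
        while ((next = pollLowestLocalCount(pending)) != null) {
            System.out.println(next.name + " (local on " + next.localCount + " worker(s))");
        }
    }
}
```

    In this sketch s2 is handed out first because only one worker can read it locally, followed by s1 and then s3, which keeps the splits with more degrees of freedom available for later requests.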


> Improve File Input Split assignment
> -----------------------------------
>
>                 Key: FLINK-1287
>                 URL: https://issues.apache.org/jira/browse/FLINK-1287
>             Project: Flink
>          Issue Type: Improvement
>          Components: Local Runtime
>            Reporter: Robert Metzger
>            Assignee: Fabian Hueske
>
> While running some DFS read-intensive benchmarks, I found that the assignment of input splits is not optimal, in particular when numWorker != numDataNodes and the replication factor is low (in my case it was 1).
> In this particular example, the input had 40960 splits, of which 4694 were read remotely. Spark did only 2056 remote reads for the same dataset.
> With the replication factor increased to 2, Flink did only 290 remote reads. So usually, users shouldn't be affected by this issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
