flink-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-2716) Checksum method for DataSet and Graph
Date Thu, 17 Dec 2015 15:49:46 GMT

    [ https://issues.apache.org/jira/browse/FLINK-2716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15062215#comment-15062215 ]

ASF GitHub Bot commented on FLINK-2716:
---------------------------------------

Github user StephanEwen commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1462#discussion_r47921105
  
    --- Diff: flink-java/src/main/java/org/apache/flink/api/java/DataSet.java ---
    @@ -394,6 +396,21 @@ public long count() throws Exception {
     		return res.<Long> getAccumulatorResult(id);
     	}
     
    +	/**
    +	 * Convenience method to get the count (number of elements) of a DataSet
    +	 * as well as the checksum (sum over element hashes).
    +	 *
    +	 * @return A Checksum that represents the count and checksum of elements in the data set.
    +	 */
    +	public Checksum checksum() throws Exception {
    +		final String id = new AbstractID().toString();
    +
    +		flatMap(new Utils.ChecksumHelper<T>(id)).name("checksum()")
    +				.output(new DiscardingOutputFormat<NullValue>()).name("checksum() sink");
    --- End diff --
    
    Saves one operator and source of confusion in the UI. Actually, the `collect()` and `count()` should be similarly simplified, come to think of it ;-)
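A minimal plain-Java sketch of the reviewer's suggestion follows. It assumes the work done by `ChecksumHelper` in a `flatMap` moves into the sink itself. Each parallel sink instance counts records and sums their `hashCode()` values, then publishes its partial result under the job-unique id when it closes, so no separate operator or discarding sink is needed. `RecordSink`, `ChecksumSink`, and the shared `ACCUMULATORS` map are hypothetical stand-ins for Flink's output-format and accumulator machinery. They are not Flink API, and the code in the pull request may differ.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class SingleOperatorChecksum {

    // Hypothetical stand-in for an output format: one instance per parallel task.
    interface RecordSink<T> {
        void writeRecord(T value);
        void close();
    }

    // Hypothetical stand-in for the job's accumulator registry: partial results
    // from all tasks are merged under the same id as {count, checksum}.
    static final Map<String, long[]> ACCUMULATORS = new ConcurrentHashMap<>();

    // Counts records and sums their hashes directly in the sink, replacing the
    // flatMap-plus-discarding-sink pair with a single operator.
    static final class ChecksumSink<T> implements RecordSink<T> {
        private final String id;
        private long count;
        private long checksum;

        ChecksumSink(String id) {
            this.id = id;
        }

        @Override
        public void writeRecord(T value) {
            count++;
            checksum += value.hashCode();
        }

        @Override
        public void close() {
            // Merge this task's partial result into the shared accumulator by addition.
            ACCUMULATORS.merge(id, new long[] {count, checksum},
                    (a, b) -> new long[] {a[0] + b[0], a[1] + b[1]});
        }
    }

    // Simulates one sink instance per partition and returns the merged {count, checksum}.
    public static long[] run(String id, List<List<String>> partitions) {
        for (List<String> partition : partitions) {
            ChecksumSink<String> sink = new ChecksumSink<>(id);
            for (String s : partition) {
                sink.writeRecord(s);
            }
            sink.close();
        }
        return ACCUMULATORS.get(id);
    }

    public static void main(String[] args) {
        long[] result = run("job-1", List.of(List.of("a", "b"), List.of("c")));
        System.out.println("count=" + result[0] + " checksum=" + result[1]); // prints "count=3 checksum=294"
    }
}
```

Because partial results are merged by addition, the final value does not depend on how records are split across tasks. A counting accumulator needs the same property.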


> Checksum method for DataSet and Graph
> -------------------------------------
>
>                 Key: FLINK-2716
>                 URL: https://issues.apache.org/jira/browse/FLINK-2716
>             Project: Flink
>          Issue Type: Improvement
>          Components: Gelly, Java API, Scala API
>    Affects Versions: 0.10.0
>            Reporter: Greg Hogan
>            Assignee: Greg Hogan
>            Priority: Minor
>
> {{DataSet.count()}}, {{Graph.numberOfVertices()}}, and {{Graph.numberOfEdges()}} provide
> measures of the number of distributed data elements. New {{DataSet.checksum()}} and {{Graph.checksum()}}
> methods will summarize the content of data elements and support algorithm validation, integration
> testing, and benchmarking.
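The sketch below illustrates how a count-plus-hash-sum summary supports validation. It is plain Java and assumes the definition from the Javadoc in the diff: the element count plus the sum of element hashes. The class name `ChecksumSketch` and its `of` and `merge` methods are hypothetical and are not the `Checksum` class from the pull request. The point it shows is that the same elements yield the same summary regardless of order or partitioning.

```java
import java.util.Arrays;
import java.util.Objects;

public final class ChecksumSketch {
    private final long count;
    private final long checksum;

    private ChecksumSketch(long count, long checksum) {
        this.count = count;
        this.checksum = checksum;
    }

    // Summarizes elements as (number of elements, sum of element hashes); null hashes to 0.
    public static ChecksumSketch of(Iterable<?> elements) {
        long count = 0;
        long checksum = 0;
        for (Object e : elements) {
            count++;
            checksum += Objects.hashCode(e);
        }
        return new ChecksumSketch(count, checksum);
    }

    // Combines summaries of two disjoint partitions; addition is commutative and
    // associative, so the result is independent of partitioning and element order.
    public ChecksumSketch merge(ChecksumSketch other) {
        return new ChecksumSketch(count + other.count, checksum + other.checksum);
    }

    public long getCount() { return count; }

    public long getChecksum() { return checksum; }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof ChecksumSketch)) return false;
        ChecksumSketch c = (ChecksumSketch) o;
        return count == c.count && checksum == c.checksum;
    }

    @Override
    public int hashCode() { return Objects.hash(count, checksum); }

    @Override
    public String toString() {
        return "ChecksumSketch [count=" + count + ", checksum=" + Long.toHexString(checksum) + "]";
    }

    public static void main(String[] args) {
        ChecksumSketch expected = ChecksumSketch.of(Arrays.asList(1, 2, 3, 4));
        // Same elements, different order and partitioning.
        ChecksumSketch actual = ChecksumSketch.of(Arrays.asList(4, 2))
                .merge(ChecksumSketch.of(Arrays.asList(3, 1)));
        System.out.println(expected.equals(actual)); // prints "true"
    }
}
```

A matching checksum is evidence, not proof, that two results contain the same elements. Distinct multisets can produce the same sum of hashes, so such a check is suited to catching regressions rather than certifying correctness.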



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
