flink-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-1963) Improve distinct() transformation
Date Tue, 14 Jul 2015 22:21:07 GMT

    [ https://issues.apache.org/jira/browse/FLINK-1963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14627150#comment-14627150
] 

ASF GitHub Bot commented on FLINK-1963:
---------------------------------------

Github user fhueske commented on a diff in the pull request:

    https://github.com/apache/flink/pull/905#discussion_r34627102
  
    --- Diff: flink-tests/src/test/java/org/apache/flink/test/javaApiOperators/DistinctITCase.java
---
    @@ -274,4 +277,81 @@ public Integer map(POJO value) throws Exception {
     			return (int) value.nestedPojo.longNumber;
     		}
     	}
    +
    +	@Test
    +	public void testCorrectnessOfDistinctOnAtomic() throws Exception {
    +    	/*
    +     	* check correctness of distinct on Integers
    +     	*/
    +
    +		final ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
    +		DataSet<Integer> ds = CollectionDataSets.getIntegerDataSet(env);
    +		DataSet<Integer> reduceDs = ds.distinct();
    +
    +		List<Integer> result = reduceDs.collect();
    +
    +		String expected = "1\n2\n3\n4\n5";
    +
    +		compareResultAsText(result, expected);
    +	}
    +
    +	@Test
    +	public void testCorrectnessOfDistinctOnAtomicWithSelectAllChar() throws Exception {
    +    	/*
    +     	* check correctness of distinct on Strings, using Keys.ExpressionKeys.SELECT_ALL_CHAR
    +     	*/
    +
    +		final ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
    +		DataSet<String> ds = CollectionDataSets.getStringDataSet(env);
    +		DataSet<String> reduceDs = ds.union(ds).distinct("*");
    +
    +		List<String> result = reduceDs.collect();
    +
    +		String expected = "I am fine.\n" +
    +				"Luke Skywalker\n" +
    +				"LOL\n" +
    +				"Hello world, how are you?\n" +
    +				"Hi\n" +
    +				"Hello world\n" +
    +				"Hello\n" +
    +				"Random comment\n";
    +
    +		compareResultAsText(result, expected);
    +	}
    +
    +	@Test(expected = InvalidProgramException.class)
    +	public void testDistinctOnNotKeyDataType() throws Exception {
    --- End diff --
    
    We usually add such failing tests to the unit tests (DistinctOperatorTest.java) instead
of integration tests.


> Improve distinct() transformation
> ---------------------------------
>
>                 Key: FLINK-1963
>                 URL: https://issues.apache.org/jira/browse/FLINK-1963
>             Project: Flink
>          Issue Type: Improvement
>          Components: Java API, Scala API
>    Affects Versions: 0.9
>            Reporter: Fabian Hueske
>            Assignee: pietro pinoli
>            Priority: Minor
>              Labels: starter
>             Fix For: 0.9
>
>
> The `distinct()` transformation is a bit limited right now with respect to processing
atomic key types:
> - `distinct(String ...)` works only for composite data types (POJO, tuple), but wildcard
expression should also be supported for atomic key types
> - `distinct()` only works for composite types, but should also work for atomic key types
> - `distinct(KeySelector)` is the most generic one, but not very handy to use
> - `distinct(int ...)` works only for Tuple data types (which is fine)
> Fixing this should be rather easy.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message