flink-user mailing list archives

From Kristoffer Sjögren <sto...@gmail.com>
Subject Re: Sort tuple dataset
Date Sun, 15 Mar 2015 16:38:13 GMT
After building Flink 0.9-SNAPSHOT from source, DataSet.sortPartition is
indeed working as expected.

This is fine, but it raises the question of how to go about sorting in 0.8.1?





On Sun, Mar 15, 2015 at 5:05 PM, Kristoffer Sjögren <stoffe@gmail.com>
wrote:

> That's the thing, there is no DataSet.sortPartition method in 0.8.1.
> Looking through the git history shows that sortPartition was added on the
> 20th of February, so I think that's 0.9-SNAPSHOT?
>
>
> On Sun, Mar 15, 2015 at 4:51 PM, Stephan Ewen <sewen@apache.org> wrote:
>
>> Hi!
>>
>> I think sort partition is the right thing, if you have only one partition
>> (which makes sense, if you want a total order). It is not a parallel
>> operation any more, so use it only after the data size has been reduced
>> (filters / aggregations).
>>
>> What about "data.sortPartition(0, Order.DESCENDING).setParallelism(1)"?
>>
>> Does that work for you?
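The difference between sorting every partition and sorting a single partition can be modelled in plain Java. This is only a toy sketch, not Flink code: each inner list stands in for one parallel partition, and the class and method names (PartitionSortSketch, sortEachPartition) are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Toy model only: each inner list stands in for one parallel partition.
final class PartitionSortSketch {

    // Sort every partition on its own (descending) and concatenate the results.
    static List<Integer> sortEachPartition(List<List<Integer>> partitions) {
        List<Integer> out = new ArrayList<>();
        for (List<Integer> partition : partitions) {
            List<Integer> sorted = new ArrayList<>(partition);
            sorted.sort(Collections.reverseOrder());
            out.addAll(sorted);
        }
        return out;
    }

    public static void main(String[] args) {
        // Two partitions: each is sorted, but the concatenation is not.
        List<List<Integer>> two =
            Arrays.asList(Arrays.asList(2, 1, 5), Arrays.asList(3, 4, 5));
        // One partition: the partition-local sort is also a total order.
        List<List<Integer>> one =
            Collections.singletonList(Arrays.asList(2, 1, 5, 3, 4, 5));

        System.out.println(sortEachPartition(two)); // [5, 2, 1, 5, 4, 3]
        System.out.println(sortEachPartition(one)); // [5, 5, 4, 3, 2, 1]
    }
}
```

With two partitions each part is ordered but the whole is not; with one partition the result is totally ordered, at the price of doing all the work in a single place.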
>>
>> Greetings,
>> Stephan
>>
>>
>> On Sun, Mar 15, 2015 at 4:47 PM, Kristoffer Sjögren <stoffe@gmail.com>
>> wrote:
>>
>>> Thanks for your answer. I guess I'm a bit infected by writing too much
>>> Crunch code, and I also suspected that getDataSet() was the wrong thing to
>>> do :-)
>>>
>>> However, I was expecting DataSet.sortPartition to do the sorting, but
>>> this method is missing in 0.8.1?
>>>
>>> Do you have a minimal example? I was looking through the tests but most
>>> of them use sortPartition.
>>>
>>> Cheers,
>>> -Kristoffer
>>>
>>>
>>>
>>> On Sun, Mar 15, 2015 at 4:22 PM, Stephan Ewen <sewen@apache.org> wrote:
>>>
>>>> Hi Kristoffer!
>>>>
>>>> There are a few issues with that code:
>>>>
>>>> 1) Grouping and then calling "sort group" sorts within the group. In
>>>> your case, you group by the entire element, so each group has one value:
>>>> the element. Sorting inside the group does not make any difference. There
>>>> is no order across groups.
>>>>
>>>> 2) This code never groups and sorts. The calls to
>>>> "groupBy(0).sortGroup(0, Order.DESCENDING)" do not group and sort by
>>>> themselves; they set up a grouping to be used with a reduce or aggregate
>>>> function. The "getDataSet()" call gets you the data set the grouping was
>>>> defined on, which is the original input.
>>>>
>>>> To see an illustration of this, get the program plan
>>>> (env.getExecutionPlan()). You can render it using the html file
>>>> "tools/planVisualizer.html".
>>>>
>>>> Greetings,
>>>> Stephan
>>>>
>>>>
>>>> On Sun, Mar 15, 2015 at 3:29 PM, Kristoffer Sjögren <stoffe@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi
>>>>>
>>>>> This is silly but I can't understand why the following code doesn't
>>>>> sort the collection of integers. It seems to be a reasonable thing to do
>>>>> from an API perspective?
>>>>>
>>>>> Cheers,
>>>>> -Kristoffer
>>>>>
>>>>> final ExecutionEnvironment env =
>>>>>     ExecutionEnvironment.getExecutionEnvironment();
>>>>> env.fromCollection(Lists.newArrayList(2,1,5,3,4,5))
>>>>>     .map(new MapFunction<Integer, Tuple1<Integer>>() {
>>>>>       @Override
>>>>>       public Tuple1<Integer> map(Integer value) throws Exception {
>>>>>         return new Tuple1(value);
>>>>>       }
>>>>>     }).groupBy(0).sortGroup(0, Order.DESCENDING).getDataSet().print();
>>>>> env.execute();
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>
