flink-user mailing list archives

From Ufuk Celebi <...@apache.org>
Subject Re: Wrong and non consistent behavior of max
Date Fri, 28 Nov 2014 13:53:02 GMT
This is not the first time that people have been confused by this. I think most
people expect the maxBy and minBy behaviour from max/min.

Maybe it makes sense to move back to the old aggregations API, where you
call the aggregate method and specify, as an argument, which type of
aggregation should be performed. I didn't really like this, but if the
current state is confusing people, we should consider changing it again.

On Fri, Nov 28, 2014 at 12:31 PM, Maximilian Alber <
alber.maximilian@gmail.com> wrote:

> Hi Fabian!
>
> Ok, thanks! Now it works.
>
> Cheers,
> Max
>
> On Fri, Nov 28, 2014 at 1:47 AM, Fabian Hueske <fhueske@apache.org> wrote:
>
>> Hi Max,
>>
>> the max(i) function does not select the Tuple with the maximum value.
>> Instead, it builds a new Tuple with the maximum value for the i-th
>> attribute. The values of the Tuple's other fields are not defined (in
>> practice they are set to the values of the last Tuple, but the order of
>> Tuples is not defined).
>>
>> The Java API features minBy and maxBy transformations that should do what
>> you are looking for.
>> You can reimplement them for Scala as a simple GroupReduce (or Reduce)
>> function or use the Java function in your Scala code.
>>
>> Best, Fabian
>>
>>
>>
>> 2014-11-27 16:14 GMT+01:00 Maximilian Alber <alber.maximilian@gmail.com>:
>>
>>> Hi Flinksters,
>>>
>>> I don't know if I did something wrong, but the code seems fine. Basically,
>>> the max function extracts the wrong element.
>>>
>>> The error only happens with my real data, not if I inject some
>>> sequence into costs.
>>>
>>> The problem is that the corresponding tuple value at the first position
>>> (the id) is wrong. The maximum of the second part is detected correctly.
>>>
>>> The code snippet:
>>>
>>> val maxCost = costs map {x => (x.id, x.value)} max(1)
>>>
>>> (costs map {x => (x.id, x.value)} map {_ toString} map {"first: "+ _ })
>>> union (maxCost map {_ toString} map {"second: "+ _ }) writeAsText
>>> config.outFile
>>>
>>> The output:
>>>
>>> File content:
>>> first: (47,42.066986)
>>> first: (11,4.448255)
>>> first: (40,42.06696)
>>> first: (3,0.96731037)
>>> first: (31,42.06443)
>>> first: (18,23.753584)
>>> first: (45,42.066986)
>>> first: (24,41.44347)
>>> first: (13,6.1290965)
>>> first: (19,26.42948)
>>> first: (1,0.9665109)
>>> first: (28,42.04222)
>>> first: (5,1.2986814)
>>> first: (44,42.066986)
>>> first: (7,1.8681992)
>>> first: (10,3.0981758)
>>> first: (41,42.066982)
>>> first: (48,42.066986)
>>> first: (21,33.698544)
>>> first: (38,42.066963)
>>> first: (30,42.06153)
>>> first: (26,41.950237)
>>> first: (43,42.066986)
>>> first: (16,14.754578)
>>> first: (15,10.571205)
>>> first: (34,42.06672)
>>> first: (29,42.055424)
>>> first: (35,42.066845)
>>> first: (8,1.9513339)
>>> first: (22,38.189228)
>>> first: (46,42.066986)
>>> first: (2,0.966511)
>>> first: (27,42.013676)
>>> first: (12,5.4271784)
>>> first: (42,42.066986)
>>> first: (4,1.01561)
>>> first: (14,7.4410205)
>>> first: (25,41.803535)
>>> first: (6,1.6827519)
>>> first: (36,42.06694)
>>> first: (20,28.834095)
>>> first: (32,42.06577)
>>> first: (49,42.066986)
>>> first: (33,42.0664)
>>> first: (9,2.2420964)
>>> first: (37,42.066967)
>>> first: (0,0.9665109)
>>> first: (17,19.016153)
>>> first: (39,42.06697)
>>> first: (23,40.512672)
>>> second: (23,42.066986)
>>>
>>> File content end.
>>>
>>>
>>> Thanks!
>>> Cheers,
>>> Max
>>>
>>>
>>
>
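
To make the difference Fabian describes concrete, here is a minimal sketch using plain Scala collections rather than the Flink API. It uses four of the (id, value) pairs from Max's output. The names MaxVsMaxBy, fieldMax and selectMax are hypothetical and only serve the illustration. fieldMax imitates the described max(1) behaviour under the assumption that the other field is simply taken from the last tuple seen; selectMax keeps whole tuples, which is what maxBy is meant to do. A function shaped like selectMax could presumably be used as the Reduce function Fabian mentions, but that is an assumption and not tested against Flink here.

```scala
object MaxVsMaxBy {
  // A few (id, value) pairs copied from the output in the thread.
  val costs: Seq[(Int, Float)] = Seq(
    (47, 42.066986f),
    (11, 4.448255f),
    (40, 42.06696f),
    (23, 40.512672f)
  )

  // Field-wise aggregation: the value field becomes the maximum seen so far,
  // while the id is taken from whichever tuple was processed last
  // (an assumption that mirrors the "in practice" remark in the thread).
  def fieldMax(a: (Int, Float), b: (Int, Float)): (Int, Float) =
    (b._1, math.max(a._2, b._2))

  // Selection: the whole tuple with the larger value is kept, so the id and
  // the value always belong together. Ties keep the tuple seen first.
  def selectMax(a: (Int, Float), b: (Int, Float)): (Int, Float) =
    if (b._2 > a._2) b else a

  def main(args: Array[String]): Unit = {
    // Yields id 23 paired with the maximum value, as in Max's "second:" line.
    println(costs.reduce(fieldMax))
    // Yields id 47 together with its own value, which is the maximum.
    println(costs.reduce(selectMax))
  }
}
```

Note that in the real data several ids share the maximum value, so a selecting reduce returns one of them, and which one depends on the processing order.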
