From: Ufuk Celebi
Date: Fri, 28 Nov 2014 14:53:02 +0100
Subject: Re: Wrong and non consistent behavior of max
To: user@flink.incubator.apache.org

This is not the first time that people have been confused by this. I think most people expect max/min to behave like maxBy/minBy.

Maybe it makes sense to move back to the old aggregations API, where you call the aggregate method and pass the type of aggregation to perform as an argument. I didn't really like this, but if the current state is confusing people, we should consider changing it again.

On Fri, Nov 28, 2014 at 12:31 PM, Maximilian Alber <alber.maximilian@gmail.com> wrote:
Hi Fabian!

Ok, thanks! Now it works.

Cheers,
Max

On Fri, Nov 28, 2014 at 1:47 AM, Fabian Hueske <fhueske@apache.org> wrote:
Hi Max,

the max(i) function does not select the Tuple with the maximum value.
Instead, it builds a new Tuple with the maximum value for the i-th attribute. The values of the Tuple's other fields are not defined (in practice they are set to the values of the last Tuple, but the order of Tuples is not defined).
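A minimal sketch of the two behaviours described above, modelled on an ordered local Scala `Seq` instead of a Flink `DataSet`. `MaxSemantics`, `fieldwiseMax` and `selectMax` are hypothetical names, the value type is assumed to be `Double`, and the "last tuple wins" rule only imitates the practical behaviour mentioned above, since Flink guarantees no order.

```scala
// Local model only: this is not Flink's implementation, it just reproduces the
// described semantics on an ordered Seq so they can be compared side by side.
object MaxSemantics {
  type Rec = (Int, Double) // (id, value); Double is an assumed value type

  // Field-wise aggregation, like max(1): field 1 keeps the running maximum, while
  // field 0 is overwritten by every record, ending up with the last record's id.
  def fieldwiseMax(records: Seq[Rec]): Rec =
    records.reduceLeft[Rec]((acc, next) => (next._1, math.max(acc._2, next._2)))

  // Record selection, like maxBy(1): returns an actual input record whose
  // field 1 is maximal, so id and value belong to the same record.
  def selectMax(records: Seq[Rec]): Rec =
    records.maxBy(_._2)
}
```

On a three-record sample from the output further down (`(47,42.066986)`, `(11,4.448255)`, `(23,40.512672)`, in that order), `fieldwiseMax` returns `(23,42.066986)`, the same pairing as the `second:` line, while `selectMax` returns `(47,42.066986)`.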

The Java API features minBy and maxBy transformations that should do what you are looking for.
You can reimplement them for Scala as a simple GroupReduce (or Reduce) function or use the Java function in your Scala code.
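The Reduce variant can be sketched as a plain function over `(id, value)` pairs. This assumes a `Double` value (adjust to the real type of `x.value`) and that the Flink Scala `DataSet.reduce` overload taking a `(T, T) => T` function is available; `MaxByReduce` and `keepLarger` are hypothetical names.

```scala
// Sketch of maxBy(1) as a reduce function over (id, value) pairs. Unlike the
// field-wise max(1), the result is always one of the two input records, so the
// id and the value stay together.
object MaxByReduce {
  // Keeps the record with the larger value in field 1. On ties it keeps the left
  // argument, so which of several equal records survives depends on the order in
  // which they reach the reducer.
  def keepLarger(a: (Int, Double), b: (Int, Double)): (Int, Double) =
    if (b._2 > a._2) b else a
}
```

In the job from the original post this would become something like `(costs map {x => (x.id, x.value)}).reduce(MaxByReduce.keepLarger _)`, assuming the record types match.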

Best, Fabian


2014-11-27 16:14 GMT+01:00 Maximilian Alber <alber.maximilian@gmail.com>:
Hi Flinksters,

I don't know if I did something wrong, but the code seems fine. Basically, the max function extracts the wrong element.

The error only happens with my real data, not if I inject some sequence into costs.

The problem is that the corresponding value at the other tuple position is wrong. The maximum of the second field is detected correctly.

The code snippet:

val maxCost = costs map {x => (x.id, x.value)} max(1)

(costs map {x => (x.id, x.value)} map {_ toString} map {"first: " + _}) union (maxCost map {_ toString} map {"second: " + _}) writeAsText config.outFile

The output:

File content:
first: (47,42.066986)
first: (11,4.448255)
first: (40,42.06696)
first: (3,0.96731037)
first: (31,42.06443)
first: (18,23.753584)
first: (45,42.066986)
first: (24,41.44347)
first: (13,6.1290965)
first: (19,26.42948)
first: (1,0.9665109)
first: (28,42.04222)
first: (5,1.2986814)
first: (44,42.066986)
first: (7,1.8681992)
first: (10,3.0981758)
first: (41,42.066982)
first: (48,42.066986)
first: (21,33.698544)
first: (38,42.066963)
first: (30,42.06153)
first: (26,41.950237)
first: (43,42.066986)
first: (16,14.754578)
first: (15,10.571205)
first: (34,42.06672)
first: (29,42.055424)
first: (35,42.066845)
first: (8,1.9513339)
first: (22,38.189228)
first: (46,42.066986)
first: (2,0.966511)
first: (27,42.013676)
first: (12,5.4271784)
first: (42,42.066986)
first: (4,1.01561)
first: (14,7.4410205)
first: (25,41.803535)
first: (6,1.6827519)
first: (36,42.06694)
first: (20,28.834095)
first: (32,42.06577)
first: (49,42.066986)
first: (33,42.0664)
first: (9,2.2420964)
first: (37,42.066967)
first: (0,0.9665109)
first: (17,19.016153)
first: (39,42.06697)
first: (23,40.512672)
second: (23,42.066986)

File content end.
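The output also shows why the choice among equal values matters: as printed, ids 42 through 49 all have the value 42.066986, so even a correct maxBy(1) has eight valid answers, and which one it returns depends on processing order. A hypothetical helper (`TiesAtMax` is not part of any library) that lists the tied ids from lines in the `first: (id,value)` format above:

```scala
// Parses "first: (id,value)" lines and returns every id whose value equals the
// maximum, i.e. every record a correct maxBy(1) could legitimately return.
object TiesAtMax {
  private val Line = """first: \((\d+),([0-9.]+)\)""".r

  def idsAtMax(lines: Seq[String]): Seq[Int] = {
    // Lines that do not match the pattern (e.g. the "second:" line) are skipped.
    val records = lines.collect { case Line(id, value) => (id.toInt, value.toDouble) }
    if (records.isEmpty) Seq.empty
    else {
      val best = records.map(_._2).max
      records.collect { case (id, v) if v == best => id }
    }
  }
}
```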


Thanks!
Cheers,
Max



