Return-Path: X-Original-To: apmail-flink-dev-archive@www.apache.org Delivered-To: apmail-flink-dev-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 90803CB1D for ; Wed, 21 Jan 2015 19:10:09 +0000 (UTC) Received: (qmail 39831 invoked by uid 500); 21 Jan 2015 18:58:13 -0000 Delivered-To: apmail-flink-dev-archive@flink.apache.org Received: (qmail 22336 invoked by uid 500); 21 Jan 2015 18:57:48 -0000 Mailing-List: contact dev-help@flink.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@flink.apache.org Delivered-To: mailing list dev@flink.apache.org Received: (qmail 5926 invoked by uid 99); 21 Jan 2015 17:32:24 -0000 Received: from mail-relay.apache.org (HELO mail-relay.apache.org) (140.211.11.15) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 21 Jan 2015 17:32:24 +0000 Received: from uce.fritz.box (ip5b40324e.dynamic.kabel-deutschland.de [91.64.50.78]) by mail-relay.apache.org (ASF Mail Server at mail-relay.apache.org) with ESMTPSA id A596A1A003F for ; Wed, 21 Jan 2015 17:32:23 +0000 (UTC) Content-Type: text/plain; charset=windows-1252 Mime-Version: 1.0 (Mac OS X Mail 7.3 \(1878.6\)) Subject: Re: Implementing a list accumulator From: Ufuk Celebi In-Reply-To: <06892617-DA7F-414F-B0E0-7B00CB87F368@apache.org> Date: Wed, 21 Jan 2015 18:32:20 +0100 Content-Transfer-Encoding: quoted-printable Message-Id: References: <06892617-DA7F-414F-B0E0-7B00CB87F368@apache.org> To: dev@flink.apache.org X-Mailer: Apple Mail (2.1878.6) The thing is that the DefaultCrossFunction always uses the same holder = Tuple2 object, which is then handed over to the chained collect helper = flatMap(). I can see why it is OK to keep the default functions to reuse = "holder" objects, but when they are chained to an operator it becomes = problematic. On 21 Jan 2015, at 17:12, Ufuk Celebi wrote: > I just checked out your branch and ran it with a break point set at = the CollectHelper. If you look into the (list) accumulator you see that = always the same object is added to it. Strangely enough, object re-use = is disabled in the config. I don't have time to look further into it = now, but it seems to be a problem with the object re-use mode. >=20 > =96 Ufuk >=20 > On 20 Jan 2015, at 20:53, Max Michels wrote: >=20 >> Hi everyone, >>=20 >> I'm running into some problems implementing a Accumulator for >> returning a list of a DataSet. >>=20 >> https://github.com/mxm/flink/tree/count/collect >>=20 >> Basically, it works fine in this test case: >>=20 >>=20 >> ExecutionEnvironment env =3D = ExecutionEnvironment.getExecutionEnvironment(); >>=20 >> Integer[] input =3D {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}; >>=20 >> DataSet data =3D env.fromElements(input); >>=20 >> // count >> long numEntries =3D data.count(); >> assertEquals(10, numEntries); >>=20 >> // collect >> ArrayList list =3D (ArrayList) data.collect(); >> assertArrayEquals(input, list.toArray()); >>=20 >>=20 >> But with non-primitive objects strange results occur: >>=20 >> ExecutionEnvironment env =3D = ExecutionEnvironment.getExecutionEnvironment(); >> env.getConfig().disableObjectReuse(); >>=20 >> DataSet data =3D env.fromElements(1, 2, 3, 4, 5, 6, 7, 8, 9, = 10); >> DataSet data2 =3D env.fromElements(1, 2, 3, 4, 5, 6, 7, 8, = 9, 10); >>=20 >> DataSet> data3 =3D data.cross(data2); >>=20 >> // count >> long numEntries =3D data3.count(); >> assertEquals(100, numEntries); >>=20 >> // collect >> ArrayList> list =3D = (ArrayList> Integer>>) data3.collect(); >>=20 >> System.out.println(list) >>=20 >> .... >>=20 >> Output: >>=20 >> [(10,10), (10,10), (10,10), (10,10), (10,10), (10,10), (10,10), >> (10,10), (10,10), (10,10), (10,10), (10,10), (10,10), (10,10), >> (10,10), (10,10), (10,10), (10,10), (10,10), (10,10), (10,6), (10,6), >> (10,6), (10,6), (10,6), (10,6), (10,6), (10,6), (10,6), (10,6), >> (10,6), (10,6), (10,6), (10,6), (10,6), (10,6), (10,6), (10,6), >> (10,6), (10,6), (10,7), (10,7), (10,7), (10,7), (10,7), (10,7), >> (10,7), (10,7), (10,7), (10,7), (10,7), (10,7), (10,7), (10,7), >> (10,7), (10,7), (10,7), (10,7), (10,7), (10,7), (10,9), (10,9), >> (10,9), (10,9), (10,9), (10,9), (10,9), (10,9), (10,9), (10,9), >> (10,9), (10,9), (10,9), (10,9), (10,9), (10,9), (10,9), (10,9), >> (10,9), (10,9), (10,8), (10,8), (10,8), (10,8), (10,8), (10,8), >> (10,8), (10,8), (10,8), (10,8), (10,8), (10,8), (10,8), (10,8), >> (10,8), (10,8), (10,8), (10,8), (10,8), (10,8)] >>=20 >> I assume, the problem is the clone() method of the ListAccumulator >> where we just create a shallow copy. This is fine for accumulators >> which use primitive objects, like IntCounter but here we have a real >> object. >>=20 >> How do we work around this problem? >>=20 >> Best regards, >> Max >=20