Subject: Re: k means - waiting for dataset
From: Fabian Hueske
To: user@flink.apache.org
Date: Fri, 22 May 2015 23:29:22 +0200

There are two ways to do that:

1) You use a GroupReduceFunction, which gives you an iterator over all
points, similar to Hadoop's ReduceFunction.
2) You use the ReduceFunction to compute the sum and the count at the same
time (e.g., in two fields of a Tuple2) and use a MapFunction to do the
final division.

I'd go with the first choice. It's easier.
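For option 1, here is a rough, untested sketch. It reuses the GeoTimeDataTupel, GeoTimeDataCenter, LatLongSeriable, and Geometry types from your code below and assumes a GeoTimeDataCenter(id, geo, time) constructor; GroupReduceFunction and Collector are the ones from org.apache.flink.api.common.functions and org.apache.flink.util:

    // Untested sketch of option 1: see all points of a cluster at once
    // and divide only after everything has been summed.
    private static final class CentroidGroupAverager implements
            GroupReduceFunction<Tuple2<Integer, GeoTimeDataTupel>, GeoTimeDataCenter> {

        public void reduce(Iterable<Tuple2<Integer, GeoTimeDataTupel>> points,
                Collector<GeoTimeDataCenter> out) {
            int clusterId = -1;
            long timeSum = 0L;
            long count = 0L;
            List<LatLongSeriable> geos = new ArrayList<LatLongSeriable>();
            for (Tuple2<Integer, GeoTimeDataTupel> p : points) {
                clusterId = p.f0;              // same id for the whole group
                timeSum += p.f1.getTime();     // sum only, no division yet
                geos.add(p.f1.getGeo());
                count++;
            }
            // a single division (and center computation) over all points
            out.collect(new GeoTimeDataCenter(clusterId,
                    Geometry.getGeoCenterOf(geos), timeSum / count));
        }
    }

In your job you would then replace .groupBy(0).reduce(new CentroidAccumulator()).map(new CentroidAverager()) with .groupBy(0).reduceGroup(new CentroidGroupAverager()).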
Best, Fabian

2015-05-22 23:09 GMT+02:00 Paul Röwer <paul.roewer1990@googlemail.com>:

> good evening,
>
> sorry, my english is not the best.
>
> When computing the new centroid, I want to sum all points of the cluster
> and form the new center.
> In my other implementation, I first sum all points and only at the end
> divide by the number of points.
> For example: (1+2+3+4)/4 = 2.5
>
> In Flink I always reduce two points to one,
> so for the example above: (1+2)/2 = 1.5 --> (1.5+3)/2 = 2.25 --> (2.25+4)/2 = 3.125
>
> How can I rewrite my function so that it works like my other
> implementation?
>
> best regards,
> paul
>
>
> On 22.05.2015 16:52, Stephan Ewen wrote:
>
> Sorry, I don't understand the question.
>
> Can you describe a bit better what you mean with "how i can sum all
> points and share thoug the counter"?
>
> Thanks!
>
> On Fri, May 22, 2015 at 2:06 PM, Pa Rö <paul.roewer1990@googlemail.com>
> wrote:
>
>> I have fixed a bug in the input reading, but the results are still
>> different.
>>
>> I think I have localized the problem: in the other implementation I sum
>> all geo points/time points and divide by the count.
>> But in Flink I sum two points and divide by two, then sum the next...
>>
>> The method is the following:
>>
>>     // sums and counts point coordinates
>>     private static final class CentroidAccumulator implements
>>             ReduceFunction<Tuple2<Integer, GeoTimeDataTupel>> {
>>
>>         private static final long serialVersionUID = -4868797820391121771L;
>>
>>         public Tuple2<Integer, GeoTimeDataTupel> reduce(
>>                 Tuple2<Integer, GeoTimeDataTupel> val1,
>>                 Tuple2<Integer, GeoTimeDataTupel> val2) {
>>             return new Tuple2<Integer, GeoTimeDataTupel>(val1.f0,
>>                     addAndDiv(val1.f0, val1.f1, val2.f1));
>>         }
>>     }
>>
>>     private static GeoTimeDataTupel addAndDiv(int clusterid,
>>             GeoTimeDataTupel input1, GeoTimeDataTupel input2) {
>>         long time = (input1.getTime() + input2.getTime()) / 2;
>>         List<LatLongSeriable> list = new ArrayList<LatLongSeriable>();
>>         list.add(input1.getGeo());
>>         list.add(input2.getGeo());
>>         LatLongSeriable geo = Geometry.getGeoCenterOf(list);
>>
>>         return new GeoTimeDataTupel(geo, time, "POINT");
>>     }
>>
>> how i can sum all points and share thoug the counter?
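Summing everything first and dividing only once by the real count is what option 2 above does. A rough, untested sketch of the pattern, shown for a single double-valued field rather than your GeoTimeDataTupel; "withCounts" is a made-up placeholder for a DataSet<Tuple3<Integer, Double, Long>> of (cluster id, value, 1L) records:

    // Untested sketch of option 2: reduce (id, sum, count) triples,
    // then do a single division in a final MapFunction.
    DataSet<Tuple2<Integer, Double>> averages = withCounts
        .groupBy(0)
        .reduce(new ReduceFunction<Tuple3<Integer, Double, Long>>() {
            public Tuple3<Integer, Double, Long> reduce(
                    Tuple3<Integer, Double, Long> a, Tuple3<Integer, Double, Long> b) {
                // only add sums and counts here, no division yet
                return new Tuple3<Integer, Double, Long>(a.f0, a.f1 + b.f1, a.f2 + b.f2);
            }
        })
        .map(new MapFunction<Tuple3<Integer, Double, Long>, Tuple2<Integer, Double>>() {
            public Tuple2<Integer, Double> map(Tuple3<Integer, Double, Long> v) {
                // divide once, by the number of points actually seen
                return new Tuple2<Integer, Double>(v.f0, v.f1 / v.f2);
            }
        });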
>> 2015-05-22 9:53 GMT+02:00 Pa Rö <paul.roewer1990@googlemail.com>:
>>
>>> hi,
>>> If I print the centroids, all of them show up in the output. I have
>>> implemented k-means with MapReduce and Spark; given the same input, I
>>> get the same output. But in Flink I get a one-cluster output with this
>>> input set. (I use CSV files from the GDELT project.)
>>>
>>> Here is my class:
>>>
>>> public class FlinkMain {
>>>
>>>     public static void main(String[] args) {
>>>         // load properties
>>>         Properties pro = new Properties();
>>>         try {
>>>             pro.load(new FileInputStream("./resources/config.properties"));
>>>         } catch (Exception e) {
>>>             e.printStackTrace();
>>>         }
>>>         int maxIteration = 1; //Integer.parseInt(pro.getProperty("maxiterations"));
>>>         String outputPath = pro.getProperty("flink.output");
>>>         // set up execution environment
>>>         ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
>>>         // get input points
>>>         DataSet<GeoTimeDataTupel> points = getPointDataSet(env);
>>>         DataSet<GeoTimeDataCenter> centroids = getCentroidDataSet(env);
>>>         // set number of bulk iterations for KMeans algorithm
>>>         IterativeDataSet<GeoTimeDataCenter> loop = centroids.iterate(maxIteration);
>>>         DataSet<GeoTimeDataCenter> newCentroids = points
>>>             // compute closest centroid for each point
>>>             .map(new SelectNearestCenter()).withBroadcastSet(loop, "centroids")
>>>             // count and sum point coordinates for each centroid
>>>             .groupBy(0).reduce(new CentroidAccumulator())
>>>             // compute new centroids from point counts and coordinate sums
>>>             .map(new CentroidAverager());
>>>         // feed new centroids back into next iteration
>>>         DataSet<GeoTimeDataCenter> finalCentroids = loop.closeWith(newCentroids);
>>>         DataSet<Tuple2<Integer, GeoTimeDataTupel>> clusteredPoints = points
>>>             // assign points to final clusters
>>>             .map(new SelectNearestCenter()).withBroadcastSet(finalCentroids, "centroids");
>>>         // emit result
>>>         clusteredPoints.writeAsCsv(outputPath + "/points", "\n", " ");
>>>         finalCentroids.writeAsText(outputPath + "/centers"); //print();
>>>         // execute program
>>>         try {
>>>             env.execute("KMeans Flink");
>>>         } catch (Exception e) {
>>>             e.printStackTrace();
>>>         }
>>>     }
>>>
>>>     private static final class SelectNearestCenter extends
>>>             RichMapFunction<GeoTimeDataTupel, Tuple2<Integer, GeoTimeDataTupel>> {
>>>
>>>         private static final long serialVersionUID = -2729445046389350264L;
>>>         private Collection<GeoTimeDataCenter> centroids;
>>>
>>>         @Override
>>>         public void open(Configuration parameters) throws Exception {
>>>             this.centroids = getRuntimeContext().getBroadcastVariable("centroids");
>>>         }
>>>
>>>         @Override
>>>         public Tuple2<Integer, GeoTimeDataTupel> map(GeoTimeDataTupel point) throws Exception {
>>>             double minDistance = Double.MAX_VALUE;
>>>             int closestCentroidId = -1;
>>>
>>>             // check all cluster centers
>>>             for (GeoTimeDataCenter centroid : centroids) {
>>>                 // compute distance
>>>                 double distance = Distance.ComputeDist(point, centroid);
>>>                 // update nearest cluster if necessary
>>>                 if (distance < minDistance) {
>>>                     minDistance = distance;
>>>                     closestCentroidId = centroid.getId();
>>>                 }
>>>             }
>>>             // emit a new record with the center id and the data point
>>>             return new Tuple2<Integer, GeoTimeDataTupel>(closestCentroidId, point);
>>>         }
>>>     }
>>>
>>>     // sums and counts point coordinates
>>>     private static final class CentroidAccumulator implements
>>>             ReduceFunction<Tuple2<Integer, GeoTimeDataTupel>> {
>>>
>>>         private static final long serialVersionUID = -4868797820391121771L;
>>>
>>>         public Tuple2<Integer, GeoTimeDataTupel> reduce(
>>>                 Tuple2<Integer, GeoTimeDataTupel> val1,
>>>                 Tuple2<Integer, GeoTimeDataTupel> val2) {
>>>             return new Tuple2<Integer, GeoTimeDataTupel>(val1.f0,
>>>                     addAndDiv(val1.f1, val2.f1));
>>>         }
>>>     }
>>>
>>>     private static GeoTimeDataTupel addAndDiv(GeoTimeDataTupel input1,
>>>             GeoTimeDataTupel input2) {
>>>         long time = (input1.getTime() + input2.getTime()) / 2;
>>>         List<LatLongSeriable> list = new ArrayList<LatLongSeriable>();
>>>         list.add(input1.getGeo());
>>>         list.add(input2.getGeo());
>>>         LatLongSeriable geo = Geometry.getGeoCenterOf(list);
>>>
>>>         return new GeoTimeDataTupel(geo, time, "POINT");
>>>     }
>>>
>>>     // computes new centroid from coordinate sum and count of points
>>>     private static final class CentroidAverager implements
>>>             MapFunction<Tuple2<Integer, GeoTimeDataTupel>, GeoTimeDataCenter> {
>>>
>>>         private static final long serialVersionUID = -2687234478847261803L;
>>>
>>>         public GeoTimeDataCenter map(Tuple2<Integer, GeoTimeDataTupel> value) {
>>>             return new GeoTimeDataCenter(value.f0,
>>>                     value.f1.getGeo(), value.f1.getTime());
>>>         }
>>>     }
>>>
>>>     private static DataSet<GeoTimeDataTupel> getPointDataSet(ExecutionEnvironment env) {
>>>         // load properties
>>>         Properties pro = new Properties();
>>>         try {
>>>             pro.load(new FileInputStream("./resources/config.properties"));
>>>         } catch (Exception e) {
>>>             e.printStackTrace();
>>>         }
>>>         String inputFile = pro.getProperty("input");
>>>         // map csv file
>>>         return env.readCsvFile(inputFile)
>>>             .ignoreInvalidLines()
>>>             .fieldDelimiter('\u0009')
>>>             //.fieldDelimiter("\t")
>>>             //.lineDelimiter("\n")
>>>             .includeFields(true, true, false, false, false, false, false, false, false, false, false
>>>                     , false, false, false, false, false, false, false, false, false, false
>>>                     , false, false, false, false, false, false, false, false, false, false
>>>                     , false, false, false, false, false, false, false, false, true, true
>>>                     , false, false, false, false, false, false, false, false, false, false
>>>                     , false, false, false, false, false, false, false, false)
>>>             //.includeFields(true,true,true,true)
>>>             .types(String.class, Long.class, Double.class, Double.class)
>>>             .map(new TuplePointConverter());
>>>     }
>>>
>>>     private static final class TuplePointConverter implements
>>>             MapFunction<Tuple4<String, Long, Double, Double>, GeoTimeDataTupel> {
>>>
>>>         private static final long serialVersionUID = 3485560278562719538L;
>>>
>>>         public GeoTimeDataTupel map(Tuple4<String, Long, Double, Double> t) throws Exception {
>>>             return new GeoTimeDataTupel(new LatLongSeriable(t.f2, t.f3), t.f1, t.f0);
>>>         }
>>>     }
>>>
>>>     private static DataSet<GeoTimeDataCenter> getCentroidDataSet(ExecutionEnvironment env) {
>>>         // load properties
>>>         Properties pro = new Properties();
>>>         try {
>>>             pro.load(new FileInputStream("./resources/config.properties"));
>>>         } catch (Exception e) {
>>>             e.printStackTrace();
>>>         }
>>>         String seedFile = pro.getProperty("seed.file");
>>>         boolean seedFlag = Boolean.parseBoolean(pro.getProperty("seed.flag"));
>>>         // get points from file or random
>>>         if (seedFlag || !(new File(seedFile + "-1").exists())) {
>>>             Seeding.randomSeeding();
>>>         }
>>>         // map csv file
>>>         return env.readCsvFile(seedFile + "-1")
>>>             .lineDelimiter("\n")
>>>             .fieldDelimiter('\u0009')
>>>             //.fieldDelimiter("\t")
>>>             .includeFields(true, true, true, true)
>>>             .types(Integer.class, Double.class, Double.class, Long.class)
>>>             .map(new TupleCentroidConverter());
>>>     }
>>>
>>>     private static final class TupleCentroidConverter implements
>>>             MapFunction<Tuple4<Integer, Double, Double, Long>, GeoTimeDataCenter> {
>>>
>>>         private static final long serialVersionUID = -1046538744363026794L;
>>>
>>>         public GeoTimeDataCenter map(Tuple4<Integer, Double, Double, Long> t) throws Exception {
>>>             return new GeoTimeDataCenter(t.f0, new LatLongSeriable(t.f1, t.f2), t.f3);
>>>         }
>>>     }
>>> }
>>>
>>> 2015-05-21 14:22 GMT+02:00 Till Rohrmann <trohrmann@apache.org>:
>>>
>>>> Concerning your first problem that you only see one resulting centroid,
>>>> your code looks good modulo the parts you haven't posted.
>>>>
>>>> However, your problem could simply be caused by a bad selection of
>>>> initial centroids.
>>>> If, for example, all centroids except for one don't get any points
>>>> assigned, then only one centroid will survive the iteration step. How
>>>> do you select them?
>>>>
>>>> To check that all centroids are read, you can print the contents of
>>>> the centroids DataSet. Furthermore, you can simply println the new
>>>> centroids after each iteration step. In local mode you can then
>>>> observe the computation.
>>>>
>>>> Cheers,
>>>> Till
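A note on Till's debugging suggestion: inside the iteration, a simple way to see the new centroids per superstep is an identity MapFunction with a println. A rough, untested sketch; it assumes GeoTimeDataCenter has a readable toString():

    // Untested sketch: log every new centroid of each superstep (local mode).
    DataSet<GeoTimeDataCenter> newCentroids = points
        .map(new SelectNearestCenter()).withBroadcastSet(loop, "centroids")
        .groupBy(0).reduce(new CentroidAccumulator())
        .map(new CentroidAverager())
        .map(new MapFunction<GeoTimeDataCenter, GeoTimeDataCenter>() {
            public GeoTimeDataCenter map(GeoTimeDataCenter c) {
                System.out.println("new centroid: " + c);  // side effect only
                return c;                                  // identity map
            }
        });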
>>>> On Thu, May 21, 2015 at 12:23 PM, Stephan Ewen <sewen@apache.org> wrote:
>>>>
>>>>> Hi!
>>>>>
>>>>> This problem should not depend on any user code. There are no
>>>>> user-code dependent actors in Flink.
>>>>>
>>>>> Is there more stack trace that you can send us? It looks like the core
>>>>> exception that is causing the issue is not part of the stack trace.
>>>>>
>>>>> Greetings,
>>>>> Stephan
>>>>>
>>>>> On Thu, May 21, 2015 at 11:11 AM, Pa Rö <paul.roewer1990@googlemail.com> wrote:
>>>>>
>>>>>> hi flink community,
>>>>>>
>>>>>> I have implemented k-means for clustering temporal geo data. I use the
>>>>>> following GitHub project and my own data structure:
>>>>>>
>>>>>> https://github.com/apache/flink/blob/master/flink-examples/flink-java-examples/src/main/java/org/apache/flink/examples/java/clustering/KMeans.java
>>>>>>
>>>>>> Now I have the problem that Flink reads the centroids from file and
>>>>>> keeps working in parallel without waiting. If I look at the results,
>>>>>> I have the feeling that the program loads only one centroid point.
>>>>>>
>>>>>> I work with Flink 0.8.1; if I update to 0.9-milestone-1, I get the
>>>>>> following exception:
>>>>>> ERROR actor.OneForOneStrategy: exception during creation
>>>>>> akka.actor.ActorInitializationException: exception during creation
>>>>>>     at akka.actor.ActorInitializationException$.apply(Actor.scala:218)
>>>>>>     at akka.actor.ActorCell.create(ActorCell.scala:578)
>>>>>>     at akka.actor.ActorCell.invokeAll$1(ActorCell.scala:425)
>>>>>>     at akka.actor.ActorCell.systemInvoke(ActorCell.scala:447)
>>>>>>     at akka.dispatch.Mailbox.processAllSystemMessages(Mailbox.scala:262)
>>>>>>     at akka.dispatch.Mailbox.run(Mailbox.scala:218)
>>>>>>     at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
>>>>>>     at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>>>>>>     at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>>>>>>     at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>>>>>>     at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
>>>>>> Caused by: java.lang.reflect.InvocationTargetException
>>>>>>     at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>>>>>>     at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
>>>>>>     at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>>>>>>     at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
>>>>>>     at akka.util.Reflect$.instantiate(Reflect.scala:65)
>>>>>>     at akka.actor.Props.newActor(Props.scala:337)
>>>>>>     at akka.actor.ActorCell.newActor(ActorCell.scala:534)
>>>>>>     at akka.actor.ActorCell.create(ActorCell.scala:560)
>>>>>>     ... 9 more
>>>>>>
>>>>>> How can I tell Flink that it should wait until the dataset is loaded,
>>>>>> and what does this exception mean?
>>>>>>
>>>>>> best regards,
>>>>>> paul