Subject: Re: Flink Iterations vs. While loop
From: Greg Hogan
Date: Fri, 2 Sep 2016 10:39:28 -0400
To: user@flink.apache.org

Hi Dan,

Where are you reading the 200 GB of "data" from? How much memory is available per node? If the DataSet is read from a distributed filesystem, and if Flink must spill to disk during the iterations, then I wouldn't expect much difference. Roughly how many iterations run in the 30 minutes? I don't know that this is reported explicitly, but if your convergence function only has one input record per iteration, then the reported total is the iteration count.

One other thought: we should soon have support for object reuse with arrays (FLINK-3695). This would be implemented as DoubleValueArray or ValueArray<DoubleValue> rather than double[], but it would be interesting to test for a change in performance.
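In the meantime, a minimal sketch of the object reuse switch that already exists in the DataSet API. This is not the FLINK-3695 ValueArray work, only the general setting, and the usual caveat applies that user functions must not cache input objects once it is enabled:

import org.apache.flink.api.java.ExecutionEnvironment;

// Minimal sketch: enable object reuse so Flink may pass the same record
// instance between chained operators instead of creating defensive copies.
// This is independent of FLINK-3695; user functions must not hold on to
// input objects across invocations once reuse is enabled.
ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
env.getConfig().enableObjectReuse();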
Greg

On Fri, Sep 2, 2016 at 6:16 AM, Dan Drewes <drewes@campus.tu-berlin.de> wrote:

> Hi,
>
> For my bachelor thesis I'm testing an implementation of the L-BFGS algorithm
> with Flink Iterations against a version that uses an ordinary while loop
> instead. Both programs use the same Map and Reduce transformations in each
> iteration. I expected the Flink Iterations version to scale better with
> increasing input size. However, the measured results on an IBM POWER cluster
> are very similar for both versions, e.g. around 30 minutes for 200 GB of
> data. The cluster has 8 nodes, was configured with 4 slots per node, and I
> used a total parallelism of 32.
> In every iteration of the while loop a new Flink job is started, and I
> assumed that the data would also be redistributed over the network in each
> iteration, which should take a significant and measurable amount of time. Is
> that assumption wrong, or what is the computational overhead of the Flink
> iterations that cancels out this disadvantage?
> I include the relevant parts of both programs and also attach the generated
> execution plans.
> Thank you for any ideas, as I could not find much about this issue in the
> Flink docs.
>
> Best, Dan
>
> *Flink Iterations:*
>
> DataSet<double[]> data = ...
>
> State state = initialState(m, initweights, 0, new double[initweights.length]);
> DataSet<State> statedataset = env.fromElements(state);
>
> // start of iteration section
> IterativeDataSet<State> loop = statedataset.iterate(niter);
>
> DataSet<State> statewithnewlossgradient = data.map(difffunction).withBroadcastSet(loop, "state")
>       .reduce(accumulate)
>       .map(new NormLossGradient(datasize))
>       .map(new SetLossGradient()).withBroadcastSet(loop, "state")
>       .map(new LBFGS());
>
> DataSet<State> converged = statewithnewlossgradient.filter(
>    new FilterFunction<State>() {
>       @Override
>       public boolean filter(State value) throws Exception {
>          if (value.getIflag()[0] == 0) {
>             return false;
>          }
>          return true;
>       }
>    }
> );
>
> DataSet<State> finalstate = loop.closeWith(statewithnewlossgradient, converged);
>
> *While loop:*
>
> DataSet<double[]> data = ...
> State state = initialState(m, initweights, 0, new double[initweights.length]);
> DataSet<State> statedataset = env.fromElements(state);
>
> int cnt = 0;
> do {
>    LBFGS lbfgs = new LBFGS();
>    statedataset = data.map(difffunction).withBroadcastSet(statedataset, "state")
>       .reduce(accumulate)
>       .map(new NormLossGradient(datasize))
>       .map(new SetLossGradient()).withBroadcastSet(statedataset, "state")
>       .map(lbfgs);
>    cnt++;
> } while (cnt < niter && statedataset.collect().get(0).getIflag()[0] != 0);
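As a follow-up on the iteration count question above: a small sketch of a pass-through rich map that could be chained after .map(new LBFGS()) inside the iteration to report the current superstep. SuperstepLogger is a hypothetical helper, not part of Dan's program:

import org.apache.flink.api.common.functions.RichMapFunction;

// Hypothetical helper, not part of the original program: a pass-through map
// that prints the superstep number. getIterationRuntimeContext() is available
// to rich functions that run inside an IterativeDataSet.
public class SuperstepLogger extends RichMapFunction<State, State> {
    @Override
    public State map(State state) throws Exception {
        System.out.println("superstep = "
                + getIterationRuntimeContext().getSuperstepNumber());
        return state;
    }
}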
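And for completeness, the broadcast set registered under "state" in both snippets would typically be read inside a rich function. difffunction itself is not shown in the thread, so the following is only an assumed shape of such a function, not Dan's actual code:

import java.util.List;

import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.configuration.Configuration;

// Assumed shape of difffunction, for illustration only: read the single
// broadcast State record in open() and use it while mapping each data block.
public class DiffFunction extends RichMapFunction<double[], State> {

    private State currentState;

    @Override
    public void open(Configuration parameters) throws Exception {
        List<State> broadcast = getRuntimeContext().getBroadcastVariable("state");
        this.currentState = broadcast.get(0);
    }

    @Override
    public State map(double[] block) throws Exception {
        // Placeholder: compute this block's contribution to the loss and
        // gradient against currentState; the real logic is not shown in the
        // thread.
        return partialLossAndGradient(currentState, block);
    }

    private State partialLossAndGradient(State state, double[] block) {
        return state; // stand-in for the actual computation
    }
}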