flink-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Stefano Bortoli <s.bort...@gmail.com>
Subject Re: Performance and accuracy of Flink iterations
Date Wed, 18 May 2016 15:33:04 GMT
Thanks Vasia for the very clear explanation.

saluti,
Stefano

2016-05-17 14:53 GMT+02:00 Vasiliki Kalavri <vasilikikalavri@gmail.com>:

> Hi Greg,
>
> I think there is confusion between what delta means in the "delta iteration
> operator" of Flink and the "delta approximate implementation" of an
> algorithm, such as in PageRank.
>
> Assuming that we have a graph with a set of vertices and an iterative
> fixpoint algorithm that updates the vertex values in each iteration, the
> bulk iteration operator will generate a new set of vertex values at
> iteration k, which will be the input of iteration k+1, etc..
>
> Using the delta iteration operator, we can keep state inside the iteration
> and thus, exploit the asymmetrical convergence of some algorithms, i.e.
> only compute on the "active" vertices. An active vertex is defined as one
> that has changed its value since the last iteration.
>
> If the state update function is idempotent and weakly monotonic, e.g. min
> (like in connected components, sssp), then the delta iteration result is
> equivalent to the bulk iteration.
> If the state update function is linear and decomposable, e.g. sum (like in
> PageRank), then you can re-write the update function and propagate only
> _deltas_, instead of value updates. This way the result will again be
> equivalent to the bulk iteration result.
>
> Now, if your application can tolerate some level of inaccuracy, then you
> can define a threshold on what it means for a vertex value to "have changed
> its value". The higher the threshold, the less active vertices you'll have
> and probably faster convergence, but higher error as well. Thus, accuracy
> is not a property of the bulk or the delta iteration themselves, but rather
> depends on the algorithm and on the vertex activation criterion.
>
> I hope this clears things up!
> -Vasia.
>
> On 17 May 2016 at 14:39, Till Rohrmann <trohrmann@apache.org> wrote:
>
> > Hi Greg,
> >
> > as far as I know there has not been an exhaustive comparison to what
> extent
> > the delta iterations can achieve the same accuracy as bulk iterations or
> > how much accuracy you'll lose. I think it strongly depends on the
> problem.
> > For example, graph algorithms such as connected components shouldn't
> suffer
> > from it. In contrast, the PageRank implementation with the THRESHOLD
> value
> > should not produce the (most) accurate result. Of course this depends on
> > the threshold value. Do you want to make such a comparison?
> >
> > Cheers,
> > Till
> >
> > On Mon, May 16, 2016 at 3:10 PM, Greg Hogan <code@greghogan.com> wrote:
> >
> > > Hi,
> > >
> > > This question has arisen with the HITS algorithm (Hubs and Authorities)
> > but
> > > the question is the same as with PageRank, for which Stephan published
> an
> > > excellent discussion and comparison of bulk and delta iterations [0].
> > >
> > > Delta iterations are clearly faster. Has there been a comparison as to
> > > whether, when, or how delta iterations are more accurate?
> > >
> > > Greg
> > >
> > > [0]
> > >
> > >
> >
> http://data-artisans.com/data-analysis-with-flink-a-case-study-and-tutorial/
> > >
> >
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message