Hi Greg,
I think there is confusion between what delta means in the "delta iteration
operator" of Flink and the "delta approximate implementation" of an
algorithm, such as in PageRank.
Assuming that we have a graph with a set of vertices and an iterative
fixpoint algorithm that updates the vertex values in each iteration, the
bulk iteration operator will generate a new set of vertex values at
iteration k, which will be the input of iteration k+1, etc..
Using the delta iteration operator, we can keep state inside the iteration
and thus, exploit the asymmetrical convergence of some algorithms, i.e.
only compute on the "active" vertices. An active vertex is defined as one
that has changed its value since the last iteration.
If the state update function is idempotent and weakly monotonic, e.g. min
(like in connected components, sssp), then the delta iteration result is
equivalent to the bulk iteration.
If the state update function is linear and decomposable, e.g. sum (like in
PageRank), then you can rewrite the update function and propagate only
_deltas_, instead of value updates. This way the result will again be
equivalent to the bulk iteration result.
Now, if your application can tolerate some level of inaccuracy, then you
can define a threshold on what it means for a vertex value to "have changed
its value". The higher the threshold, the less active vertices you'll have
and probably faster convergence, but higher error as well. Thus, accuracy
is not a property of the bulk or the delta iteration themselves, but rather
depends on the algorithm and on the vertex activation criterion.
I hope this clears things up!
Vasia.
On 17 May 2016 at 14:39, Till Rohrmann <trohrmann@apache.org> wrote:
> Hi Greg,
>
> as far as I know there has not been an exhaustive comparison to what extent
> the delta iterations can achieve the same accuracy as bulk iterations or
> how much accuracy you'll lose. I think it strongly depends on the problem.
> For example, graph algorithms such as connected components shouldn't suffer
> from it. In contrast, the PageRank implementation with the THRESHOLD value
> should not produce the (most) accurate result. Of course this depends on
> the threshold value. Do you want to make such a comparison?
>
> Cheers,
> Till
>
> On Mon, May 16, 2016 at 3:10 PM, Greg Hogan <code@greghogan.com> wrote:
>
> > Hi,
> >
> > This question has arisen with the HITS algorithm (Hubs and Authorities)
> but
> > the question is the same as with PageRank, for which Stephan published an
> > excellent discussion and comparison of bulk and delta iterations [0].
> >
> > Delta iterations are clearly faster. Has there been a comparison as to
> > whether, when, or how delta iterations are more accurate?
> >
> > Greg
> >
> > [0]
> >
> >
> http://dataartisans.com/dataanalysiswithflinkacasestudyandtutorial/
> >
>
