flink-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From André Petermann <peterm...@informatik.uni-leipzig.de>
Subject Question about limitations of iterative algorithms
Date Tue, 03 Nov 2015 12:47:05 GMT
Hi all,

We are working on an implementation of Frequent Subgraph Mining using
Flink. At that, the "too few memory segments error" prevents the most
promising solution. The problem is not specific for graphs, but all
iterative problems where

- working and solution sets contain data of different types
- the working set may grow, shrink or is replaced for each iteration
- the solution set grows for each iteration
- the termination criterion is based on data set metrics,
   e.g. while working set not empty

An illustration of our problem workflow, generalized to graph unspecific
frequent pattern mining, can be found here:
https://github.com/dbs-leipzig/gradoop/blob/master/dev-support/loopWithIntermediateResultMemory.pdf

Page 1 shows the most promising solution. We started implementing it
using a for loop. However, the "too few memory segments error" makes it
untestable. As the iteration body itself is a complex workflow and the
number of iterations is arbitrary, unrolling it while reserving operator
memory will be a permanent limitation. Increasing limits or physical
memory would only delay the problem. The resulting question is:

Would it be possible to implement a while-not-empty or at least a for
loop, that isn't unrolled and can be executed more memory efficient?

Page 2 shows an alternative solution to our problem using the concept of
delta iteration. However, Flink delta iteration does neither support
broadcasting nor working-set independent intermediate results. Page 3
shows our working solution using two workarounds for those restrictions.
However, these workarounds lead to unnecessary memory consumption and
redundant expensive computations. So, in the case the answer to the
first question is no, a second question:

Would it be possible to extend the delta iteration by the support of
rich map functions with broadcast sets and the memory of intermediate
results?

We think, that a while-not-empty loop might be useful for other
algorithms too, e.g. variable length path search in graphs. Did we miss
Flink features meeting our requirements? Do you think it's worth to
create an improvement issue? At that, we'd of course be willing to
contribute in the form of development.

Best Regards
Andre

Mime
View raw message