Hi Andre,
On 4 November 2015 at 16:04, André Petermann <
petermann@informatik.uni-leipzig.de> wrote:
> Hi Fabian,
>
> thanks for your fast reply!
>
> I created a gist to explain the while-not-empty loop in more detail:
> https://gist.github.com/p3et/9f6e56cf0b68213e3e2b
>
> It is an attempt to create a minimal example of the kind of algorithm
> corresponding to page 1 of the PDF, in particular frequent substrings.
> Please don't mind the algorithm itself, the actual algorithm is much
> more complex. However, even this simple case runs for only 7 iterations on
> my machine before the "too few memory segments" error is raised.
>
> If I understood delta iteration right, it's necessary to provide a 1:1
> mapping between working and solution set items. The empty case you referred
> to is the no-more-updates case, but not no-more-items, right?
>
I think it's possible to do what you want with a delta iteration.
The solution set and workset *don't* need to be of the same type. When you
define the delta iteration,
e.g. DeltaIteration<ST, WT> iteration = ...,
ST is the type of the solution set and WT is the type of the workset.
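For illustration, defining such an iteration looks roughly like the fragment below. This is an untested sketch from memory of the DataSet API; the type parameters and the variable names (initialSolutionSet, initialWorkset, solutionSetDelta, nextWorkset) are placeholders, not checked against the current API:

```java
// ST: solution set type, WT: workset type -- they may differ.
// The last argument is the key position of the solution set
// used to match delta updates against existing elements.
DeltaIteration<Tuple2<Long, String>, Tuple1<Long>> iteration =
    initialSolutionSet.iterateDelta(initialWorkset, maxIterations, 0);

// closeWith takes the solution set delta and the next workset;
// the iteration terminates when the next workset is empty.
DataSet<Tuple2<Long, String>> result =
    iteration.closeWith(solutionSetDelta, nextWorkset);
```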
>
> In my example, the working set is completely replaced in each iteration
> (line 52), with only a parent-child mapping. The solution set is initially
> empty (line 33) and stores the results of all iterations (line 48). I hope
> this shows the difference to the delta iteration and while-not-empty.
> Further on, you see the different data types of working and solution sets.
> I will provide a further gist for the "second choice" solution using delta
> iteration. However, what we would actually prefer is to replace the
> for-loop by something like while(embeddings.isNotEmpty()), including a
> truly iterative execution.
>
The default convergence criterion for a delta iteration is an empty
workset. Thus, if you set embeddings as the workset, you have your
"while(embeddings.isNotEmpty())" logic.
Also, as far as I know, there is no problem with appending new elements
to the solution set. So, using allFrequentPatterns as the solution set
should be fine.
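To make the control flow concrete, here is a plain-Java simulation of the while-not-empty semantics with differently typed working and solution sets. This is not Flink code, just a sketch; the toy "algorithm" (growing substrings over {a, b} up to length 3) is made up for illustration:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class WhileNotEmptySketch {

    // The workset (Set<String>) is fully replaced per iteration,
    // the solution set (List<String>) only grows, and the loop
    // terminates when the workset is empty -- the same convergence
    // criterion a delta iteration uses by default.
    static List<String> run() {
        Set<String> workset = new HashSet<>();
        workset.add("a");
        workset.add("b");
        List<String> solutionSet = new ArrayList<>();

        while (!workset.isEmpty()) {
            Set<String> next = new HashSet<>();
            for (String s : workset) {
                solutionSet.add(s);      // append, never update
                if (s.length() < 3) {    // expand shorter candidates
                    next.add(s + "a");
                    next.add(s + "b");
                }
            }
            workset = next;              // complete replacement
        }
        return solutionSet;
    }

    public static void main(String[] args) {
        // 2 strings of length 1 + 4 of length 2 + 8 of length 3 = 14
        System.out.println(run().size());
    }
}
```

In a delta iteration, the body of the while loop corresponds to the step function, and the `workset = next` replacement corresponds to emitting the next workset.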
>
> But please let me know if I missed some Flink features already enabling
> such loops.
>
> Thanks,
> Andre
Does this clear things up a bit? Let me know if I misunderstood what you
want to do.
Cheers,
Vasia.
>
>
> On 03.11.2015 15:47, Fabian Hueske wrote:
>
>> Hi Andre,
>>
>> Thanks for reaching out to the Flink community!
>>
>> I am not sure your analysis is based on correct assumptions about Flink's
>> delta iterations.
>> Flink's delta iterations do support:
>> - working and solution sets of different types
>> - worksets that grow and shrink or are completely replaced in each
>> iteration
>> - solution sets that grow (up to the point of the "too few memory
>> segments" error)
>> - termination via a custom aggregator. So the options are a max number of
>> iterations, an empty workset, and a custom termination criterion.
>>
>> The "too few memory segments" problem occurs because the solution
>> set is held in an in-memory hash table.
>> Spilling that table would result in a significant iteration slowdown. One
>> solution is to throw more resources at the problem (more nodes, more RAM).
>> Another workaround is to reduce the amount of managed memory and switch
>> the solution set hash table to "unmanaged". This will keep the result set
>> in a regular Java HashMap, but this might cause an OutOfMemoryError and
>> kill the JVM if it grows too large.
>> AFAIK, there is no way to shrink the solution set at the moment. It might
>> be worth investigating in that direction.
>>
>> Regarding your question about the while-not-empty loop: what exactly is
>> the difference from the default delta iteration? It does stop when the
>> workset is empty (in addition to the custom termination criterion).
>> I am not sure if I got all your points. Please let me know if I got
>> something wrong.
>>
>> Thanks,
>> Fabian
>>
>> 2015-11-03 13:50 GMT+01:00 André Petermann <
>> petermann@informatik.uni-leipzig.de>:
>>
>> Sorry for redundant post ...
>>>
>>>
>>> Hi all,
>>>
>>> We are working on an implementation of Frequent Subgraph Mining using
>>> Flink. Here, the "too few memory segments" error prevents the most
>>> promising solution. The problem is not specific to graphs, but applies
>>> to all iterative problems where
>>>
>>> - working and solution sets contain data of different types
>>> - the working set may grow, shrink, or be replaced in each iteration
>>> - the solution set grows in each iteration
>>> - the termination criterion is based on data set metrics,
>>> e.g. while working set not empty
>>>
>>> An illustration of our problem workflow, generalized to graph-unspecific
>>> frequent pattern mining, can be found here:
>>>
>>>
>>> https://github.com/dbs-leipzig/gradoop/blob/master/dev-support/loopWithIntermediateResultMemory.pdf
>>>
>>> Page 1 shows the most promising solution. We started implementing it
>>> using a for loop. However, the "too few memory segments" error makes it
>>> untestable. As the iteration body itself is a complex workflow and the
>>> number of iterations is arbitrary, unrolling it while reserving operator
>>> memory will be a permanent limitation. Increasing limits or physical
>>> memory would only delay the problem. The resulting question is:
>>>
>>> Would it be possible to implement a while-not-empty or at least a for
>>> loop that isn't unrolled and can be executed more memory-efficiently?
>>>
>>> Page 2 shows an alternative solution to our problem using the concept of
>>> delta iteration. However, Flink's delta iteration supports neither
>>> broadcasting nor workset-independent intermediate results. Page 3
>>> shows our working solution using two workarounds for those restrictions.
>>> However, these workarounds lead to unnecessary memory consumption and
>>> redundant expensive computations. So, in case the answer to the
>>> first question is no, a second question:
>>>
>>> Would it be possible to extend the delta iteration to support
>>> rich map functions with broadcast sets and the caching of intermediate
>>> results?
>>>
>>> We think that a while-not-empty loop might be useful for other
>>> algorithms too, e.g. variable-length path search in graphs. Did we miss
>>> Flink features meeting our requirements? Do you think it's worth
>>> creating an improvement issue? If so, we'd of course be willing to
>>> contribute in the form of development.
>>>
>>> Best Regards
>>> Andre
>>>
>>> --
>>> PhD Student
>>> University of Leipzig
>>> Department of Computer Science
>>> Database Research Group
>>>
>>> email: petermann@informatik.uni-leipzig.de
>>> web: dbs.uni-leipzig.de
>>> --
>>>
>>>
>>
