flink-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Kruse, Sebastian" <Sebastian.Kr...@hpi.de>
Subject RE: DeltaIterations: shrink solution set
Date Wed, 11 Feb 2015 10:12:53 GMT
That sounds promising.

Yet, I have the problem that I need the candidates in different operators. While feeding them
forward is probably easy, e.g., via broadcasts, feeding the candidates “backwards” to
the next iteration seems to be more of a problem.

As I am only building a prototype, I might do this feed-back via some global variable, but
that is very hacky. Is there some elegant way to do it? Maybe with the distributed cache?

From: ewenstephan@gmail.com [mailto:ewenstephan@gmail.com] On Behalf Of Stephan Ewen
Sent: Mittwoch, 11. Februar 2015 10:44
To: user@flink.apache.org
Subject: Re: DeltaIterations: shrink solution set

UDFs exist intentionally across iterations, it is a feature, to allow you to keep state. To
Figure out when an iteration starts and ends, you can use a RichFunctions, which get calls
to open() and close() for each iteration.

On Wed, Feb 11, 2015 at 10:40 AM, Kruse, Sebastian <Sebastian.Kruse@hpi.de<mailto:Sebastian.Kruse@hpi.de>>
wrote:
Thanks for your answers.

I am trying to build an apriori-like algorithm to find key candidates in a relational dataset.
I was considering delta iterations, because the algorithm should maintain two datasets: a
set of column combinations to be checked (as delta set) and a set of tuples which are still
relevant to the next iteration (as work set). So, the general proceeding is adapted from the
popular frequent item set algorithm.

I am now also thinking that delta iterations are not the right thing for me, also because
of other problems (only join and coGroup to be used on the solution set and “Error: Iterative
task without a single iterative input.” whose cause is not obvious to me).

@Alexander: Using an output within a bulk iteration leaves me with the following exception:
Exception in thread "main" org.apache.flink.api.common.InvalidProgramException: A data set
that is part of an iteration was used as a sink or action. Did you forget to close the iteration?
Do you have any experience/proposals how to incorporate your idea nevertheless?

@Stefan: Are operators intentionally reused across iterations, i.e., is it an explicit feature
or is it likely to change in the future?

Cheers,
Sebastian

From: ewenstephan@gmail.com<mailto:ewenstephan@gmail.com> [mailto:ewenstephan@gmail.com<mailto:ewenstephan@gmail.com>]
On Behalf Of Stephan Ewen
Sent: Mittwoch, 11. Februar 2015 10:02
To: user@flink.apache.org<mailto:user@flink.apache.org>
Subject: Re: DeltaIterations: shrink solution set


You can also use a bulk iteration and just keep the state yourself. Since the functions love
across iterations, it is easily doable to just gather the state in a HashMap yourself. Use
map(), or mapPartition(), a manual partition() call - that should do the trick...
Am 10.02.2015 21:44 schrieb "Alexander Alexandrov" <alexander.s.alexandrov@gmail.com<mailto:alexander.s.alexandrov@gmail.com>>:
True.

2015-02-10 19:14 GMT+01:00 Vasiliki Kalavri <vasilikikalavri@gmail.com<mailto:vasilikikalavri@gmail.com>>:

Hi,

It's hard to tell without details about your algorithm, but what you're describing sounds
to me like something you can use the workset for.

-V.
On Feb 10, 2015 6:54 PM, "Alexander Alexandrov" <alexander.s.alexandrov@gmail.com<mailto:alexander.s.alexandrov@gmail.com>>
wrote:
I am not sure whether this is supported at the moment. The only workaround I could think of
is indeed to use a boolean flag that indicates whether the element has been deleted or not.
An alternative approach is to ditch Flink's native iteration construct and write your intermediate
results to Tachyon or HDFS after each iteration using the TypeInfoInput/OutputFormats. You
then have full control how the old and the new solutions sets should be merged.

BTW can you share some details about that particular algorithm? I was thinking about examples
iterative algorithms with this property...
Regards,
A.

2015-02-10 14:18 GMT+01:00 Kruse, Sebastian <Sebastian.Kruse@hpi.de<mailto:Sebastian.Kruse@hpi.de>>:
Hi everyone,

From playing around a bit around with delta iterations, I saw that you can update elements
from the solution set and add new elements. My question is: is it possible to remove elements
from the solution set (apart from marking them as “deleted” somehow)?

My use case at hand for this is the following: In each iteration, I generate candidate solutions
that I want to verify within the next iteration. If verification fails, I would like to remove
them from the solution set, otherwise retain them.

Thanks,
Sebastian



Mime
View raw message