flink-dev mailing list archives

From Vasiliki Kalavri <vasilikikala...@gmail.com>
Subject Re: Intentional delta-iteration constraint or bug?
Date Thu, 10 Jul 2014 13:25:53 GMT
Hi Jack,

regarding the "solution set delta does not depend on the workset" issue: in
delta iterations the state is maintained in the solution set, and the
workset serves as the input to the next iteration. The solution set delta
represents the changes that need to be merged into the state (solution set)
at the end of each superstep. Therefore, the solution set delta needs to
depend on the input (the workset). However, as Fabian already said, this
dependency is not detected if it runs through a broadcast set. Why don't you
provide the workset as a regular input to your operator?
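
To make the dependency concrete, here is a minimal plain-Java model of the
superstep loop described above. It is a sketch, not Flink API code: the class
name DeltaIterationSketch, the method run, and the choice of a breadth-first
search as the example algorithm are all assumptions made for illustration.
The point it shows is that the delta is computed from the workset in every
superstep and then merged into the solution set by key.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; this does not use any Flink classes.
class DeltaIterationSketch {

    /**
     * Computes the hop distance from source to every reachable vertex.
     * solutionSet: vertex id -> distance (the state, keyed by vertex id)
     * workset:     vertices discovered in the previous superstep
     */
    static Map<Integer, Integer> run(Map<Integer, List<Integer>> edges, int source) {
        Map<Integer, Integer> solutionSet = new TreeMap<>();
        solutionSet.put(source, 0);
        List<Integer> workset = new ArrayList<>();
        workset.add(source);

        int superstep = 0;
        while (!workset.isEmpty()) {
            superstep++;
            // The delta is derived from the workset. This is the dependency
            // the error message in this thread is about.
            Map<Integer, Integer> delta = new HashMap<>();
            for (int vertex : workset) {
                List<Integer> neighbours =
                        edges.getOrDefault(vertex, Collections.<Integer>emptyList());
                for (int neighbour : neighbours) {
                    if (!solutionSet.containsKey(neighbour)) {
                        delta.put(neighbour, superstep);
                    }
                }
            }
            // End of superstep: merge the delta into the state by key.
            solutionSet.putAll(delta);
            // The newly discovered vertices become the next workset.
            workset = new ArrayList<>(delta.keySet());
        }
        return solutionSet;
    }

    public static void main(String[] args) {
        Map<Integer, List<Integer>> edges = new HashMap<>();
        edges.put(1, Arrays.asList(2, 3));
        edges.put(2, Arrays.asList(4));
        edges.put(3, Arrays.asList(4));
        System.out.println(run(edges, 1));
    }
}
```

In this model the solution set ends up as the union of all worksets, and the
loop stops once a superstep produces an empty workset.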

Regarding the notion of keys, the Delta iteration assumes that each element
in the solution set is uniquely identified by a key and this is how the
merge with the solution set delta happens, after the end of a superstep.
One solution to your problem might be to create a new random key for each
element that you want to add to the solution set. Would that be possible?
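
As a rough illustration of the random-key idea, the plain-Java sketch below
wraps each keyless element in an entry with a freshly generated key before
merging it into a map that stands in for the solution set. The names
RandomKeySketch and mergeWithFreshKeys are hypothetical, and the use of
java.util.UUID as the key type is an assumption; any key that is unique per
element would do.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

// Hypothetical illustration only; this does not use any Flink classes.
class RandomKeySketch {

    /**
     * Merges keyless, variable-length elements into the solution set.
     * Each element gets a fresh random key, so a merge only ever adds
     * entries and never replaces an existing one.
     */
    static void mergeWithFreshKeys(Map<UUID, int[]> solutionSet, List<int[]> delta) {
        for (int[] element : delta) {
            solutionSet.put(UUID.randomUUID(), element);
        }
    }

    public static void main(String[] args) {
        Map<UUID, int[]> solutionSet = new HashMap<>();
        // Two supersteps, each contributing its delta to the solution set.
        mergeWithFreshKeys(solutionSet, Arrays.asList(new int[] {1, 2}, new int[] {3}));
        mergeWithFreshKeys(solutionSet, Arrays.asList(new int[] {1, 2}, new int[] {4, 5, 6}));
        // Both copies of {1, 2} are kept because their keys differ.
        System.out.println(solutionSet.size());
    }
}
```

One consequence of this design is that equal elements are not de-duplicated,
since their random keys differ. If duplicates matter, a key derived from the
element's contents would be needed instead.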

Cheers,
V.





On 7 July 2014 10:53, Fabian Hueske <fhueske@apache.org> wrote:

> Hi Jack,
>
> the problem in the other thread was that, for testing purposes, a
> collection data source was used as delta set. So the working set and the
> delta set were not connected at all.
>
> In your program the two are connected, but only through a broadcast set,
> which the compiler does not detect.
>
> I am not very familiar with the iterations code, so I leave the tricky
> questions for somebody who knows the details.
>
> Best, Fabian
>
>
>
>
> 2014-07-06 10:38 GMT+02:00 Jack David Galilee <jgal2833@uni.sydney.edu.au>:
>
> > Hi,
> >
> >
> > I know from the archive (
> > http://mail-archives.apache.org/mod_mbox/flink-dev/201406.mbox/browser)
> > that a similar problem was raised last month. However, I cannot tell
> > whether it was ever solved.
> >
> >
> > I am encountering the same problem "In the given plan, the solution set
> > delta does not depend on the workset.", but what I can't ascertain
> > (having examined the PACT compiler (0.5.1)) is whether this is a bug or
> > an intentional design constraint placed on the delta iteration operator.
> >
> >
> > My algorithm sits between the Delta and Bulk iterative models as an
> > Incremental iterative algorithm. The solution set is the union of all
> > working sets up until the current working set is empty.
> >
> >
> > The working set is broadcast to a single operator in the data-flow. This
> > appears to be the problem: the compiler is unable to determine the
> > dependency via this broadcast.
> >
> >
> > To make things more complex, my data does not suit the pseudo-relational
> > model Flink is designed around. I am dealing with variable-length sets /
> > arrays, so I can't join against the solution set or the working set
> > between iterations because the data has no notion of keys.
> >
> >
> > I can make it 'run' as a BulkIteration, but the result is only the final
> > state (the empty working set), because the 0.5.1 API, at least, doesn't
> > allow all previous steps to be captured in a union - I essentially lose
> > the answer once the algorithm converges.
> >
> >
> > Your opinion as to whether this is actually a bug, or if I am doing it
> > all completely wrong would be most appreciated.
> >
> >
> >
> > Cheers,
> >
> > Jack Galilee
> >
>
