flink-user mailing list archives

From Gábor Gévay <gga...@gmail.com>
Subject Re: Looping over a DataSet and accesing another DataSet
Date Sun, 30 Oct 2016 10:29:49 GMT

In Flink, a commonly used way to access data from multiple DataSets at
the same time is to perform a join, just as in the database world
(Flink calls equi-joins [1] simply "join").

For example, in the algorithm that you linked, you access A[u] for
every edge (u,v). I assume that you have stored A in a DataSet of
(index, value) pairs. You can achieve this access pattern by
performing a join whose join condition specifies that the first
endpoint of the edge must equal the index in A. The result is a
DataSet in which every record contains an edge (u,v) together with
A[u], so you can then apply a map whose UDF receives (u,v) and A[u].
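
To make the semantics of that join concrete, here is a minimal sketch
in plain Java, not Flink code. It assumes A is held as a map from
index to value (standing in for the DataSet of (index, value) pairs)
and that edges are long[]{u, v}; the names EdgeJoinSketch, Joined and
joinOnSource are made up for illustration. In the DataSet API, assuming
both inputs are tuple DataSets, the same join would be written roughly
as edges.join(a).where(0).equalTo(0).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java illustration of the equi-join semantics (hypothetical names, no Flink).
final class EdgeJoinSketch {

    // One joined record: the edge (u, v) together with A[u].
    static final class Joined {
        final long u;
        final long v;
        final double au;

        Joined(long u, long v, double au) {
            this.u = u;
            this.v = v;
            this.au = au;
        }

        @Override
        public String toString() {
            return "(" + u + "," + v + "," + au + ")";
        }
    }

    // Inner equi-join on "first endpoint of the edge == index of A".
    // Edges whose u has no entry in A produce no output record.
    static List<Joined> joinOnSource(List<long[]> edges, Map<Long, Double> a) {
        List<Joined> out = new ArrayList<>();
        for (long[] e : edges) {
            Double au = a.get(e[0]);
            if (au != null) {
                out.add(new Joined(e[0], e[1], au));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<Long, Double> a = new HashMap<>();
        a.put(1L, 0.5);
        a.put(2L, 1.5);
        List<long[]> edges = Arrays.asList(
                new long[]{1, 2}, new long[]{2, 3}, new long[]{4, 1});
        // Each printed record is what the map UDF would receive.
        System.out.println(joinOnSource(edges, a));
    }
}
```

The edge (4,1) disappears from the output because A has no entry for
index 4; an inner join behaves the same way.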

Your algorithm also accesses A[v], which can be achieved by performing
a second join that is similar to the first (using the result of the
first join).
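
Chaining the two joins could look like the following plain-Java
sketch, which again only illustrates the semantics and is not Flink
code. It assumes A is a map from index to value and each edge is a
long[]{u, v}; TwoJoinSketch and withBothValues are hypothetical names.
In the DataSet API, assuming tuple DataSets, the second join would key
the first join's output on its second field, roughly
.join(a).where(1).equalTo(0).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java illustration of two chained equi-joins (hypothetical names, no Flink).
final class TwoJoinSketch {

    // Returns rows [u, v, A[u], A[v]] for every edge whose endpoints both appear in A.
    static List<List<Object>> withBothValues(List<long[]> edges, Map<Long, Double> a) {
        // First join: edge.u == index of A, giving rows [u, v, A[u]].
        List<List<Object>> first = new ArrayList<>();
        for (long[] e : edges) {
            Double au = a.get(e[0]);
            if (au != null) {
                List<Object> row = new ArrayList<>();
                row.add(Long.valueOf(e[0]));
                row.add(Long.valueOf(e[1]));
                row.add(au);
                first.add(row);
            }
        }
        // Second join: takes the first join's result as input and
        // matches its v field (position 1) against the index of A.
        List<List<Object>> second = new ArrayList<>();
        for (List<Object> row : first) {
            Double av = a.get(row.get(1));
            if (av != null) {
                List<Object> extended = new ArrayList<>(row);
                extended.add(av);
                second.add(extended);
            }
        }
        return second;
    }

    public static void main(String[] args) {
        Map<Long, Double> a = new HashMap<>();
        a.put(1L, 0.5);
        a.put(2L, 1.5);
        List<long[]> edges = new ArrayList<>();
        edges.add(new long[]{1, 2});
        edges.add(new long[]{2, 3});
        System.out.println(withBothValues(edges, a));
    }
}
```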

However, the updating of P will be trickier to translate to Flink.
I'm not sure I understand the linked algorithm correctly: does every
element of P contain a list, and does the + mean appending an element
to a list (in the line P[v] = P[u] + v)?


[1] https://en.wikipedia.org/wiki/Join_(SQL)#Equi-join

2016-10-30 8:25 GMT+01:00 otherwise777 <wouter@onzichtbaar.net>:
> Currently I'm trying to implement this algorithm [1], which requires me to
> loop over one DataSet (the edges) and access another DataSet (the vertices).
> For this loop I use a mapping (I'm not sure if this is the correct way of
> looping over a DataSet), but I don't know how to access the elements of
> another DataSet while I'm looping over one.
> I know Gelly also has iterative support for this kind of thing, but it
> loops over the vertices and not the edges.
> [1] http://prntscr.com/d0qeyd
> --
> View this message in context: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Looping-over-a-DataSet-and-accesing-another-DataSet-tp9778.html
> Sent from the Apache Flink User Mailing List archive at Nabble.com.
