flink-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Alexander Alexandrov <alexander.s.alexand...@gmail.com>
Subject Re: Join with a custom predicate
Date Mon, 27 Apr 2015 20:38:29 GMT
I'm curios to learn more, maybe we can discuss this over a coffee in the
next days ;)

2015-04-27 12:16 GMT+02:00 Kirschnick, Johannes <
johannes.kirschnick@tu-berlin.de>:

> Hi,
>
> thanks for dedicating some of your weekend time to this problem.
> Your solution looks quite neat, thanks ..
>
> For your information, what I'm trying to do is multiply matrix blocks
> (dataset one) with individual elements of a vector (dataset two, of row id
> and some value).
> In this case "one" needs to see all the qualifying entries from "two" to
> correctly multiply, ideally as a group for streamlined multiplication.
>
> Johannes
>
> -----Urspr√ľngliche Nachricht-----
> Von: Alexander Alexandrov [mailto:alexander.s.alexandrov@gmail.com]
> Gesendet: Sonntag, 26. April 2015 23:22
> An: dev@flink.apache.org
> Betreff: Re: Join with a custom predicate
>
> I thought about your problem over the weekend. Unfortunately the algorithm
> that you describe does not fit "regular" equi-join semantics, but I think
> it could be "fitted" with a more complex dataflow.
>
> To achieve that, I would partition the (active) domain of the two datasets
> on fine-granular intervals (for the sake of the example, let's say 10.
>
> You can prepare a "coarse-grained" join key on the inputs using a "x % 10"
> (Flat)Map:
>
> One: (0, {3,6}), (0, {5,7})
> Two: (0, 1), (0, 2), (0, 3), (0, 4), (0, 5), (0, 6), (0, 7)
>
> Upon that you can do a regular join on the "coarse-grained" key (in the
> first component of the tuples), and follow that with a filter that
> evaluates the actual "one.start <= two.number <= one.end" predicate.
>
> Regards,
> Alex
>
>
> 2015-04-24 20:55 GMT+02:00 Kirschnick, Johannes <
> johannes.kirschnick@tu-berlin.de>:
>
> > Hi
> > I have a small problem with doing a custom join, that I would need
> > some help with. Maybe I'm also approaching the problem wrong.
> > So basically I have two dataset.
> > The simplified example: The first one has a start and end value. The
> > second dataset is just a list of ordered numbers and some value (value
> > is ignored in the example) Example One = {3,6},{5,7} Two =
> > 1,2,3,4,5,6,7 What I need is a sort of custom join, that brings to the
> > first dataset all elements from the second that are within the range.
> > Something like .. join where one.start <= two.number <= one.end So
> > {3,6} from one would only need to "see" 3,4,5 Joining does not work
> > out of the box here as the key is sort of "dynamic"
> > depending on the value of one.
> > I can just use a map for the first dataset and broadcast the second
> > into the mapper which can then select the required elements - but my
> > assumption is that the second dataset might actually be very large as
> > well, but the qualifying join "numbers" from two will actually be small.
> > Is there something I could do in this particular case?
> > Thanks a lot
> > Johannes
> >
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message