flink-dev mailing list archives

From Alexander Alexandrov <alexander.s.alexand...@gmail.com>
Subject Re: Join with a custom predicate
Date Sun, 26 Apr 2015 21:22:22 GMT
I thought about your problem over the weekend. Unfortunately the algorithm
that you describe does not fit "regular" equi-join semantics, but I think
it could be "fitted" with a more complex dataflow.

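To achieve that, I would partition the (active) domain of the two datasets
into fine-grained intervals (for the sake of the example, let's say of
width 10).

You can prepare a "coarse-grained" join key on the inputs using an integer
division "x / 10" in a (Flat)Map; a range in One that spans several
intervals has to be emitted once per interval it overlaps: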

One: (0, {3,6}), (0, {5,7})
Two: (0, 1), (0, 2), (0, 3), (0, 4), (0, 5), (0, 6), (0, 7)

Upon that you can do a regular join on the "coarse-grained" key (in the
first component of the tuples), and follow that with a filter that
evaluates the actual "one.start <= two.number <= one.end" predicate.
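
The steps above can be sketched in plain Python (not the Flink API) to make
the dataflow concrete. This is only an illustration under assumptions: the
names BUCKET_WIDTH, bucket, key_ranges, key_numbers and range_join are
hypothetical, the bucket key is taken to be the integer division of a value
by the interval width, and the filter uses the inclusive predicate exactly
as written.

```python
from collections import defaultdict

BUCKET_WIDTH = 10  # assumed interval width, matching the example


def bucket(x, width=BUCKET_WIDTH):
    """Coarse-grained join key: index of the interval that contains x."""
    return x // width


def key_ranges(ranges, width=BUCKET_WIDTH):
    """FlatMap over One: emit (key, (start, end)) once per overlapped interval."""
    for start, end in ranges:
        for k in range(bucket(start, width), bucket(end, width) + 1):
            yield k, (start, end)


def key_numbers(numbers, width=BUCKET_WIDTH):
    """Map over Two: emit (key, number)."""
    for n in numbers:
        yield bucket(n, width), n


def range_join(ranges, numbers, width=BUCKET_WIDTH):
    """Equi-join on the coarse key, then filter with the real predicate."""
    # Group Two by key (stands in for the shuffle of a distributed join).
    by_key = defaultdict(list)
    for k, n in key_numbers(numbers, width):
        by_key[k].append(n)

    result = []
    for k, (start, end) in key_ranges(ranges, width):
        for n in by_key.get(k, []):
            if start <= n <= end:  # one.start <= two.number <= one.end
                result.append(((start, end), n))
    return result


one = [(3, 6), (5, 7)]
two = [1, 2, 3, 4, 5, 6, 7]
joined = range_join(one, two)
```

Because every number falls into exactly one interval, a matching pair is
produced once even when a range is replicated across several keys. A smaller
width means fewer candidates for the filter to discard, but more replicas of
each wide range.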

Regards,
Alex


2015-04-24 20:55 GMT+02:00 Kirschnick, Johannes <
johannes.kirschnick@tu-berlin.de>:

> Hi
> I have a small problem with doing a custom join, that I would need some
> help with. Maybe I'm also approaching the problem wrong.
> So basically I have two datasets.
> The simplified example: The first one has a start and end value. The
> second dataset is just a list of ordered numbers and some value (value is
> ignored in the example)
> Example
> One = {3,6},{5,7}
> Two = 1,2,3,4,5,6,7
> What I need is a sort of custom join, that brings to the first dataset all
> elements from the second that are within the range.
> Something like .. join where one.start <= two.number <= one.end
> So {3,6} from one would only need to "see" 3,4,5
> Joining does not work out of the box here as the key is sort of "dynamic"
> depending on the value of one.
> I can just use a map for the first dataset and broadcast the second into
> the mapper which can then select the required elements - but my assumption
> is that the second dataset might actually be very large as well, but the
> qualifying join "numbers" from two will actually be small.
> Is there something I could do in this particular case?
> Thanks a lot
> Johannes
>
