flink-user mailing list archives

From Stephan Ewen <se...@apache.org>
Subject Re: NOT IN with Flink
Date Thu, 11 Dec 2014 13:20:44 GMT
Hey!

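Careful: In SQL, the semantics of a "not-equal" join are quite different
from those of a NOT IN statement. A not-equal join emits a pair for every
combination of elements that differ, whereas NOT IN keeps an element only
if it matches nothing in the other set.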

Here is how you do the equivalent of NOT IN:

If the list of elements is small and known up front, create a hash set and
give it to a filter function (closure or constructor). The filter function
can look up whether the element is contained or not.
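
A minimal sketch of the hash-set filter described above, written in plain
Java so it runs without Flink on the classpath. The names NotInFilterSketch,
NotInFilter and notIn are hypothetical. In an actual Flink job the filter
class would additionally implement
org.apache.flink.api.common.functions.FilterFunction<String>, whose single
method has the same shape as filter() here, and would be passed to
DataSet.filter().

```java
import java.io.Serializable;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.stream.Collectors;

class NotInFilterSketch {

    // Hypothetical filter class. Flink ships user functions to the workers,
    // so the function and the set it holds must be serializable.
    static class NotInFilter implements Serializable {
        private final HashSet<String> excluded;

        // The known-up-front elements are handed over via the constructor.
        NotInFilter(Collection<String> excluded) {
            this.excluded = new HashSet<>(excluded);
        }

        // Keep an element only if it is NOT contained in the set.
        public boolean filter(String value) {
            return !excluded.contains(value);
        }
    }

    // Stand-in for dataSet.filter(new NotInFilter(...)) on a local list.
    static List<String> notIn(List<String> input, Collection<String> excluded) {
        NotInFilter filter = new NotInFilter(excluded);
        return input.stream().filter(filter::filter).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("a", "b", "c", "d");
        List<String> excluded = Arrays.asList("b", "d");
        System.out.println(notIn(input, excluded)); // prints [a, c]
    }
}
```

Each lookup in the hash set takes constant expected time, so the filter
stays a single pass over the data with no join involved.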

If the elements are not known up front, use a broadcast variable that you
attach to a RichFilterFunction. In the filter function's open() method,
grab the broadcast variable and turn it into a hash set. The filter logic
is then the same as above.
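
The broadcast-variable approach described above could look roughly like
this. It is a sketch, assuming the Java DataSet API of the Flink 0.x/1.x
line and the Flink libraries on the classpath. The class names, the sample
data and the broadcast name "excluded" are made up for illustration.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

import org.apache.flink.api.common.functions.RichFilterFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.configuration.Configuration;

public class NotInBroadcastSketch {

    // Hypothetical filter; "excluded" is an arbitrary broadcast name that
    // must match the name used in withBroadcastSet() below.
    static class NotInBroadcastFilter extends RichFilterFunction<String> {
        private transient Set<String> excluded;

        @Override
        public void open(Configuration parameters) {
            // The broadcast data set arrives as a list; build the hash set
            // once per parallel task instead of once per element.
            List<String> values = getRuntimeContext().getBroadcastVariable("excluded");
            excluded = new HashSet<>(values);
        }

        @Override
        public boolean filter(String value) {
            // Keep an element only if it is NOT in the broadcast set.
            return !excluded.contains(value);
        }
    }

    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        // Sample data for illustration only.
        DataSet<String> input = env.fromElements("a", "b", "c", "d");
        DataSet<String> excluded = env.fromElements("b", "d");

        input.filter(new NotInBroadcastFilter())
             .withBroadcastSet(excluded, "excluded")
             // In Flink 0.9 and later print() triggers execution; older
             // versions need an explicit env.execute() afterwards.
             .print();
    }
}
```

The broadcast set is copied to every parallel task, so this only makes
sense while the excluded elements fit comfortably in memory.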

Check out the API guides for some examples of how to use broadcast
variables.

Stephan
On 11.12.2014 12:17, "Malte Schwarzer" <ms@mieo.de> wrote:

> Hi,
>
> is there an easy way to do a NOT IN, or something like
> join().where().notEquals(), on two datasets with Flink?
>
> Cheers
> Malte
>
