flink-user mailing list archives

From Fabian Hueske <fhue...@gmail.com>
Subject Re: Some questions about Join
Date Sat, 21 Feb 2015 10:40:01 GMT
Hi,

Non-equi joins are only supported by building the cross product and
filtering it with the join condition. This is essentially the nested-loop
join strategy that a conventional database system would choose. However,
such joins are prohibitively expensive when applied to large data sets.
If you have one small and one large data set, you can do the join by
broadcasting the smaller side to a MapFunction (withBroadcastSet() [1])
that takes the larger data set as its regular input, and evaluating the
join condition inside the MapFunction.
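
To make the broadcast idea concrete, here is a minimal sketch of the logic only. It uses plain Scala collections in place of Flink DataSets, and the record types Order and Threshold are hypothetical names made up for illustration. In a real program the small side would be attached with withBroadcastSet() and read inside the function. Since one record of the large side may match several records of the small side, the sketch uses flatMap rather than a one-to-one map.

```scala
// Plain Scala collections stand in for Flink DataSets (assumption for illustration).
object BroadcastJoinSketch {
  // Hypothetical record types, not taken from the thread.
  final case class Order(id: Int, amount: Double)
  final case class Threshold(label: String, minAmount: Double)

  // The whole small side is visible for every record of the large side,
  // which is what a broadcast set provides inside the user function.
  def nonEquiJoin(large: Seq[Order],
                  smallBroadcast: Seq[Threshold]): Seq[(Order, Threshold)] =
    large.flatMap { order =>
      // Evaluate the non-equi condition (>=) against each broadcast record.
      smallBroadcast.collect {
        case t if order.amount >= t.minAmount => (order, t)
      }
    }
}
```

The cost is still proportional to the product of both input sizes, so this only pays off when the broadcast side is small enough to fit in memory on every worker.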

The problem with the Any key selector is that Flink needs to know the
types when the program is optimized, because it generates type-specific
serializers. I think an Any type does not work as a join key.
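
One possible way around this, shown only as a sketch, is to have the key function return a concrete type instead of Any. The example below again uses plain Scala collections rather than the Flink API, and it assumes (hypothetically) that the values stored under "key" are strings. The helper names stringKey and joinOnKey are made up for illustration. Whether an equivalent key selector fits a real Flink job depends on the data and is untested here.

```scala
object TypedKeySketch {
  // Returns the "key" entry only if it is a String, so the key has a
  // concrete type instead of Any (assumption: keys are strings).
  def stringKey(record: Map[String, Any]): Option[String] =
    record.get("key").collect { case s: String => s }

  // Equi-join on the typed key; records without a String key are dropped.
  def joinOnKey(left: Seq[Map[String, Any]],
                right: Seq[Map[String, Any]]): Seq[(Map[String, Any], Map[String, Any])] = {
    // Index the right side by its typed key.
    val rightByKey: Map[String, Seq[Map[String, Any]]] =
      right
        .flatMap(r => stringKey(r).map(k => (k, r)))
        .groupBy(_._1)
        .map { case (k, pairs) => (k, pairs.map(_._2)) }

    for {
      l <- left
      k <- stringKey(l).toSeq
      r <- rightByKey.getOrElse(k, Seq.empty)
    } yield (l, r)
  }
}
```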

Best, Fabian


[1]
http://flink.apache.org/docs/0.8/programming_guide.html#broadcast-variables

2015-02-21 10:28 GMT+01:00 Vinh June <hoangthevinh.htv@gmail.com>:

> Hello,
>
> I have some questions concerning Join:
>
> 1. I would like to join on different conditions. Is there any way to
> create a join with a condition other than "equalTo"? For example, how
> would I make a join with > or >=?
>
> 2. I have a DataSet[Map[String, Any]]. Is it possible to specify a
> KeySelector using a map key? I tried the Scala code below, but it
> doesn't work:
>
> Set1.join(Set2).where(_.get("key")).equalTo(_.get("key"))
