flink-dev mailing list archives

From "Jark Wu" <wuchong...@alibaba-inc.com>
Subject Re: [QUESTION] the differences between DataStream.join() and DataStream.coGroup()
Date Thu, 19 May 2016 09:32:39 GMT
Thanks for your explanation! I get it.
------------------------------------------------------------------
From: Fabian Hueske <fhueske@gmail.com>
Send Time: Thursday, 19 May 2016, 15:21
To: dev@flink.apache.org <dev@flink.apache.org>; 伍翀(云邪) <wuchong.wc@alibaba-inc.com>
Subject: Re: [QUESTION] the differences between DataStream.join() and DataStream.coGroup()

You are right, at the moment join() looks like syntactic sugar around coGroup(). Internally,
it wraps a FlatJoinFunction in a CoGroupFunction and calls DataStream.coGroup().
This works because CoGroup is more generic and can be used to execute a Join. However,
because join is more specialized, there can also be more efficient strategies to execute it.
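To make the wrapping concrete, here is a minimal plain-Java sketch that does not depend on Flink. The interfaces `PairJoiner` and `GroupCombiner` and the helper `asCoGroup` are hypothetical stand-ins that only mimic the shape of Flink's `FlatJoinFunction` and `CoGroupFunction` (a `List` replaces Flink's `Collector`). It assumes the wrapper simply loops over every pair of elements from the two groups of one key; this is one straightforward way to run a join as a coGroup, not necessarily Flink's exact code.

```java
import java.util.ArrayList;
import java.util.List;

public class JoinViaCoGroup {

    /** Hypothetical stand-in for a flat join function: sees one matching pair at a time. */
    interface PairJoiner<L, R, O> {
        void join(L left, R right, List<O> out);
    }

    /** Hypothetical stand-in for a coGroup function: sees all elements of both inputs for one key. */
    interface GroupCombiner<L, R, O> {
        void coGroup(Iterable<L> lefts, Iterable<R> rights, List<O> out);
    }

    /**
     * Wraps a pair-wise join function so it can run as a coGroup function:
     * for one key, every left element is paired with every right element
     * (nested loops over both groups). If either group is empty, nothing is
     * emitted, which gives inner-join semantics.
     */
    static <L, R, O> GroupCombiner<L, R, O> asCoGroup(PairJoiner<L, R, O> joiner) {
        return (lefts, rights, out) -> {
            for (L l : lefts) {
                for (R r : rights) {
                    joiner.join(l, r, out);
                }
            }
        };
    }

    public static void main(String[] args) {
        // User-level join logic: only ever sees a single (left, right) pair.
        PairJoiner<String, Integer, String> joiner = (l, r, out) -> out.add(l + "-" + r);
        GroupCombiner<String, Integer, String> wrapped = asCoGroup(joiner);

        List<String> out = new ArrayList<>();
        // The elements of one key from each input (e.g. within one window).
        wrapped.coGroup(List.of("a1", "a2"), List.of(1, 2), out);
        System.out.println(out); // [a1-1, a1-2, a2-1, a2-2]
    }
}
```

Because this wrapper visits every pair for a key, an empty group on either side produces no output. A dedicated join operator could in principle choose a different execution strategy without changing the user-level join function, which is the point about improving the implementation without affecting users.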

Providing an API for join has several benefits:
- The implementation can be improved without affecting the user.
- The DataStream API becomes more similar to the DataSet API, which might help users who work with both APIs.
- Join and CoGroup are similar, but still different operations. CoGroup looks at the full group of elements with the same key; Join looks only at pairs of elements with identical keys. Thanks to SQL, the concept of a join is probably better known than coGroup.
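The semantic difference can be illustrated with another Flink-independent sketch. It is plain Java: keyed inputs are simulated as lists of `Map.Entry` (key, value) pairs, windowing is ignored, and the names `groupByKey`, `join`, and `coGroup` are hypothetical. `join` emits one result per matching pair, while `coGroup` is called once per key with both complete groups, so it can compute per-key aggregates such as counts and, in this simulation, still sees keys that appear on only one side.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

public class JoinVsCoGroup {

    /** Groups (key, value) elements by key; TreeMap keeps keys sorted for deterministic output. */
    static Map<String, List<Integer>> groupByKey(List<Map.Entry<String, Integer>> input) {
        Map<String, List<Integer>> groups = new TreeMap<>();
        for (Map.Entry<String, Integer> e : input) {
            groups.computeIfAbsent(e.getKey(), k -> new ArrayList<>()).add(e.getValue());
        }
        return groups;
    }

    /** Join semantics: one output per matching pair; keys present on only one side produce nothing. */
    static List<String> join(List<Map.Entry<String, Integer>> left,
                             List<Map.Entry<String, Integer>> right) {
        Map<String, List<Integer>> l = groupByKey(left);
        Map<String, List<Integer>> r = groupByKey(right);
        List<String> out = new ArrayList<>();
        for (String key : l.keySet()) {
            for (int a : l.get(key)) {
                for (int b : r.getOrDefault(key, List.of())) {
                    out.add(key + ":" + a + "+" + b);
                }
            }
        }
        return out;
    }

    /** CoGroup semantics: one call per key with both full groups, even if one group is empty. */
    static List<String> coGroup(List<Map.Entry<String, Integer>> left,
                                List<Map.Entry<String, Integer>> right) {
        Map<String, List<Integer>> l = groupByKey(left);
        Map<String, List<Integer>> r = groupByKey(right);
        TreeSet<String> keys = new TreeSet<>(l.keySet());
        keys.addAll(r.keySet());
        List<String> out = new ArrayList<>();
        for (String key : keys) {
            List<Integer> lg = l.getOrDefault(key, List.of());
            List<Integer> rg = r.getOrDefault(key, List.of());
            // Both whole groups are visible here, so per-key aggregates are possible.
            out.add(key + ":" + lg.size() + "x" + rg.size());
        }
        return out;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> left =
                List.of(Map.entry("a", 1), Map.entry("a", 2), Map.entry("b", 3));
        List<Map.Entry<String, Integer>> right =
                List.of(Map.entry("a", 10), Map.entry("c", 20));
        System.out.println(join(left, right));    // [a:1+10, a:2+10]
        System.out.println(coGroup(left, right)); // [a:2x1, b:1x0, c:0x1]
    }
}
```

Key `b` (left only) and key `c` (right only) disappear from the join result, but they still reach the coGroup logic, each with one empty group.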

Best, Fabian

2016-05-19 9:05 GMT+02:00 Jark Wu <wuchong.wc@alibaba-inc.com>:
I have read the source code and found that the JoinedStreams implementation is almost
the same as the CoGroupedStreams one (internally, JoinedStreams is implemented on top of CoGroupedStreams).
So why do we provide two different interfaces, `DataStream.join()` and `DataStream.coGroup()`, which
are exactly the same? The documentation [1] does not indicate that they do the same thing.
Or is there some difference between `DataStream.join()` and `DataStream.coGroup()` that
I missed?

-- Jark Wu
