flink-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Patrick Brunmayr <...@kpibench.com>
Subject Re: Difference between partition and groupBy
Date Fri, 24 Feb 2017 14:08:26 GMT
Thank you for that answer. Helped me a lot

2017-02-23 22:10 GMT+01:00 Fabian Hueske <fhueske@gmail.com>:

> Hi Patrick,
> as Robert said, partitionBy() shuffles the data such that all records with
> the same key end up in the same partition. That's all it does.
> groupBy() also prepares the data in each partition to be processed per
> key. For example, if you run a groupReduce after a groupBy(), the data is
> first shuffled (just like partitionBy()) and then in each partition sorted
> to organize it by key. So groupBy() does more than partitionBy() because it
> organizes the data in each partition to be processed by key.
> Moreover, groupBy() alone is not a complete operation but just "prepares"
> a following operation. It must be called with a reduce or combine operator.
> In contrast partitionBy() is by itself complete.
> So the difference between partitionBy() and groupBy() is more than just an
> API thing.
> Hope that helps,
> Fabian
> 2017-02-23 21:51 GMT+01:00 Robert Metzger <rmetzger@apache.org>:
>> Hi Patrick,
>> I think (but I'm not 100% sure) its not a difference in what the engine
>> does in the end, its more of an API thing. When you are grouping, you can
>> perform operations such as reducing afterwards.
>> On a partitioned dataset, you can do stuff like processing each partition
>> in parallel, or sort them.
>> The parallelism is independent of the partitioning or grouping. Usually
>> there are more partitions than parallel instances, so each instance will
>> take care of multiple partitions.
>> On Thu, Feb 23, 2017 at 6:16 PM, Patrick Brunmayr <jay@kpibench.com>
>> wrote:
>>> What is the basic difference between partitioning datasets by key or
>>> grouping them by key ?
>>> Does it make a difference in terms of paralellism ?
>>> Thx

View raw message