flink-user mailing list archives

From Till Rohrmann <trohrm...@apache.org>
Subject Re: DataSetUtils zipWithIndex question
Date Thu, 31 Mar 2016 11:56:27 GMT
Hi Tarandeep,

the number of elements in each partition should stay constant between the
two mapPartition steps. In fact, the elements in each partition should not
change at all.

Cheers,
Till

On Wed, Mar 30, 2016 at 8:14 AM, Tarandeep Singh <tarandeep@gmail.com>
wrote:

> Hi,
>
> I am looking at the implementation of zipWithIndex in DataSetUtils:
>
> https://github.com/apache/flink/blob/master/flink-java/src/main/java/org/apache/flink/api/java/utils/DataSetUtils.java
>
> It works in two phases:
> 1) Count the number of elements in each partition (using mapPartition).
> 2) In a second mapPartition, assign each element a unique ID, using the
> counts from step 1 to compute each partition's offset.
>
> Is there any chance the second mapPartition won't get the same number of
> elements as the first mapPartition (assuming the data is in HDFS)?
>
> Thanks
> Tarandeep
>
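
For readers of the archive, the two phases described in the quoted message can be sketched in plain Java. This is not the Flink code: partitions are modelled as in-memory lists, and the class and method names (ZipWithIndexSketch, countPerPartition, zipWithIndex) are made up for illustration. It only shows why the scheme relies on both passes seeing the same elements per partition.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-alone sketch; partitions are plain lists, not Flink DataSets.
public class ZipWithIndexSketch {

    // Phase 1: count the elements in each partition.
    static long[] countPerPartition(List<List<String>> partitions) {
        long[] counts = new long[partitions.size()];
        for (int p = 0; p < partitions.size(); p++) {
            counts[p] = partitions.get(p).size();
        }
        return counts;
    }

    // Phase 2: a partition's first index is the sum of the counts of all
    // partitions before it; elements are then numbered consecutively.
    static List<String> zipWithIndex(List<List<String>> partitions) {
        long[] counts = countPerPartition(partitions);
        List<String> out = new ArrayList<>();
        long offset = 0;
        for (int p = 0; p < partitions.size(); p++) {
            long index = offset;
            for (String element : partitions.get(p)) {
                out.add(index + ":" + element);
                index++;
            }
            // Uses the phase-1 count, not the number of elements just seen.
            offset += counts[p];
        }
        return out;
    }

    public static void main(String[] args) {
        List<List<String>> partitions = Arrays.asList(
                Arrays.asList("a", "b"),
                Arrays.asList("c"),
                Arrays.asList("d", "e", "f"));
        // prints [0:a, 1:b, 2:c, 3:d, 4:e, 5:f]
        System.out.println(zipWithIndex(partitions));
    }
}
```

In this sketch, if a partition yielded more or fewer elements in phase 2 than were counted in phase 1, the assigned indices would overlap or leave gaps, which is why the scheme depends on the partition contents staying the same across both passes.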
