flink-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From bastien dine <bastien.d...@gmail.com>
Subject Re: Counting DataSet in DataFlow
Date Wed, 07 Nov 2018 15:01:32 GMT
Hi Fabian,
Thanks for the response, I am going to use the second solution !
Regards,
Bastien

------------------

Bastien DINE
Data Architect / Software Engineer / Sysadmin
bastiendine.io


Le mer. 7 nov. 2018 à 14:16, Fabian Hueske <fhueske@gmail.com> a écrit :

> Another option for certain tasks is to work with broadcast variables [1].
> The value could be use to configure two filters.
>
> DataSet<Data> input = ....
> DataSet<Long> count = input.map(-> 1L).sum()
> DataSet<OutCntZero> input.filter(if cnt == 0).withBroadcastSet("cnt",
> count).doSomething
> DataSet<OutCntNonZero> input.filter(if cnt != 0).withBroadcastSet("cnt",
> count).doSomethingElse
>
> Fabian
>
> [1]
> https://ci.apache.org/projects/flink/flink-docs-stable/dev/batch/#broadcast-variables
>
>
> Am Mi., 7. Nov. 2018 um 14:10 Uhr schrieb Fabian Hueske <fhueske@gmail.com
> >:
>
>> Hi,
>>
>> Counting always requires a job to be executed.
>> Not sure if this is what you want to do, but if you want to prevent to
>> get an empty result due to an empty cross input, you can use a
>> mapPartition() with parallelism 1 to emit a special record, in case the
>> MapPartitionFunction didn't see any data.
>>
>> Best, Fabian
>>
>> Am Mi., 7. Nov. 2018 um 14:02 Uhr schrieb bastien dine <
>> bastien.dine@gmail.com>:
>>
>>> Hello,
>>>
>>> I would like to a way to count a dataset to check if it is empty or
>>> not.. But .count() throw an execution and I do not want to do separe job
>>> execution plan, as hthis will trigger multiple reading..
>>> I would like to have something like..
>>>
>>> Source -> map -> count -> if 0 -> do someting
>>>                                            if not -> do something
>>>
>>>
>>> More concrete i would like to check if one of my dataset is empty before
>>> doing a cross operation..
>>>
>>> Thanks,
>>> Bastien
>>>
>>>
>>>

Mime
View raw message