flink-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Fabian Hueske <fhue...@gmail.com>
Subject Re: Data loss in Flink Kafka Pipeline
Date Thu, 07 Dec 2017 22:47:47 GMT
Hi Nishu,

the data loss might be caused by the fact that processing time triggers do
not fire when the program terminates.
So, if your program has records stored in a window and program terminates
because the input was fully consumed, the window operator won't process the
remaining windows but simply be canceled.

Best, Fabian

2017-12-07 23:13 GMT+01:00 Nishu <nishutayal@gmail.com>:

> Hi,
>
> Thanks for your inputs.
> I am reading Kafka topics in Global windows and have defined some
> ProcessingTime triggers. Hence there is no late records.
>
> Program is performing join between multiple kafka topics. It consists
> following types of Transformation sequence is something like :
> 1. Read Kafka topic
> 2. Apply Window and trigger on kafka topic
> 3. Parse the data into POJO objects
> 4. Group the POJO objects by their keys
> 5. Read other topics and perform same steps
> 6. Join the Grouped Output with other topic Grouped records.
>
> I get all records until 3rd point as expected. But in point 4, few keys
> are dropped with inconsistent behavior in each run.
> I have tried the pipeline with different-2 setup i.e 1 task slot, 1
> parallel thread,  or multiple task slot n multiple thread.
>
> It looks like BeamFlink runner has some bug in the pipeline translation in
> streaming pipeline scenario.
>
> Thanks,
> Nishu
>
>
> On Thu, Dec 7, 2017 at 7:13 PM, Chen Qin <qinnchen@gmail.com> wrote:
>
>> Nishu
>>
>> You might consider sideouput with metrics at least after window. I would
>> suggest having that to catch data screw or partition screw in all flink
>> jobs  and amend if needed.
>>
>> Chen
>>
>> On Thu, Dec 7, 2017 at 9:48 AM Fabian Hueske <fhueske@gmail.com> wrote:
>>
>>> Is it possible that the data is dropped due to being late, i.e., records
>>> with timestamps behind the current watemark?
>>> What kind of operations does your program consist of?
>>>
>>> Best, Fabian
>>>
>>> 2017-12-07 10:20 GMT+01:00 Sendoh <unicorn.banachi@gmail.com>:
>>>
>>>> I would recommend to also print the count of input and output of each
>>>> operator by using Accumulator.
>>>>
>>>> Cheers,
>>>>
>>>> Sendoh
>>>>
>>>>
>>>>
>>>> --
>>>> Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.
>>>> nabble.com/
>>>>
>>>
>>> --
>> Chen
>> Software Eng, Facebook
>>
>
>
>
> --
> Thanks & Regards,
> Nishu Tayal
>

Mime
View raw message