flink-user mailing list archives

From Dominik Safaric <dominiksafa...@gmail.com>
Subject Flink 1.1.3 RollingSink - mismatch in the number of records consumed/produced
Date Mon, 12 Dec 2016 18:54:38 GMT
Hi everyone,

I’ve implemented a RollingSink that writes messages consumed from a Kafka log, and I’ve
observed a significant mismatch between the number of messages consumed and the number
written to the file system.

Namely, the consumed Kafka topic contains 1,000,000 messages in total. The topology does not
perform any data transformation whatsoever; instead, data from the source is pushed
straight to the RollingSink.

After checksumming the output files, I’ve observed that the total number of messages
written to the output files is greater than 7,000,000, a difference of more than 6,000,000
records beyond what was consumed/available.
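
For reference, a count like the one above can be reproduced with a minimal sketch along these lines. It assumes that each record is written as one newline-terminated line and that every regular file under the sink's base path should be counted; both are assumptions for illustration, and the class and method names (RecordCounter, countRecords) are hypothetical, not part of Flink.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of Flink: counts newline-delimited records on disk.
final class RecordCounter {

    // Walks dir recursively and sums the line counts of all regular files.
    // Assumption: one record per line, UTF-8 encoded.
    static long countRecords(Path dir) {
        try (Stream<Path> paths = Files.walk(dir)) {
            List<Path> files = paths
                    .filter(Files::isRegularFile)
                    .collect(Collectors.toList());
            long total = 0;
            for (Path file : files) {
                // Files.lines streams the file lazily; close it after counting.
                try (Stream<String> lines = Files.lines(file, StandardCharsets.UTF_8)) {
                    total += lines.count();
                }
            }
            return total;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Calling RecordCounter.countRecords on the sink's base path gives a total that can be compared with the number of messages in the topic. Note that this sketch counts every file it finds, regardless of name or suffix.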

What is the cause of this behaviour? 

Regards,
Dominik   