flink-user mailing list archives

From Andrew Roberts <arobe...@fuze.com>
Subject Fault tolerance guarantees of Elasticsearch sink in flink-elasticsearch2?
Date Wed, 11 Jan 2017 14:49:01 GMT

I’m trying to understand the guarantees made by Flink’s Elasticsearch sink in terms of
message delivery. According to (1), the ES sink offers at-least-once guarantees. This page
doesn’t differentiate between flink-elasticsearch and flink-elasticsearch2, so I have to
assume for the moment that they both offer that guarantee. However, a look at the code (2)
shows that the invoke() method puts the record into a buffer, and then that buffer is flushed
to Elasticsearch some time later.

It’s my understanding that Flink uses checkpoint “records” flowing past the sink as
a means for forming the guarantee that all records prior to the checkpoint have been received
by the sink. I assume that the invoke() method returning is what Flink uses to decide if
a record has passed a sink, but here invoke() stashes the record in a buffer that doesn’t look like it
participates in checkpointing anywhere.

Does the sink provided in flink-connector-elasticsearch2 guarantee at-least-once, and if it
does, how does it reconstitute the buffer (so as to not lose records that have gone through
the sink’s invoke() method, but not been transmitted to ES yet) in the case of the operator
failing when the buffer is not empty?
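
The scenario in question can be sketched as a small, self-contained simulation. This is not the connector's code and uses no Flink or Elasticsearch APIs: FakeStore, BufferingSink, onCheckpoint() and the flushOnCheckpoint flag are hypothetical names made up for illustration. It assumes that a record counts as "past the sink" once invoke() returns, and that after a crash the job replays only records that came after the last completed checkpoint. Flushing on checkpoint is shown only as one conceivable way to close the gap, not as what the sink actually does.

```java
import java.util.ArrayList;
import java.util.List;

class BufferingSinkSketch {

    /** Stand-in for the external system (Elasticsearch in the question). */
    static final class FakeStore {
        final List<String> received = new ArrayList<>();

        void bulkWrite(List<String> batch) {
            received.addAll(batch);
        }
    }

    /** Hypothetical sink whose invoke() only buffers the record in memory. */
    static final class BufferingSink {
        private final FakeStore store;
        private final boolean flushOnCheckpoint;
        private final List<String> buffer = new ArrayList<>();

        BufferingSink(FakeStore store, boolean flushOnCheckpoint) {
            this.store = store;
            this.flushOnCheckpoint = flushOnCheckpoint;
        }

        /** Returns immediately; nothing has reached the store yet. */
        void invoke(String record) {
            buffer.add(record);
        }

        /** Sends everything buffered so far and empties the buffer. */
        void flush() {
            store.bulkWrite(buffer);
            buffer.clear();
        }

        /** Assumed hook that runs when a checkpoint passes the sink. */
        void onCheckpoint() {
            if (flushOnCheckpoint) {
                flush();
            }
        }
    }

    /**
     * Three records pass invoke(), a checkpoint completes, then the sink
     * crashes before any later flush. Returns how many records the store has.
     */
    static int recordsSurvivingCrash(boolean flushOnCheckpoint) {
        FakeStore store = new FakeStore();
        BufferingSink sink = new BufferingSink(store, flushOnCheckpoint);

        sink.invoke("a");
        sink.invoke("b");
        sink.invoke("c");
        sink.onCheckpoint();

        // Crash: the sink instance and its in-memory buffer are gone.
        // Under the stated assumption, "a", "b", "c" are not replayed,
        // because they precede the completed checkpoint.
        sink = null;

        return store.received.size();
    }

    public static void main(String[] args) {
        System.out.println("without flush on checkpoint: " + recordsSurvivingCrash(false));
        System.out.println("with flush on checkpoint:    " + recordsSurvivingCrash(true));
    }
}
```

In this model, the run without a flush on checkpoint leaves the store with 0 records, while the run that flushes on checkpoint leaves it with 3. The first case is the gap being asked about: records that are covered by a checkpoint but were never transmitted.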

(1) https://ci.apache.org/projects/flink/flink-docs-release-1.2/dev/connectors/guarantees.html
(2) https://github.com/apache/flink/blob/master/flink-connectors/flink-connector-elasticsearch2/src/main/java/org/apache/flink/streaming/connectors/elasticsearch2/ElasticsearchSink.java