flink-user mailing list archives

From: Maximilian Bode <maximilian.b...@tngtech.com>
Subject: Checkpoints in batch processing & JDBC Output Format
Date: Mon, 09 Nov 2015 16:25:24 GMT

Hi everyone,

I am considering using Flink in a project. The setting would be a YARN cluster where data
is first read in from HDFS, then processed and finally written into an Oracle database using
an upsert command. If I understand the documentation correctly, the DataSet API would be the
natural candidate for this problem.

My first question is about the checkpointing system. According to [1] and [2], it apparently
does not apply to batch processing. So how does Flink handle failures during batch processing?
For the use case described above, 'at least once' semantics would suffice. Still, are 'exactly
once' guarantees possible?
For example, how does Flink handle the failure of a single TaskManager during a batch job? What
happens in that case if part of the data has already been written to the database?
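
A minimal sketch of why 'at least once' would suffice here, assuming the upsert is keyed on
a primary key and a retried job recomputes the same value for each key. Both points are
assumptions, as are all names below; an in-memory map stands in for the Oracle table and no
Flink or JDBC API is involved:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class UpsertReplay {
    /** Upsert: insert the row, or overwrite the existing row with the same key. */
    static void upsert(Map<String, Integer> table, Map.Entry<String, Integer> row) {
        table.put(row.getKey(), row.getValue());
    }

    /** Writes the first `count` rows, simulating a job that may stop partway through. */
    static void write(Map<String, Integer> table,
                      List<Map.Entry<String, Integer>> rows,
                      int count) {
        for (Map.Entry<String, Integer> row : rows.subList(0, count)) {
            upsert(table, row);
        }
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> rows =
                List.of(Map.entry("a", 1), Map.entry("b", 2), Map.entry("c", 3));

        // Reference run: the job completes without failure.
        Map<String, Integer> clean = new HashMap<>();
        write(clean, rows, rows.size());

        // Failure run: two rows reach the table, then the whole job is re-run.
        Map<String, Integer> retried = new HashMap<>();
        write(retried, rows, 2);
        write(retried, rows, rows.size());

        // Rows written twice simply overwrite themselves.
        System.out.println(clean.equals(retried));
    }
}
```

Under these assumptions a full re-run after a partial write leaves the table in the same
final state as a clean run. What the sketch does not cover is the window in which the
database holds only part of the result.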

Secondly, the most obvious and straightforward way to connect to the Oracle DB would be the
JDBC Output Format. In [3], it was mentioned that it does not have many users and might not
be trusted. What is its current status?

Best regards,
Max

[1] http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Flink-and-Spark-tp583p587.html
[2] http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Batch-Processing-as-Streaming-td1909.html
[3] http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/PotsgreSQL-JDBC-Sink-quot-writeRecord-failed-quot-and-quot-Batch-element-cancelled-quot-on-upsert-td623.html
