flink-user mailing list archives

From: Andrey Zagrebin <and...@data-artisans.com>
Subject: Re: hadoopInputFormat and elasticsearch
Date: Thu, 04 Oct 2018 12:20:21 GMT
Hi,

At the moment, if the processing of any input split fails,
Flink will restart the batch job completely from scratch.

There is an ongoing effort to improve this with fine-grained recovery, tracked in FLINK-4256.
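
For illustration, a minimal sketch of setting a restart strategy on the
batch ExecutionEnvironment (the class name and values below are just
placeholders); note that even with restarts enabled, a failure currently
means the whole job, i.e. all input splits, is executed again from the
beginning:

import java.util.concurrent.TimeUnit;

import org.apache.flink.api.common.restartstrategy.RestartStrategies;
import org.apache.flink.api.common.time.Time;
import org.apache.flink.api.java.ExecutionEnvironment;

public class RestartExample {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        // Retry the whole batch job on failure: every input split
        // (i.e. every Elasticsearch shard) is read again from scratch.
        env.setRestartStrategy(RestartStrategies.fixedDelayRestart(
                3,                               // restart attempts
                Time.of(10, TimeUnit.SECONDS))); // delay between attempts

        // ... define and execute the batch job as usual ...
    }
}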

Best,
Andrey

> On 2 Oct 2018, at 13:52, aviad <rotem.aviad@gmail.com> wrote:
> 
> Hi,
> 
> I want to write a batch job which reads data from *Elasticsearch* using
> *elasticsearch-hadoop* (https://github.com/elastic/elasticsearch-hadoop/)
> and Flink's *HadoopInputFormat*.
> 
> example code (from
> https://github.com/genged/flink-playground/blob/master/src/main/java/com/mic/flink/FlinkMain.java):
> 
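> (For reference, a minimal sketch along the lines of the linked
> FlinkMain.java, assuming elasticsearch-hadoop's mapred EsInputFormat;
> the host and index names below are hypothetical placeholders, not the
> exact code from the repository:)
> 
> import org.apache.flink.api.java.DataSet;
> import org.apache.flink.api.java.ExecutionEnvironment;
> import org.apache.flink.api.java.hadoop.mapred.HadoopInputFormat;
> import org.apache.flink.api.java.tuple.Tuple2;
> import org.apache.hadoop.io.Text;
> import org.apache.hadoop.mapred.JobConf;
> import org.elasticsearch.hadoop.mr.EsInputFormat;
> import org.elasticsearch.hadoop.mr.LinkedMapWritable;
> 
> public class FlinkEsBatchJob {
>     public static void main(String[] args) throws Exception {
>         ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
> 
>         // elasticsearch-hadoop is configured through a Hadoop JobConf
>         JobConf conf = new JobConf();
>         conf.set("es.nodes", "localhost:9200");   // hypothetical cluster address
>         conf.set("es.resource", "my-index/doc");  // hypothetical index/type
> 
>         // Wrap the mapred EsInputFormat in Flink's HadoopInputFormat;
>         // elasticsearch-hadoop creates one InputSplit per shard.
>         HadoopInputFormat<Text, LinkedMapWritable> esInput = new HadoopInputFormat<>(
>                 new EsInputFormat<Text, LinkedMapWritable>(),
>                 Text.class, LinkedMapWritable.class, conf);
> 
>         DataSet<Tuple2<Text, LinkedMapWritable>> docs = env.createInput(esInput);
> 
>         docs.first(10).print();
>     }
> }
> 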
> elasticsearch-hadoop creates one Hadoop InputSplit (task) per Elasticsearch
> shard, so if my index has 20 shards, it will be split into 20 InputSplits.
> 
> 
> /My question is:/
> What will happen if my job restarts (failover) after finishing half of the
> InputSplits?
> Does HadoopInputFormat remember which InputSplits are finished and know how
> to continue from where it stopped (maybe re-reading unfinished InputSplits
> from the beginning), or does it start over from the beginning?
> 
> thanks
> 
> 
> 
> --
> Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

