flink-user mailing list archives

From Dániel Bali <balijanosdan...@gmail.com>
Subject Reading separate files in parallel tasks as input
Date Sun, 14 Jun 2015 14:34:38 GMT
Hello!

We are running an experiment on a cluster, and we have a large input split
into multiple files. We'd like to run a Flink job that reads the local file
on each instance and processes it. Is there a way to do this in the batch
environment? `readTextFile` wants to read the file on the JobManager and
split it there, which is not what we want.

We solved it in the streaming environment by using `addSource`, but there
is no similar function in the batch version. Does anybody know how this
could be done?

Thanks!
Daniel
