flink-user mailing list archives

From Camelia-Elena Ciolac <camelia-elena.cio...@inria.fr>
Subject Collection of files as input
Date Fri, 24 Oct 2014 10:08:24 GMT

I am working on a use case where we have a collection of files as input.
I am using the env.createInput based on AvroInputFormat. For one input file, it is fine to
specify it in new Path(args[0]). 
But is it possible (and if so, how) to create a DataSet directly from a collection of files?

One workaround I thought of is to keep one DataSet, dsUnion, as the running union result,
and a second DataSet, dsCurrent, created as the input for a single file at a time.

read the first file into dsUnion

in a loop, repeat:
read the next file into dsCurrent
dsUnion = dsUnion.union(dsCurrent)
until all files in the collection are processed.
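
To make the loop above concrete, here is a rough sketch of its shape in plain Java. It is not tested against Flink itself: the reader and the union are passed in as functions so the example runs without any Flink dependency, and the names UnionOfInputs, unionAll, read and union are made up for illustration. With Flink, read would presumably be something like path -> env.createInput(new AvroInputFormat<>(new Path(path), MyRecord.class)), where MyRecord is a hypothetical Avro record class, and union would be DataSet::union.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.BinaryOperator;
import java.util.function.Function;

public class UnionOfInputs {

    /**
     * Reads the first path into the running union, then reads each remaining
     * path and merges it in, mirroring the dsUnion / dsCurrent loop.
     * D stands in for the data set type (DataSet<MyRecord> in Flink).
     */
    public static <D> D unionAll(List<String> paths,
                                 Function<String, D> read,
                                 BinaryOperator<D> union) {
        if (paths.isEmpty()) {
            throw new IllegalArgumentException("at least one input path is required");
        }
        D dsUnion = read.apply(paths.get(0));
        for (String path : paths.subList(1, paths.size())) {
            D dsCurrent = read.apply(path);
            dsUnion = union.apply(dsUnion, dsCurrent);
        }
        return dsUnion;
    }

    public static void main(String[] args) {
        // Stand-in reader (assumption): each "file" yields one record naming it.
        Function<String, List<String>> read = path -> {
            List<String> records = new ArrayList<>();
            records.add("record-from-" + path);
            return records;
        };
        // Stand-in union (assumption): plain concatenation of the two inputs.
        BinaryOperator<List<String>> union = (a, b) -> {
            List<String> merged = new ArrayList<>(a);
            merged.addAll(b);
            return merged;
        };
        // Hypothetical file names, used only to exercise the loop.
        List<String> paths = Arrays.asList("a.avro", "b.avro", "c.avro");
        System.out.println(unionAll(paths, read, union));
    }
}
```

The sketch requires at least one path because the first file seeds dsUnion; an empty collection would need separate handling.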

Is there a simpler solution with the Flink API?

Thanks in advance! 
