flink-user mailing list archives

From Ufuk Celebi <...@apache.org>
Subject Re: Read a given list of HDFS folder
Date Mon, 21 Mar 2016 12:38:47 GMT
Hey Gwenhaël,

see here for recursive traversal of input paths:
https://ci.apache.org/projects/flink/flink-docs-master/apis/batch/index.html#recursive-traversal-of-the-input-path-directory
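
If you need an explicit list of folders (for example the last N hours)
rather than a whole directory tree, one option is to compute the paths up
front. Below is a minimal sketch in plain Java. It assumes one folder per
hour named <base>/yyyy/MM/dd/HH; that layout, the class name HourlyFolders
and the base path are made up for illustration, so adapt them to what Flume
actually writes.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.temporal.ChronoUnit;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Flink: builds the folder paths to read.
final class HourlyFolders {

    // Assumed layout: <base>/yyyy/MM/dd/HH, one folder per hour.
    private static final DateTimeFormatter HOUR_DIR =
            DateTimeFormatter.ofPattern("yyyy/MM/dd/HH");

    // Returns n paths, newest first: the hour containing `now`,
    // followed by the n - 1 hours before it.
    static List<String> lastHours(String base, LocalDateTime now, int n) {
        if (n < 0) {
            throw new IllegalArgumentException("n must be >= 0");
        }
        LocalDateTime hour = now.truncatedTo(ChronoUnit.HOURS);
        List<String> paths = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            paths.add(base + "/" + hour.minusHours(i).format(HOUR_DIR));
        }
        return paths;
    }

    public static void main(String[] args) {
        // Example base path; replace with the real HDFS location.
        for (String path : lastHours("hdfs:///flume/events", LocalDateTime.now(), 3)) {
            System.out.println(path);
        }
    }
}
```

Each returned path can then be read with env.readTextFile(path), and the
resulting data sets combined with union(...) before the rest of the job.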

Regarding the phases: the best way to exchange data between batch jobs
is via files. You can then execute two programs one after the other:
the first one produces the files, which the second job uses as input.

– Ufuk



On Mon, Mar 21, 2016 at 12:14 PM, Gwenhael Pasquiers
<gwenhael.pasquiers@ericsson.com> wrote:
> Hello,
>
> Sorry if this has been already asked or is already in the docs, I did not
> find the answer:
>
> Is there a way to read a given set of folders in Flink batch? Let's say we
> have one folder per hour of data, written by Flume, and we'd like to read
> only the last N hours (or any other pattern or arbitrary list of folders).
>
> And while I'm at it I have another question:
>
> Let's say that in my batch task I need to sequence two "phases" and that
> the second phase needs the final result from the first one.
>  - Do I have to create, in the TaskManager, one execution environment per
>    task and execute them one after the other?
>  - Can my TaskManagers send back some data (other than counters) to the
>    JobManager, or do I have to use a file to store the result from phase
>    one and use it in phase two?
>
> Thanks in advance for your answers,
>
> Gwenhaël
