flink-user mailing list archives

From Vasiliki Kalavri <vasilikikala...@gmail.com>
Subject Re: Input from nested directory structure
Date Tue, 02 Dec 2014 16:22:08 GMT
Hi,

thanks for replying!

It would certainly be useful for my use case, but not absolutely necessary.
If you think other people might find it useful too, I can open an issue.
If not, I believe it would be nice to print a warning when the input path
contains nested directories, since currently the files in the base directory
are processed normally, but the nested ones are silently ignored.
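
In the meantime, a workaround that avoids copying files around could be to
enumerate the nested files up front. The following is only a minimal sketch
in plain Java 8 (no Flink calls); the class and method names are hypothetical
and "logs_dir" is just the example directory from my first mail:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper class, not part of Flink.
public class NestedLogFiles {

    // Returns the paths of all regular files below baseDir, at any depth, sorted.
    static List<String> listFilesRecursively(String baseDir) throws IOException {
        // Files.walk traverses the directory tree depth-first; the stream must be closed.
        try (Stream<Path> paths = Files.walk(Paths.get(baseDir))) {
            return paths
                    .filter(Files::isRegularFile) // skip the directories themselves
                    .map(Path::toString)
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumption: the base directory is passed as the first argument.
        String baseDir = args.length > 0 ? args[0] : "logs_dir";
        if (!Files.isDirectory(Paths.get(baseDir))) {
            System.err.println("Not a directory: " + baseDir);
            return;
        }
        for (String file : listFilesRecursively(baseDir)) {
            System.out.println(file);
        }
    }
}
```

Each returned path could then presumably be passed to readTextFile()
separately and the resulting data sets combined with union(), although I
have not tried this on a large number of files.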

Cheers,
V.

On 2 December 2014 at 16:52, Stephan Ewen <sewen@apache.org> wrote:

> Hi!
>
> Not right now. The input formats do not recursively enumerate files. In
> that, we followed the way Hadoop did it.
>
> If that is something that is interesting, it should not be too hard to add
> to the FileInputFormat an option to do a complete recursive traversal of
> the directory structure.
>
> Greetings,
> Stephan
>
>
> On Tue, Dec 2, 2014 at 4:32 PM, Vasiliki Kalavri <
> vasilikikalavri@gmail.com> wrote:
>
>> Hello all,
>>
>> I want to run a Flink log processing job and my input is stored locally
>> in a nested directory structure, like the following:
>>
>> logs_dir/
>> |-----/machine1/
>> |-----------/january.log
>> |-----------/february.log
>> ...
>> |-----/machine2/
>> ...
>>
>> etc.
>>
>> When providing "logs_dir" as the argument to readTextFile(), nothing is
>> read and no exception or error is returned.
>> Copying the nested individual files machine1/january.log,
>> machine1/february.log, ..., to the same directory works fine, but I was
>> wondering whether there is a better way to do this?
>>
>> Thank you!
>> V.
>>
>
>
