flink-user mailing list archives

From Robert Metzger <rmetz...@apache.org>
Subject Re: Parallel read text
Date Mon, 30 May 2016 13:14:00 GMT
Hi David,

I guess you can verify it by adding custom log statements to the Flink
code (for that, you need to recompile Flink).
A debugger may also be sufficient (if you are running Flink locally).
We are currently reworking the reading of static files for the streaming
environment. It may be interesting to check out the new implementation [1].

[1] https://github.com/apache/flink/pull/2020
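
To illustrate the log-statement idea above without touching Flink itself,
here is a minimal, Flink-free sketch in plain Java. It reads each file of a
set of split files on its own thread and records which thread handled which
file; distinct worker names in the log show that the reads ran on different
threads. The class and method names (ParallelReadCheck, readAll, ReadLog) are
hypothetical and exist only for this example. Inside a Flink rich function,
the analogous value to log would presumably be the subtask index from
getRuntimeContext().getIndexOfThisSubtask().

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical helper for illustration only; this is not Flink code.
final class ParallelReadCheck {

    /** What one read recorded: the thread that handled the file and its line count. */
    static final class ReadLog {
        final String worker;
        final int lineCount;

        ReadLog(String worker, int lineCount) {
            this.worker = worker;
            this.lineCount = lineCount;
        }
    }

    /** Reads every file on its own pool thread and returns a per-file log entry. */
    static Map<Path, ReadLog> readAll(List<Path> files) {
        Map<Path, ReadLog> log = new ConcurrentHashMap<>();
        if (files.isEmpty()) {
            return log; // a fixed pool of size 0 is not allowed
        }
        // One thread per file, so every file is handled by a different worker.
        ExecutorService pool = Executors.newFixedThreadPool(files.size());
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (Path file : files) {
                pending.add(pool.submit(() -> {
                    try {
                        int lines = Files.readAllLines(file, StandardCharsets.UTF_8).size();
                        // The "custom log statement": remember who read this file.
                        log.put(file, new ReadLog(Thread.currentThread().getName(), lines));
                    } catch (IOException e) {
                        throw new UncheckedIOException(e);
                    }
                }));
            }
            for (Future<?> f : pending) {
                f.get(); // wait for the read and surface any failure
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
        return log;
    }

    public static void main(String[] args) {
        List<Path> files = new ArrayList<>();
        for (String arg : args) {
            files.add(Paths.get(arg));
        }
        for (Map.Entry<Path, ReadLog> e : readAll(files).entrySet()) {
            System.out.println(e.getKey() + " read by " + e.getValue().worker
                    + " (" + e.getValue().lineCount + " lines)");
        }
    }
}
```

Running it with a few file paths as arguments prints one line per file with
the name of the thread that read it. The sketch parallelizes per file only;
splitting a single file into byte ranges, as Flink's input formats do, needs
extra care at line boundaries and is left out here.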


On Sat, May 28, 2016 at 1:49 PM, David Olsen <davidolsen4123@gmail.com>
wrote:

> Thank you for the advice!
>
> Now I have a new question. From reading the source [1], the streaming env
> uses FileSourceFunction, which extends RichParallelSourceFunction, to create
> input splits [2]. I know I can set the parallelism in the streaming env, but
> is there any way to verify at runtime that the split files or the file are
> read in parallel?
>
> Thank you again for your help.
>
> [1].
> https://raw.githubusercontent.com/eBay/Flink/master/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/environment/StreamExecutionEnvironment.java
>
> [2].
> https://github.com/apache/flink/blob/master/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/functions/source/FileSourceFunction.java
>
>
>
> On 28 May 2016 at 17:52, Chesnay Schepler <chesnay@apache.org> wrote:
>
>> ExecutionEnvironment.readTextFile will read the file in parallel.
>>
>>
>> On 28.05.2016 09:59, David Olsen wrote:
>>
>> After searching on the internet (with keywords like 'apache flink parallel
>> read text') I still have not found the answer I am looking for. So I am
>> asking here before jumping into writing code ...
>>
>> My problem is that I want to read a text file or split text files (from the
>> local file system). I want to read those files in parallel and process
>> them accordingly.
>>
>> From what I have discovered so far:
>> - Use ExecutionEnvironment.readTextFile, but this seems to use only 1
>> thread(?) (meaning it reads the file(s) from beginning to end)
>> - Use the streaming env's addSource[1], but it seems to me I would need to
>> implement my own source with RichParallelSourceFunction.
>> Are there any classes or implementations that can already read text in parallel?
>> Thanks
>>
>> [1].
>> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Reading-separate-files-in-parallel-tasks-as-input-td1623.html
>>
>>
>>
>
