flink-user mailing list archives

From Fabian Hueske <fhue...@gmail.com>
Subject Re: Parallel file read in LocalEnvironment
Date Wed, 07 Oct 2015 13:24:43 GMT
I'm sorry, there is no such documentation.
You need to look at the code :-(
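For readers following up on this thread: the block-based split policy Fabian points to can be sketched roughly as follows. This is a simplified illustration of the idea (one byte-range split per block, with a shorter final split), not the actual Flink `FileInputFormat.createInputSplits()` code; the class name and method here are made up for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified sketch of how a block-based input format might compute
// byte-range splits for a single file. This is NOT the real Flink
// implementation; FileInputFormat.createInputSplits() in the Flink
// sources is the authoritative logic.
public class SplitSketch {

    /** Returns [offset, length] pairs covering a file of the given length. */
    static List<long[]> computeSplits(long fileLength, long blockSize) {
        List<long[]> splits = new ArrayList<>();
        for (long start = 0; start < fileLength; start += blockSize) {
            // The last split may be shorter than a full block.
            long length = Math.min(blockSize, fileLength - start);
            splits.add(new long[] {start, length});
        }
        return splits;
    }

    public static void main(String[] args) {
        // A 250-byte file with a 100-byte block size yields 3 splits.
        List<long[]> splits = computeSplits(250, 100);
        System.out.println(splits.size());
    }
}
```

Record-oriented formats additionally have to handle records that straddle a split boundary, e.g. by reading past the end of a split until the next line delimiter, which is another reason splitting cannot simply be done "every 100 lines".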

2015-10-07 15:19 GMT+02:00 Flavio Pompermaier <pompermaier@okkam.it>:

> And what is the split policy for the FileInputFormat? Does it depend on
> the fs block size?
> Is there a pointer to the various Flink input formats and a description
> of their internals?
>
> On Wed, Oct 7, 2015 at 3:09 PM, Fabian Hueske <fhueske@gmail.com> wrote:
>
>> Hi Flavio,
>>
>> it is not possible to split by line count, because that would require
>> reading and parsing the file just for splitting.
>>
>> Parallel processing of data sources depends on the input splits created
>> by the InputFormat. Local files can be split just like files in HDFS.
>> Usually, each file corresponds to at least one split, but multiple files
>> could also be put into a single split if necessary. The logic for that
>> would go into the InputFormat.createInputSplits() method.
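To make the "multiple files per split" idea concrete, here is a hypothetical sketch of the grouping logic a custom createInputSplits() implementation might use. The class, method, and round-robin strategy are illustrative assumptions, not Flink API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: pack many small files into a fixed number of
// split groups, as a custom InputFormat.createInputSplits() might do.
// This is illustrative only and not part of the Flink API.
public class GroupingSketch {

    /** Distributes file paths round-robin over numSplits groups. */
    static List<List<String>> groupFiles(List<String> files, int numSplits) {
        List<List<String>> groups = new ArrayList<>();
        for (int i = 0; i < numSplits; i++) {
            groups.add(new ArrayList<>());
        }
        // Round-robin assignment keeps the groups evenly sized.
        for (int i = 0; i < files.size(); i++) {
            groups.get(i % numSplits).add(files.get(i));
        }
        return groups;
    }
}
```

Each group would then back one input split, so the degree of parallelism is capped by numSplits rather than by the file count.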
>>
>> Cheers, Fabian
>>
>> 2015-10-07 14:47 GMT+02:00 Flavio Pompermaier <pompermaier@okkam.it>:
>>
>>> Hi to all,
>>>
>>> is there a way to split a single local file by line count (e.g. a split
>>> every 100 lines) in a LocalEnvironment to speed up a simple map function?
>>> For me it is not very clear how local files (the files in a directory
>>> when recursive=true) are managed by Flink. Is there any reference for
>>> these internals?
>>>
>>> Best,
>>> Flavio
>>>
>>
>>
>
>
