flink-user mailing list archives

From françois lacombe <francois.laco...@dcbrain.com>
Subject Re: How to load multiple same-format files with single batch job?
Date Tue, 05 Feb 2019 11:16:33 GMT
Thank you Fabian,

That's good, I'll go for a custom File input stream.

All the best

François

On Mon, 4 Feb 2019 at 12:10, Fabian Hueske <fhueske@gmail.com> wrote:

> Hi,
>
> The files will be read in a streaming fashion.
> Typically files are broken down into processing splits that are
> distributed to tasks for reading.
> How a task reads a file split depends on the implementation, but usually
> the format reads the split as a stream and does not read the split as a
> whole before emitting records.
>
> Best,
> Fabian
>
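Fabian's description can be illustrated with a small plain-Java model. This is not Flink's code. The class `SplitReadingModel`, its methods, and the newline-delimited layout (one JSON object per line) are hypothetical and chosen for illustration. A file is cut into byte-range splits, and each split is read as a stream: records are handed to a callback one at a time as they are parsed, and the split is never loaded whole. A record that crosses a split boundary is finished by the split in which it starts and skipped by the next split, so every record is emitted exactly once.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/**
 * Hypothetical model of split-based reading of one newline-delimited file
 * (assumed: one JSON object per line). Illustration only, not Flink source code.
 */
public class SplitReadingModel {

    /** One unit of work: the byte range [start, end) of a single file. */
    public static final class FileSplit {
        final Path file;
        final long start;
        final long end;

        FileSplit(Path file, long start, long end) {
            this.file = file;
            this.start = start;
            this.end = end;
        }
    }

    /** Cuts a file into consecutive splits of at most maxSplitBytes bytes. */
    public static List<FileSplit> createSplits(Path file, long maxSplitBytes) throws IOException {
        long size = Files.size(file);
        List<FileSplit> splits = new ArrayList<>();
        for (long start = 0; start < size; start += maxSplitBytes) {
            splits.add(new FileSplit(file, start, Math.min(start + maxSplitBytes, size)));
        }
        return splits;
    }

    /**
     * Passes every record that starts inside the split to the consumer, one at a time.
     * The reader itself holds only the current record in memory.
     */
    public static void readSplit(FileSplit split, Consumer<String> out) throws IOException {
        try (RandomAccessFile in = new RandomAccessFile(split.file.toFile(), "r")) {
            if (split.start > 0) {
                // Skip the tail of a record that began in the previous split.
                // Seeking to start - 1 keeps a record that begins exactly at start.
                in.seek(split.start - 1);
                in.readLine();
            }
            // A record that starts before `end` is read to completion, even past `end`.
            while (in.getFilePointer() < split.end) {
                String raw = in.readLine();
                if (raw == null) {
                    break; // end of file
                }
                // readLine() maps each byte to one char; recover the UTF-8 text.
                out.accept(new String(raw.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8));
            }
        }
    }
}
```

`RandomAccessFile.readLine()` reads one byte at a time and is used here only to keep the sketch short; a real format would read through a buffer.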
> On Mon, 4 Feb 2019 at 12:06, françois lacombe <
> francois.lacombe@dcbrain.com> wrote:
>
>> Hi Fabian,
>>
>> Thank you for this input.
>> This is interesting.
>>
>> With such an input format, will each file be loaded entirely into memory
>> before being processed, or will it be streamed?
>>
>> All the best
>> François
>>
>> On Tue, 29 Jan 2019 at 22:20, Fabian Hueske <fhueske@gmail.com>
>> wrote:
>>
>>> Hi,
>>>
>>> You can point a file-based input format to a directory and the input
>>> format should read all files in that directory.
>>> That also works for TableSources that internally use file-based input
>>> formats.
>>> Is that what you are looking for?
>>>
>>> Best, Fabian
>>>
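To make the directory case concrete, here is a plain-Java sketch of how an input format could choose its files when pointed at a directory. The class `DirectoryInputModel` and its methods are hypothetical, not Flink's implementation. Two behaviours are assumed from Flink's documentation and default path filter: files whose names start with "." or "_" are skipped, and nested directories are only entered when the `recursive.file.enumeration` parameter is enabled.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/**
 * Hypothetical model of how a file-based input format pointed at a directory
 * could pick its input files. Illustration only, not Flink source code.
 */
public class DirectoryInputModel {

    /** Assumed filter: ignore any path component starting with "." or "_" (e.g. _SUCCESS). */
    static boolean isIgnored(Path relativePath) {
        for (Path part : relativePath) {
            String name = part.toString();
            if (name.startsWith(".") || name.startsWith("_")) {
                return true;
            }
        }
        return false;
    }

    /**
     * Lists the regular files under dir in a stable order. Nested directories are
     * only entered when recursive is true (assumed to mirror recursive.file.enumeration).
     */
    public static List<Path> enumerateInputs(Path dir, boolean recursive) throws IOException {
        int maxDepth = recursive ? Integer.MAX_VALUE : 1;
        try (Stream<Path> walk = Files.walk(dir, maxDepth)) {
            return walk.filter(Files::isRegularFile)
                    .filter(p -> !isIgnored(dir.relativize(p)))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }
}
```

In this model, each returned file would then be cut into splits and read by the source's parallel tasks, so one source can feed one sink in a single batch job regardless of how many same-format files the directory holds.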
>>> On Mon, 28 Jan 2019 at 17:22, françois lacombe <
>>> francois.lacombe@dcbrain.com> wrote:
>>>
>>>> Hi all,
>>>>
>>>> I'm wondering whether it's possible, and what the best way would be, to
>>>> load multiple files with a JSON source into a JDBC sink.
>>>> I'm running Flink 1.7.0.
>>>>
>>>> Let's say I have about 1500 files with the same structure (same format,
>>>> schema, everything) and I want to load them with a *batch* job.
>>>> Can Flink load each of these files through a single source and send the
>>>> data to my JDBC sink?
>>>> I'd like to give the batch source the URL of the directory containing
>>>> these files and have it load all of them sequentially.
>>>> My sources and sinks are currently implemented for batch only
>>>> (BatchTableSource); I expect that making them available for streaming
>>>> would be quite costly for me at the moment.
>>>>
>>>> Has anyone ever done this?
>>>> Am I wrong to expect this to work with a batch job?
>>>>
>>>> All the best
>>>>
>>>> François Lacombe
>>>>
>>>>
>>>> <http://www.dcbrain.com/>   <https://twitter.com/dcbrain_feed?lang=fr>
>>>>    <https://www.linkedin.com/company/dcbrain>
>>>> <https://www.youtube.com/channel/UCSJrWPBLQ58fHPN8lP_SEGw>
>>>>
>>>> Think of the planet: print this message only if necessary.
>>>>
>>>
>>
>>
>
