flink-user mailing list archives

From Kien Truong <duckientru...@gmail.com>
Subject Re: Distribute crawling of a URL list using Flink
Date Mon, 14 Aug 2017 23:16:28 GMT
Hi, 

Admittedly, I did not suggest this because I thought it was not available for the batch API.


Regards, 
Kien 


On Aug 15, 2017, at 00:06, Nico Kruber <nico@data-artisans.com> wrote:
>Hi Eranga and Kien,
>Flink has supported asynchronous I/O since version 1.2; see [1] for details.
>
>You basically pack your URL download into the asynchronous part and collect
>the resulting string for further processing in your pipeline.
>
>
>
>Nico
>
>
>[1] https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/stream/asyncio.html
>
>On Monday, 14 August 2017 17:50:47 CEST Kien Truong wrote:
>> Hi,
>> 
>> While this task is quite trivial to do with the Flink DataSet API, using
>> readTextFile to read the input and a flatMap function to perform the
>> downloading, it might not be a good idea.
>> 
>> The download process is I/O bound and will block the synchronous flatMap
>> function, so the throughput will not be very good.
>> 
>> 
>> Until Flink supports asynchronous functions, I suggest you look elsewhere.
>> 
>> An example of a master-worker architecture using Akka can be found here:
>> 
>> https://github.com/typesafehub/activator-akka-distributed-workers
>> 
>> 
>> Regards,
>> 
>> Kien
>> 
>> On 8/14/2017 10:09 AM, Eranga Heshan wrote:
>> > Hi all,
>> > 
>> > I am fairly new to Flink. I have a project where I have a list of URLs
>> > (on one node) that need to be crawled in a distributed fashion. Then, for
>> > each URL, I need the serialized crawl result to be written to a single
>> > text file.
>> > 
>> > I would like to know if there are similar projects I can look into, or
>> > get an idea of how to implement this.
>> > 
>> > Thanks & Regards,
>> > 
>> > 
>> > 
>> > 
>> > Eranga Heshan
>> > /Undergraduate/
>> > Computer Science & Engineering
>> > University of Moratuwa
>> > Mobile: +94 71 138 2686
>> > Email: eranga.h.n@gmail.com
>> > <https://www.facebook.com/erangaheshan>
>> > <https://twitter.com/erangaheshan>
>> > <https://www.linkedin.com/in/erangaheshan>
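
As a rough illustration of the idea discussed in this thread (start all downloads asynchronously instead of blocking on each one in turn), here is a minimal plain-Java sketch. It uses only java.util.concurrent and is not Flink's async I/O operator. The class name AsyncFetchSketch, the fetchAll helper, and the downloader function are hypothetical names invented for this example, and the download itself is simulated with a sleep.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;

/** Plain-Java illustration of asynchronous downloads; NOT Flink's API. */
public class AsyncFetchSketch {

    /**
     * Starts one asynchronous download per URL and returns the results in
     * input order. `downloader` is a hypothetical stand-in for a real HTTP
     * client call; `maxConcurrent` (must be > 0) caps the worker threads.
     */
    public static List<String> fetchAll(List<String> urls,
                                        Function<String, String> downloader,
                                        int maxConcurrent) {
        ExecutorService pool = Executors.newFixedThreadPool(maxConcurrent);
        try {
            // Submit every request before waiting on any of them.
            List<CompletableFuture<String>> pending = new ArrayList<>();
            for (String url : urls) {
                pending.add(CompletableFuture.supplyAsync(
                        () -> downloader.apply(url), pool));
            }
            // join() waits only for that particular download to finish.
            List<String> pages = new ArrayList<>();
            for (CompletableFuture<String> future : pending) {
                pages.add(future.join());
            }
            return pages;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // Simulated download: sleep to mimic network latency (assumption).
        Function<String, String> fakeDownload = url -> {
            try {
                Thread.sleep(200);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return "<html>" + url + "</html>";
        };
        List<String> urls = Arrays.asList(
                "http://example.com/a", "http://example.com/b");
        for (String page : fetchAll(urls, fakeDownload, 2)) {
            System.out.println(page);
        }
    }
}
```

With a blocking call per element, the waiting times add up one after another; with the futures above, up to maxConcurrent downloads wait at the same time. In Flink itself, the async I/O documentation linked as [1] describes the operator that plays this role for streaming jobs.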
