flink-user mailing list archives

From Till Rohrmann <trohrm...@apache.org>
Subject Re: DataSet in Streaming application under Flink
Date Tue, 19 Jan 2016 17:41:15 GMT
Hi Sylvain,

what you could do, for example, is load the static data set, e.g. from HDFS,
in the open method of your comparator and cache it there. The open method
is called once for each parallel task when it is created, before the task
processes any records. The comparator could then be a RichMapFunction
implementation. If you make the field that stores the small data set static,
you can even share the data among all tasks that run on the same
TaskManager, since they run in the same JVM. In that case the loading code
should be guarded, because several tasks may call open at the same time.
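
The pattern described above can be sketched roughly as follows. This is a
plain-Java illustration with no Flink dependency, so it is only a stand-in:
in a real job the class would extend RichMapFunction and override
open(Configuration) and map(...). The class name StaticSetComparator, the
loadSmallSet helper, the hard-coded points and the distance threshold are all
hypothetical; a real loader would read the data set from HDFS instead.

```java
import java.util.Arrays;
import java.util.List;

// Stand-in for a RichMapFunction<double[], Boolean>. In a Flink job this class
// would extend RichMapFunction and the two methods below would be overrides.
class StaticSetComparator {

    // Static field: one copy per JVM, so every task instance running in the
    // same TaskManager sees the same cached data set.
    private static volatile List<double[]> smallSet;

    // Counts how often the data set was actually loaded (illustration only).
    static int loadCount = 0;

    // Hypothetical loader. A real implementation would read from HDFS here.
    private static List<double[]> loadSmallSet() {
        loadCount++;
        return Arrays.asList(new double[] {0.0, 0.0}, new double[] {10.0, 10.0});
    }

    // Mirrors RichMapFunction#open: called once per task before any record.
    // The lock ensures that concurrent tasks load the data set only once.
    public void open() {
        synchronized (StaticSetComparator.class) {
            if (smallSet == null) {
                smallSet = loadSmallSet();
            }
        }
    }

    // Mirrors RichMapFunction#map: compares one stream element against every
    // element of the cached set and reports whether any lies within distance 1.
    public boolean map(double[] point) {
        for (double[] ref : smallSet) {
            double dx = point[0] - ref[0];
            double dy = point[1] - ref[1];
            if (dx * dx + dy * dy <= 1.0) {
                return true;
            }
        }
        return false;
    }
}
```

Two instances of this class that both call open() trigger only one load,
which is the effect the static field is meant to achieve across tasks in
one TaskManager.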


On Tue, Jan 19, 2016 at 5:53 PM, Sylvain Hotte <sylvain@sylvainhotte.ca> wrote:

> Hi,
> I want to know whether it is possible to load a small data set in a
> streaming application under Flink.
> Here's an example:
> I have a data stream A and a data set B.
> I need to compare every tuple of A against the tuples of B.
> Since B is small, it would be loaded on all nodes and kept persistent (not
> reloaded at every computation).
> I am doing a Master's on real-time geospatial operators in Big Data, and I
> am looking at different strategies to spatially distribute the stream based
> on application and operation characteristics.
> One of them involves comparing a data set and a data stream.
> Regards,
> Sylvain Hotte
