flink-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From defstat <defs...@gmail.com>
Subject Statefull computation
Date Sun, 23 Aug 2015 19:36:02 GMT
Hi. I am struggling the past few days to find a solution on the following
problem, using Apache Flink: 

I have a stream of vectors, represented by files in a local folder. After a
new text file is located using DataStream<String> text =
env.readFileStream(...), I transform (flatMap), the Input into a
SingleOutputStreamOperator<Tuple2&lt;String, Integer>, ?>, with the Integer
being the score coming from a scoring function. 

I want to persist a global HashMap containing the top-k vectors, using their
scores. I approached the problem using a statefull transformation. 
1. The first problem I have is that the HashMap retains per-sink data (so
for each thread of workers, one HashMap of data). How can I make that a
Global collection 

2. Using Apache Spark, I made that possible by 
JavaPairDStream<String, Integer> stateDstream =
tuples.updateStateByKey(updateFunction); 

and then making transformations on the stateDstream. Is there a way I can
get the same functionality using FLink? 

Thanks in advance! 



--
View this message in context: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Statefull-computation-tp2494.html
Sent from the Apache Flink User Mailing List archive. mailing list archive at Nabble.com.

Mime
View raw message