flink-user mailing list archives

From Stavros Kontopoulos <st.kontopou...@gmail.com>
Subject Re: withBroadcastSet for a DataStream missing?
Date Sun, 17 Apr 2016 17:16:39 GMT
I'm trying what you suggested. Is this what you are suggesting (this is just
a skeleton of the logic, not the actual implementation)?

    val dataStream = ... // window-based stream

    val modelStream = ...

    val connected = dataStream.connect(modelStream)

    val output = connected.map(
      (x: String) => true,     // element from the data input
      (y: MyModel) => false    // element from the model input
    ).iterate { iteration =>
      // broadcast returns a new stream; keep the result, otherwise
      // the feedback is not actually broadcast
      val feedback = iteration.filter(!_).broadcast
      (feedback, iteration.filter(identity))
    }

    output.split(
      (b: Boolean) => if (b) List("true") else List("false")
    ).select("true")
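To check my understanding of why the broadcast matters (each parallel operator instance keeps its own local copy of the model), here is a minimal plain-Scala sketch of the co-flat-map idea with the Flink wiring stripped out. `MyModel`, `assign`, and `ModelCoFlatMap` are hypothetical names, not Flink API; a real job would implement `CoFlatMapFunction` and call `broadcast()` on the model-update stream:

```scala
// Hypothetical stand-in for the model carried on the broadcast input.
case class MyModel(centroids: Vector[Double]) {
  // Assign a point to the index of its nearest centroid.
  def assign(point: Double): Int =
    centroids.zipWithIndex.minBy { case (c, _) => math.abs(c - point) }._2
}

// Sketch of the two-input operator: flatMap1 consumes broadcast model
// updates, flatMap2 clusters data points with the current local model.
class ModelCoFlatMap(initial: MyModel) {
  // Local state: every parallel instance holds its own copy, which is
  // why updates must reach all instances (hence broadcast).
  private var model: MyModel = initial

  def flatMap1(update: MyModel, out: Int => Unit): Unit =
    model = update // replace the local model; nothing is emitted here

  def flatMap2(point: Double, out: Int => Unit): Unit =
    out(model.assign(point)) // cluster with whatever model we last saw
}
```

Without the broadcast, only one parallel instance would see a given update, and the other instances would keep clustering with a stale model.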


I could save the model in a CoFlatMap, but ideally I need the same model
everywhere. Does broadcast do that? From the documentation I read that it
sends the output to all parallel operators.
Is the iteration executed whenever there is data from the input window
stream, or is it done independently, so that I can feed back my improved
model (as in the DataSet case)?
If the latter holds, does that mean all partial updates from all operators
have to be processed by each operator before the next window processing
begins?

Thnx!


On Fri, Apr 1, 2016 at 10:51 PM, Stavros Kontopoulos <
st.kontopoulos@gmail.com> wrote:

> Ok thnx Till, I will give it a shot!
>
> On Thu, Mar 31, 2016 at 11:25 AM, Till Rohrmann <trohrmann@apache.org>
> wrote:
>
>> Hi Stavros,
>>
>> you might be able to solve your problem using a CoFlatMap operation with
>> iterations. You would use one of the inputs for the iteration on which you
>> broadcast the model updates to every operator. On the other input you would
>> receive the data points which you want to cluster. As output you would emit
>> the clustered points and model updates. Here you have to use the split
>> and select functions to split the output stream into model updates and
>> output elements. It’s important to broadcast the model updates, otherwise
>> not all operators have the same clustering model.
>>
>> Cheers,
>> Till
>>
>> On Tue, Mar 29, 2016 at 7:23 PM, Stavros Kontopoulos <
>> st.kontopoulos@gmail.com> wrote:
>>
>>> Hi, I am new here...
>>>
>>> I am trying to implement online k-means with Flink, as described here:
>>> https://databricks.com/blog/2015/01/28/introducing-streaming-k-means-in-spark-1-2.html
>>> I don't see a withBroadcastSet call anywhere to save intermediate
>>> results. Is this currently supported?
>>>
>>> Is saving intermediate result state, as in this example, a viable
>>> alternative:
>>>
>>> https://github.com/StephanEwen/flink-demos/blob/master/streaming-state-machine/src/main/scala/com/dataartisans/flink/example/eventpattern/StreamingDemo.scala
>>>
>>> Thnx,
>>> Stavros
>>>
>>
>>
>
