flink-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Carst Tankink <ctank...@bol.com>
Subject Re: How to divide streams on key basis and deliver them
Date Thu, 15 Jun 2017 06:19:10 GMT
Hi,

Let me try to explain this from another user’s perspective ☺

When you run your application, Flink will map your logical/application topology onto a number
of task slots (documented in more detail here: https://ci.apache.org/projects/flink/flink-docs-release-1.3/internals/job_scheduling.html).

Basically, if it is possible/unless told otherwise, Flink will create a number of copies of
your functions that is 
On 6/14/17, 21:19, "AndreaKinn" <kinn6aer@hotmail.it> wrote:

    Hi, this is my project purpose using Kafka and Flink:
    
    <http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/file/n13743/schema_png.png>

    In kafka topics there are streams representing sensor lectures of different
    subjects. Each topic is reserved for a different sensor.
    Every messages are attached with a key using kafka keyed messages. The key
    represent a subject id and the attached sensor data belong to the
    highlighted subject.
    
    In Flink I want to:
    - Get these streams
    - Separate streams on key (subject) basis in order to build a node chain
    which evaluates always same sensor values of same subjects.
    
    Thanks to you, I have correctly implemented a custom deserializer in order
    to get data and key from Kafka. So now I need to separate streams on key
    basis. 
    As you can see in schema image, in my mind each circle represents a
    different physical machine in a cluster I the deserializer runs over the
    bigger circles which separate streams and deliver them to different smaller
    circles on key basis. 
    
    I read the doc and I think I have to use keyBy() operator on DataStream in
    order to obtain a KeyedStream. 
    It carry me to my first question:
    - I tried to print datastream and keyedstream.
    The former give me this:
    
    <http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/file/n13743/nokey.png>

    
    while the latter give me this:
    
    <http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/file/n13743/withkey.png>

    
    What do the numbers before the record string means (the '3' in the latter
    case)? 
    
    
    Then:
    - How can I 'deliver' the streams in following nodes (smaller circles) on
    key basis?
    
    Now I'm developing on a single machine just to try and learn but also I'm a
    bit confused about how to develop it on cluster.
    
    
    
    
    
    --
    View this message in context: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/How-to-divide-streams-on-key-basis-and-deliver-them-tp13743.html
    Sent from the Apache Flink User Mailing List archive. mailing list archive at Nabble.com.
    

Mime
View raw message