flink-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Carst Tankink <ctank...@bol.com>
Subject Re: kafka consumer parallelism
Date Tue, 03 Oct 2017 08:30:11 GMT
(Accidentally sent this to Timo instead of to-list...)

Hi,

What Timo says is true, but in case you have a higher parallism than the number of partitions
(because you want to make use of it in a future operator), you could do a .rebalance() (see
https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/datastream_api.html#physical-partitioning)
after the Kafka source.
This makes sure that all operators after the Kafka source get an even load, at the cost of
having to redistribute the documents (so there is de/serialization + network overhead).


Carst

On 10/3/17, 09:34, "Sofer, Tovi " <tovi.sofer@citi.com> wrote:

    Hi Robert,
    
    I had similar issue.
    For me the problem was that the topic was auto created with one partition.
    You can alter it to have 5 partitions using kafka-topics  command.
    Example: 
    kafka-topics --alter  --partitions 5 --topic fix --zookeeper localhost:2181 
    
    Regards,
    Tovi
    -----Original Message-----
    From: Timo Walther [mailto:twalthr@apache.org] 
    Sent: יום ב 02 אוקטובר 2017 20:59
    To: user@flink.apache.org
    Subject: Re: kafka consumer parallelism
    
    Hi,
    
    I'm not a Kafka expert but I think you need to have more than 1 Kafka partition to process
multiple documents at the same time. Make also sure to send the documents to different partitions.
    
    Regards,
    Timo
    
    
    Am 10/2/17 um 6:46 PM schrieb r. r.:
    > Hello
    > I'm running a job with "flink run -p5" and additionally set env.setParallelism(5).
    > The source of the stream is Kafka, the job uses FlinkKafkaConsumer010.
    > In Flink UI though I notice that if I send 3 documents to Kafka, only one 'instance'
of the consumer seems to receive Kafka's record and send them to next operators, which according
to Flink UI are properly parallelized.
    > What's the explanation of this behavior?
    > According to sources:
    >
    > To enable parallel execution, the user defined source should
    >       * implement {@link 
    > org.apache.flink.streaming.api.functions.source.ParallelSourceFunction
    > } or extend {@link
    >       * 
    > org.apache.flink.streaming.api.functions.source.RichParallelSourceFunc
    > tion}
    > which FlinkKafkaConsumer010 does
    >
    > Please check a screenshot at 
    > https://urldefense.proofpoint.com/v2/url?u=https-3A__imgur.com_a_E1H9r
    > &d=DwIDaQ&c=j-EkbjBYwkAB4f8ZbVn1Fw&r=bfLStYBPfgr58eRbGoW11gp3x4kr3rJ99
    > _MiSMX5oOs&m=LiwKApZmqwYYsiKqby4Ugd5WJgyPKpj3H7s9l7Xw_Qg&s=ti6cswIJ4X9
    > d5wgGkq5EUx41y4WXZ_z_HebkoOrLEmw&e=   you'll see that only one sends 3 
    > records to the sinks
    >
    > My code is here: 
    > https://urldefense.proofpoint.com/v2/url?u=https-3A__pastebin.com_yjYC
    > XAAR&d=DwIDaQ&c=j-EkbjBYwkAB4f8ZbVn1Fw&r=bfLStYBPfgr58eRbGoW11gp3x4kr3
    > rJ99_MiSMX5oOs&m=LiwKApZmqwYYsiKqby4Ugd5WJgyPKpj3H7s9l7Xw_Qg&s=AApHKm3
    > amPLzWwAqk2KITEeUkhNE0GS1Oo02jaUpKIw&e=
    >
    > Thanks!
    
    
    

Mime
View raw message