flink-user mailing list archives

From Jakes John <jakesjohn12...@gmail.com>
Subject Flink streaming Parallelism
Date Tue, 08 Aug 2017 00:58:23 GMT
I am coming from the Apache Storm world and am planning to switch from
Storm to Flink. I have been reading the Flink documentation, but I couldn't
find Flink equivalents for some features that I rely on in Storm.

I need a streaming pipeline: Kafka -> Flink -> Elasticsearch. In Storm, I
can specify the number of tasks per bolt. Databases are typically slow at
writes, so I need more writers to the database. Reading from Kafka is
fast compared to writing to ES, which means I need more ES writer tasks
than Kafka consumers. How can I achieve this in Flink? Which Flink concepts
correspond to Storm's parallelism concepts such as workers, executors, and
tasks?
I also saw that the Elasticsearch sink in Flink can batch messages before
writing. How can I batch data based on custom logic, for example by
grouping the batched writes on one of the message keys? In Storm this is
possible via fields grouping, but I couldn't find an equivalent way to do
the grouping in Flink and to control the overall number of writes to ES.

Please help me with the questions above, and share any pointers on Flink
parallelism.
