flink-user mailing list archives

From Jerry Peng <jerry.boyang.p...@gmail.com>
Subject Question regarding parallelism
Date Wed, 21 Oct 2015 21:11:08 GMT
Hello,

I have a Flink streaming job as follows:

DataStream<String> messageStream = env
        .addSource(new FlinkKafkaConsumer082(
                flinkParams.getRequired("topic"),
                new SimpleStringSchema(),
                flinkParams.getProperties())).setParallelism(5);

DataStream<Tuple7<String, String, String, String, String, String, String>> messageStream2 = messageStream
        .rebalance()
        .flatMap(new Operator1())
        .setParallelism(10);

DataStream<Tuple7<String, String, String, String, String, String, String>> messageStream3 = messageStream2
        .rebalance()
        .filter(new Operator2())
        .setParallelism(20);

DataStream<Tuple2<String, String>> messageStream4 = messageStream3
        .<Tuple2<String, String>>project(2, 5)
        .setParallelism(20);

DataStream<Tuple3<String, String, String>> messageStream5 = messageStream4
        .flatMap(new Operator3())
        .setParallelism(20)
        .groupBy(0);

messageStream5
        .flatMap(new Operator4())
        .setParallelism(20);

env.execute();


When I submit the job, the UI shows only 20 task slots in use. Why is
that? The total number of tasks listed in the UI is 55. Also, why do
filter->project->flatMap get compressed into one operator with a
parallelism of 20? Can I not set the individual operators (i.e.
filter and project) to each have their own parallelism of 20?
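
For reference, this is how I am reading the numbers from the UI. It is
only a sketch of the arithmetic, not Flink API: the class and method
names (ParallelismTally, operators, totalTasks, maxParallelism) are made
up for illustration, and the grouping of filter->project->flatMap into a
single entry is copied from what the UI displays. The 55 tasks match the
sum of the per-operator parallelism, and the 20 slots match the largest
single parallelism, which makes me suspect (an assumption on my part)
that subtasks of different operators are sharing slots.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ParallelismTally {
    // Parallelism per operator as listed in the UI. The third entry is the
    // chained filter -> project -> flatMap that the UI shows as one operator.
    static Map<String, Integer> operators() {
        Map<String, Integer> ops = new LinkedHashMap<>();
        ops.put("Kafka source", 5);
        ops.put("Operator1 (flatMap)", 10);
        ops.put("Operator2 (filter) -> project -> Operator3 (flatMap)", 20);
        ops.put("Operator4 (flatMap)", 20);
        return ops;
    }

    // Sum of parallelism over all listed operators.
    static int totalTasks(Map<String, Integer> ops) {
        int sum = 0;
        for (int p : ops.values()) {
            sum += p;
        }
        return sum;
    }

    // Largest parallelism of any single listed operator.
    static int maxParallelism(Map<String, Integer> ops) {
        int max = 0;
        for (int p : ops.values()) {
            max = Math.max(max, p);
        }
        return max;
    }

    public static void main(String[] args) {
        Map<String, Integer> ops = operators();
        System.out.println("total tasks     = " + totalTasks(ops));     // 5 + 10 + 20 + 20
        System.out.println("max parallelism = " + maxParallelism(ops));
    }
}
```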

Thanks for the help!

Best,

Jerry
