flink-user mailing list archives

From Telco Phone <tel...@yahoo.com>
Subject Threading issue
Date Wed, 22 Mar 2017 02:41:15 GMT
I am trying to get the Kafka readers, the keyBy, and the sink all working with 60 threads each.
For the most part it is working correctly:
DataStream<SchemaRecord> stream = env
    .addSource(new FlinkKafkaConsumer08<>("kafkatopic", schema, properties)).setParallelism(60)
    .flatMap(new SchemaRecordSplit()).setParallelism(60).name("RawAdActivity splitter")
    .keyBy("partition", "threadNumber", "schemaId");
stream.addSink(new CustomMaprFsSink()).setParallelism(60).name("RawAdActivity Sink");
However, I only get about 24-30 sinks writing data.

The Kafka payload I am reading is based on time and schema, so to help out I added a random
number generator and key by its value as well, so that it will "try" to force all 60 sinks to
receive data and write to HDFS.
Any thoughts on how I can change the above code to "help" make sure that, for most of the hour,
all 60 threads are reading, sorting, and sinking (writing to the file system)?
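
To illustrate why a random key component alone may not keep every sink busy, here is a minimal sketch in plain Java (no Flink dependency). It assumes each distinct key lands on one of the 60 sinks uniformly at random, which is only a simplification of how keyBy actually assigns keys to subtasks. The class and method names (KeySpread, expectedBusySinks, countBusySinks) are hypothetical and exist only for this example. Under that assumption, 60 distinct keys would be expected to occupy only about 38 of 60 sinks, and fewer distinct keys occupy fewer still.

```java
import java.util.Random;

// Hypothetical helper, not part of Flink: models distinct keys being spread
// over parallel sinks under the ASSUMPTION of a uniform random assignment.
class KeySpread {

    // Expected number of sinks that receive at least one key when
    // `distinctKeys` keys are assigned uniformly at random to `parallelism` sinks.
    public static double expectedBusySinks(int distinctKeys, int parallelism) {
        double probabilitySinkStaysEmpty = Math.pow(1.0 - 1.0 / parallelism, distinctKeys);
        return parallelism * (1.0 - probabilitySinkStaysEmpty);
    }

    // Simulates one random assignment and counts the sinks that got any key.
    public static int countBusySinks(int distinctKeys, int parallelism, long seed) {
        Random random = new Random(seed);
        boolean[] busy = new boolean[parallelism];
        int busyCount = 0;
        for (int i = 0; i < distinctKeys; i++) {
            int sink = random.nextInt(parallelism);
            if (!busy[sink]) {
                busy[sink] = true;
                busyCount++;
            }
        }
        return busyCount;
    }

    public static void main(String[] args) {
        int parallelism = 60;
        for (int keys : new int[] {30, 60, 120, 600}) {
            System.out.printf("%d distinct keys: expected %.1f busy sinks, one simulated run gave %d%n",
                    keys, expectedBusySinks(keys, parallelism), countBusySinks(keys, parallelism, 42L));
        }
    }
}
```

If this simplified model is roughly right, the number of sinks that receive data depends on how many distinct key combinations are live at a time, not on the parallelism setting alone.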
