apex-users mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Chris Stockton <cstock...@gmail.com>
Subject repartition between operators
Date Sat, 24 Sep 2016 22:40:35 GMT
If I have a KafkaInputOperator that is reading from 2 partitions and I
setup a stream to a one-to-one operator, how would I read the messages from
each Kafka partition, extract a particular field (say user id) and then
send all the keys to a parallel downstream operator so that the keys are
grouped together.

Something like this:

Messages coming in on kafka are json of the format {'user_id': 123,
'value': 'abc'} where the user_id is some number.  The messages are not
partitioned on the Kafka topic according to the user_id.

In coming msgs:            repartition by id

id=1, id=2 ==>  K1 -> JsonToPojo --   --> PoJo(1), PoJo(3), PoJo(1)
                                   \ /
                                    X
                                   / \
id=1, id=4 ==>  K2 -> JsonToPojo --   --> PoJo(4)

Where I transform the Json messages into a POJO like:

class PoJo {
  int id;
  String value;
  ...
}

Would 'X' need to be a combination of a StreamMerger and some kind of
custom partitioner, or could the JsonToPojo operator directly partition
it's output based on the id value?

Mime
View raw message