spark-user mailing list archives

From Cody Koeninger <c...@koeninger.org>
Subject Re: How to separate messages of different topics.
Date Tue, 05 May 2015 13:50:32 GMT
Make sure to read
https://github.com/koeninger/kafka-exactly-once/blob/master/blogpost.md

The directStream / KafkaRDD has a 1:1 relationship between Kafka
topic/partition and Spark partition.  So a given Spark partition only has
messages from one Kafka topic.  You can tell which topic that is using
HasOffsetRanges, as discussed in the post.

This 1:1 relationship only holds until the first transformation that
incurs a shuffle, so look up the topic on the RDD the stream hands you,
before any such transformation.
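
Roughly, the approach above could look like the sketch below. It is not
taken verbatim from the post; it assumes the Spark 1.3-era
spark-streaming-kafka artifact, and the app name, the 5 second batch
interval, and the names taggedStream / entryLog / presOpManager are made
up for illustration. The broker address and topic names are the ones from
your snippet.

```scala
import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.{HasOffsetRanges, KafkaUtils}

object TopicTaggingSketch {
  def main(args: Array[String]): Unit = {
    // Assumed app name and batch interval; adjust to your job.
    val conf = new SparkConf().setAppName("TopicTaggingSketch")
    val ssc = new StreamingContext(conf, Seconds(5))

    val kafkaParams = Map[String, String]("metadata.broker.list" -> "myKafka:9092")
    val topics = Set("EntryLog", "presOpManager")
    val directKafkaStream = KafkaUtils.createDirectStream[String, String,
      StringDecoder, StringDecoder](ssc, kafkaParams, topics)

    // Tag every message with its topic. This has to be the first operation
    // on the stream: the cast only works on the RDD produced by the direct
    // stream, and partition i still corresponds to offsetRanges(i).
    val taggedStream = directKafkaStream.transform { rdd =>
      val offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
      rdd.mapPartitionsWithIndex { (i, iter) =>
        val topic = offsetRanges(i).topic
        iter.map { case (_, value) => (topic, value) }
      }
    }

    // filter and map are narrow transformations, so no shuffle is involved.
    val entryLog = taggedStream.filter(_._1 == "EntryLog").map(_._2)
    val presOpManager = taggedStream.filter(_._1 == "presOpManager").map(_._2)

    entryLog.print()
    presOpManager.print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```

With something like this, one directStream is enough; the per-topic
streams are just filters over the tagged stream.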

On Tue, May 5, 2015 at 8:29 AM, Guillermo Ortiz <konstt2000@gmail.com>
wrote:

> I want to read from many topics in Kafka and know from where each message
> is coming (topic1, topic2 and so on).
>
>  val kafkaParams = Map[String, String]("metadata.broker.list" ->
> "myKafka:9092")
>  val topics = Set("EntryLog", "presOpManager")
>  val directKafkaStream = KafkaUtils.createDirectStream[String, String,
> StringDecoder, StringDecoder](ssc, kafkaParams, topics)
>
>  Is there some way to separate the messages by topic with just one
> directStream, or should I create a separate stream for each topic?
>
