spark-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Juan Rodríguez Hortalá (JIRA) <j...@apache.org>
Subject [jira] [Created] (SPARK-6714) additionally overload KafkaUtils.createDirectStream for using a messageHandler without having to specify the offsets
Date Sun, 05 Apr 2015 10:28:33 GMT
Juan Rodríguez Hortalá created SPARK-6714:
---------------------------------------------

             Summary: additionally overload KafkaUtils.createDirectStream for using a messageHandler
without having to specify the offsets
                 Key: SPARK-6714
                 URL: https://issues.apache.org/jira/browse/SPARK-6714
             Project: Spark
          Issue Type: Improvement
          Components: Streaming
    Affects Versions: 1.3.0
            Reporter: Juan Rodríguez Hortalá
            Priority: Trivial


Currently, in the Scala API, KafkaUtils.createDirectStream has two overload methods, one for
an "easy mode" where only the Kafka parameters and topics are specified, and other "hard mode"
where we can specify the offsets and a messageHandler for manipulating the MessageAndMetadata
values obtained from Kafka. I think an intermediate method that automatically handles the
offsets, but that allows you to specify the messageHandler would be very useful. For example
the triple (topic, partition, offset) uniquely identifies each message, so that could be useful
as primary key for idempotent insertions in an external database. Also, in an implementation
of a lambda architecture, offsets could be used to trace which part of a kafka topic has been
covered by the batch view, and which part by the real-time / live view. For both cases I think
that having access to the meta information through a messageHandler, while maintaining managed
offsets, would be interesting

A proposal for the implementation is available at https://github.com/juanrh/spark/blob/master/external/kafka/src/main/scala/org/apache/spark/streaming/kafka/KafkaUtils.scala.
A new overload of KafkaUtils.createDirectStream for the Scala API, and another for the Java
API, have been added



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org


Mime
View raw message