spark-reviews mailing list archives

From amit-ramesh <...@git.apache.org>
Subject [GitHub] spark pull request: [SPARK-8389][Streaming][PySpark] Expose KafkaR...
Date Tue, 07 Jul 2015 23:16:27 GMT
Github user amit-ramesh commented on the pull request:

    https://github.com/apache/spark/pull/7185#issuecomment-119372136
  
    @jerryshao @tdas 
    I have a few points from a user's perspective regarding the DStream version:
    
    1. Based on this PR, it looks like foreachRDD is the only way to get offsets in the DStream
case. This would mean that the data needs to be sent to the driver in order to obtain the
offsets. Is it possible to obtain the offsets on the workers right after the data is received
from Kafka?
    
    2. We keep state in updateStateByKey() that requires attaching the corresponding Kafka
metadata to every event in the DStream, so that we can reconstruct the state across
deployments. Is there a way to attach the Kafka offset to every event using the Spark API?
Essentially, we would like to obtain something akin to the output that juanrh originally
proposed in SPARK-8337.
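
To make point 2 concrete, here is a minimal pure-Python sketch of the kind of per-event output we are after. It is not Spark code and not the API from this PR: `OffsetRange` and `attach_offsets` are hypothetical names used only for illustration, and the sketch assumes that the records of one Kafka partition arrive in offset order with no gaps and that the end offset is exclusive.

```python
from collections import namedtuple

# Hypothetical stand-in for per-partition Kafka offset metadata.
# The field names are assumptions made for this sketch only.
OffsetRange = namedtuple(
    "OffsetRange", ["topic", "partition", "from_offset", "until_offset"]
)


def attach_offsets(offset_range, records):
    """Pair every record with its (topic, partition, offset) key.

    Assumes the records are in offset order with no gaps, and that
    until_offset is exclusive.
    """
    records = list(records)
    expected = offset_range.until_offset - offset_range.from_offset
    if len(records) != expected:
        raise ValueError(
            "expected %d records, got %d" % (expected, len(records))
        )
    return [
        (
            (offset_range.topic, offset_range.partition,
             offset_range.from_offset + i),
            record,
        )
        for i, record in enumerate(records)
    ]


if __name__ == "__main__":
    rng = OffsetRange("events", 0, 100, 103)
    for key, value in attach_offsets(rng, ["a", "b", "c"]):
        print(key, value)
```

If something like this could run on the workers, per partition, each event would carry its own Kafka coordinates into updateStateByKey() without a round trip through the driver.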
