spark-dev mailing list archives

From tian zhang <>
Subject 2 spark streaming questions
Date Mon, 24 Nov 2014 05:31:28 GMT

Hi, Dear Spark Streaming Developers and Users,
We are prototyping with Spark Streaming and hit the following two issues, on which I would like to
seek your expertise.
1) We have a Spark Streaming application in Scala that reads data from Kafka into a DStream,
does some processing, and outputs a transformed DStream. If for some reason the Kafka connection
is unavailable or times out, the streaming job starts to emit empty RDDs afterwards.
The log is clean, with no ERROR indicator. I googled around and this seems to be a known
issue. We believe the Spark Streaming infrastructure should either retry or return an error/exception.
Can you share how you handle this case?
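One workaround (a minimal sketch, not part of Spark's API) is to wrap the connection attempt in a generic retry helper so a transient Kafka outage surfaces as a logged retry rather than silent empty batches. The helper below is plain Scala; the name `retry` and its shape are hypothetical:

```scala
// Hypothetical retry helper: re-attempts an operation that may throw
// (e.g. establishing a Kafka connection) instead of failing silently.
def retry[T](attempts: Int)(op: () => T): T = {
  def loop(remaining: Int): T =
    try op()
    catch {
      case e: Exception if remaining > 1 =>
        // Surface the failure so the outage is visible in the logs,
        // then try again with one fewer attempt remaining.
        System.err.println(s"attempt failed (${e.getMessage}), retrying")
        loop(remaining - 1)
    }
  loop(attempts)
}
```

For example, `retry(3)(() => createKafkaConnection())` would throw only after three consecutive failures, logging each intermediate one.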
2) We would like to implement a Spark Streaming job that joins a 1-minute-duration DStream
of real-time events with a metadata RDD read from a database. The metadata changes only
slightly each day in the database. What is the best practice for refreshing that RDD daily while
keeping the streaming join job running? Is this doable as of Spark 1.1.0?
