spark-issues mailing list archives

From "yue long (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (SPARK-21836) [STREAMING] Retry when kafka broker is down in kafka-streaming-0-8
Date Fri, 25 Aug 2017 08:44:00 GMT

     [ https://issues.apache.org/jira/browse/SPARK-21836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

yue long updated SPARK-21836:
-----------------------------
    Description: 
When using the package spark-streaming-kafka-0-8 to access Kafka from a Spark DStream, many
users will face a "could not find leader" exception if some of the Kafka brokers are down. This
causes the whole streaming job to fail, as [SPARK-18983|https://issues.apache.org/jira/browse/SPARK-18983]
describes. The failed Kafka brokers may also cause other problems when creating the DStream or
creating the batch job.

Even though a Kafka broker going down is not a bug in Spark Streaming, we can avoid this
failure in Spark Streaming, especially because a Kafka cluster is not always stable in real
production.
Actually, re-submitting our streaming job may take a few minutes, but the Kafka cluster will
only take a few seconds to replace the failed broker with an alive one!

Does anyone think we should add some retry logic for when a Kafka broker is down? I have
implemented this function in Spark 1.6.3 and Spark 2.1.0 and tested it. If we implement this
function, it will reduce the number of kafka-streaming failures, which may help streaming users.
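The proposed retry logic could be sketched as a bounded retry-with-backoff wrapped around the
leader lookup: since the cluster usually elects a new leader within seconds, a few short retries
let the job ride out the outage instead of failing. A minimal illustration in Python (the helper
name, parameters, and the simulated lookup are hypothetical, not taken from the actual patch):

```python
import time

def retry_with_backoff(op, max_retries=5, initial_delay_s=1.0, backoff=2.0):
    """Retry `op` until it succeeds or retries are exhausted.

    Mirrors the idea in the proposal: a failed broker is usually
    replaced within seconds, so a few short retries avoid killing
    the whole streaming job.
    """
    delay = initial_delay_s
    for attempt in range(max_retries + 1):
        try:
            return op()
        except Exception:
            if attempt == max_retries:
                raise  # give up: brokers stayed down past the retry window
            time.sleep(delay)
            delay *= backoff

# Simulate a leader lookup that fails twice ("could not find leader")
# and then succeeds once the cluster has elected a new leader.
calls = {"n": 0}
def find_leader():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("could not find leader for partition 0")
    return "broker-2:9092"

leader = retry_with_backoff(find_leader, max_retries=5, initial_delay_s=0.01)
```

The delay and retry-count values would presumably become configurable Spark properties, so users
can tune the retry window to their cluster's leader-election time.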

  was:
When using the package spark-streaming-kafka-0-8 to access Kafka from a Spark DStream, many
users will face a "could not find leader" exception if some of the Kafka brokers are down. This
causes the whole streaming job to fail, as [SPARK-18983|https://issues.apache.org/jira/browse/SPARK-18983]
describes. The failed Kafka brokers may also cause other problems when creating the DStream or
creating the batch job.

Even though a Kafka broker going down is not a bug in Spark Streaming, we can avoid this
failure in Spark Streaming, especially because a Kafka cluster is not always stable in real
production and it will use another broker to substitute for the failed one in a very short
time. If our streaming job fails instantly when one Kafka broker is down, it may take much
effort to re-start it.

Does anyone think we should add some retry logic for when a Kafka broker is down? I have
implemented this function in Spark 1.6.3 and Spark 2.1.0 and tested it. If we implement this
function, it will reduce the number of kafka-streaming failures, which may help streaming users.


> [STREAMING] Retry when kafka broker is down in kafka-streaming-0-8
> ------------------------------------------------------------------
>
>                 Key: SPARK-21836
>                 URL: https://issues.apache.org/jira/browse/SPARK-21836
>             Project: Spark
>          Issue Type: Improvement
>          Components: DStreams
>    Affects Versions: 1.6.3, 2.1.0
>            Reporter: yue long
>
> When using the package spark-streaming-kafka-0-8 to access Kafka from a Spark DStream,
many users will face a "could not find leader" exception if some of the Kafka brokers are down.
This causes the whole streaming job to fail, as [SPARK-18983|https://issues.apache.org/jira/browse/SPARK-18983]
describes. The failed Kafka brokers may also cause other problems when creating the DStream or
creating the batch job.
> Even though a Kafka broker going down is not a bug in Spark Streaming, we can avoid
this failure in Spark Streaming, especially because a Kafka cluster is not always stable in
real production.
> Actually, re-submitting our streaming job may take a few minutes, but the Kafka cluster
will only take a few seconds to replace the failed broker with an alive one!
> Does anyone think we should add some retry logic for when a Kafka broker is down? I have
implemented this function in Spark 1.6.3 and Spark 2.1.0 and tested it. If we implement this
function, it will reduce the number of kafka-streaming failures, which may help streaming users.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org

