spark-issues mailing list archives

From "hustfxj (JIRA)" <j...@apache.org>
Subject [jira] [Created] (SPARK-18707) Can spark support exactly once based kafka ? Due to these following question?
Date Sun, 04 Dec 2016 15:04:58 GMT
hustfxj created SPARK-18707:
-------------------------------

             Summary: Can spark support exactly once based kafka ? Due to these following question?
                 Key: SPARK-18707
                 URL: https://issues.apache.org/jira/browse/SPARK-18707
             Project: Spark
          Issue Type: Question
            Reporter: hustfxj


1. When a task completes its operation, it notifies the driver. If the driver does not receive
that message because of a network problem, it will think the task is still running. Does that
mean the child stage will never be scheduled?
2. How does Spark guarantee that a downstream task receives the shuffle data completely? As far
as I can tell, Spark has no checksum for shuffle blocks. For example, an upstream task may
shuffle 100 MB of data, but the downstream task may receive only 99 MB because of a network
problem. Can Spark verify, based on size, that the data was received completely?
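
To make question 2 concrete, here is a minimal sketch of the kind of check being asked about. It is not Spark code and does not describe how Spark transfers shuffle blocks. The names `BlockHeader`, `describe_block`, and `verify_block` are hypothetical, and the sketch assumes the sender can attach a length and a CRC-32 to each block.

```python
import zlib
from dataclasses import dataclass


@dataclass(frozen=True)
class BlockHeader:
    """Hypothetical metadata a sender could attach to a shuffle block."""
    length: int  # number of bytes the sender wrote
    crc32: int   # CRC-32 of those bytes


def describe_block(payload: bytes) -> BlockHeader:
    """Sender side: record the size and checksum of the block."""
    return BlockHeader(length=len(payload), crc32=zlib.crc32(payload))


def verify_block(header: BlockHeader, received: bytes) -> bool:
    """Receiver side: reject truncated blocks first, then corrupted ones."""
    if len(received) != header.length:
        return False  # e.g. 99 MB arrived where 100 MB was sent
    return zlib.crc32(received) == header.crc32
```

A size check alone catches the truncation case described above, but not a block of the right length whose bytes were altered, which is why the sketch also compares a checksum.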





