spark-issues mailing list archives

From "Tathagata Das (JIRA)" <j...@apache.org>
Subject [jira] [Created] (SPARK-23827) StreamingJoinExec should ensure that input data is partitioned into specific number of partitions
Date Thu, 29 Mar 2018 22:32:00 GMT
Tathagata Das created SPARK-23827:
-------------------------------------

             Summary: StreamingJoinExec should ensure that input data is partitioned into specific number of partitions
                 Key: SPARK-23827
                 URL: https://issues.apache.org/jira/browse/SPARK-23827
             Project: Spark
          Issue Type: Bug
          Components: Structured Streaming
    Affects Versions: 2.3.0
            Reporter: Tathagata Das
            Assignee: Tathagata Das


Currently, the requiredChildDistribution of StreamingJoinExec does not specify the number of
partitions. This can cause corner cases where the child's distribution is `SinglePartition`,
which satisfies the required distribution of `ClusteredDistribution(no-num-partition-requirement)`,
thus eliminating the shuffle needed to repartition the input data into the required number of
partitions (i.e., the same number as the state stores). That can lead to "file not found" errors
on the state store delta files, because the micro-batch with no shuffle will not run certain
tasks and therefore will not generate the expected state store delta files.
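
To make the failure mode concrete, here is a minimal Python sketch of the reasoning. It is a
toy model only and not Spark's planner code: the classes `ClusteredDistribution` and
`ChildPartitioning`, the functions `satisfies`, `needs_shuffle`, and
`partitions_without_delta_file`, and the partition count of 4 are all hypothetical names and
values chosen for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class ClusteredDistribution:
    """Toy required distribution: rows clustered by `keys`.

    `required_num_partitions` is None when no partition count is demanded.
    (Hypothetical model, not Spark's class of the same name.)
    """
    keys: Tuple[str, ...]
    required_num_partitions: Optional[int] = None


@dataclass(frozen=True)
class ChildPartitioning:
    """Toy description of how a child operator's output is partitioned."""
    num_partitions: int
    keys: Tuple[str, ...] = ()


def satisfies(child: ChildPartitioning, required: ClusteredDistribution) -> bool:
    # A single partition trivially keeps all rows with equal keys together.
    clustered = child.num_partitions == 1 or (
        len(child.keys) > 0 and set(child.keys) <= set(required.keys)
    )
    if not clustered:
        return False
    if required.required_num_partitions is None:
        return True  # no count requirement, so clustering alone is enough
    return child.num_partitions == required.required_num_partitions


def needs_shuffle(child: ChildPartitioning, required: ClusteredDistribution) -> bool:
    # In this model a shuffle is planned only when the requirement is unmet.
    return not satisfies(child, required)


def partitions_without_delta_file(tasks_run: int, num_state_partitions: int) -> List[int]:
    # One task per input partition; each task writes the delta file for its
    # own partition id, so ids beyond `tasks_run` get no file in this batch.
    return list(range(tasks_run, num_state_partitions))


single = ChildPartitioning(num_partitions=1)           # like `SinglePartition`
loose = ClusteredDistribution(keys=("id",))            # no partition count
strict = ClusteredDistribution(keys=("id",), required_num_partitions=4)

print(needs_shuffle(single, loose))           # shuffle skipped
print(needs_shuffle(single, strict))          # shuffle kept
print(partitions_without_delta_file(1, 4))    # state partitions left without a delta file
```

In this model, adding the partition count to the required distribution is what forces the
shuffle back in for a single-partition child, so that one task runs per state store partition.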



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
