spark-issues mailing list archives

From "Apache Spark (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-23827) StreamingJoinExec should ensure that input data is partitioned into specific number of partitions
Date Thu, 29 Mar 2018 23:27:00 GMT

    [ https://issues.apache.org/jira/browse/SPARK-23827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16419925#comment-16419925
] 

Apache Spark commented on SPARK-23827:
--------------------------------------

User 'tdas' has created a pull request for this issue:
https://github.com/apache/spark/pull/20941

> StreamingJoinExec should ensure that input data is partitioned into specific number of partitions
> -------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-23827
>                 URL: https://issues.apache.org/jira/browse/SPARK-23827
>             Project: Spark
>          Issue Type: Bug
>          Components: Structured Streaming
>    Affects Versions: 2.3.0
>            Reporter: Tathagata Das
>            Assignee: Tathagata Das
>            Priority: Critical
>
> Currently, the requiredChildDistribution does not specify the number of partitions. This can cause
a weird corner case where the child's distribution is `SinglePartition`, which satisfies
the required distribution of `ClusteredDistribution(no-num-partition-requirement)`, thus eliminating
the shuffle needed to repartition the input data into the required number of partitions (i.e.
the same as the state stores). That can lead to "file not found" errors on the state store delta files,
as the micro-batch-with-no-shuffle will not run certain tasks and therefore will not generate the
expected state store delta files.
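The corner case above can be sketched with a simplified, self-contained model (not the actual Spark internals) of how a physical operator's required distribution is matched against its child's partitioning. The `requiredNumPartitions` field here is an illustrative stand-in for the kind of partition-count requirement the fix adds: without it, `SinglePartition` trivially satisfies a clustered distribution and the planner skips the shuffle.

```scala
// Simplified model of distribution matching in a physical planner.
// All names here are illustrative, not Spark's actual classes.
sealed trait Distribution
// Clustering keys plus an optional required partition count.
final case class ClusteredDistribution(
    clusteringKeys: Seq[String],
    requiredNumPartitions: Option[Int] = None) extends Distribution

sealed trait Partitioning { def numPartitions: Int }
case object SinglePartitioning extends Partitioning { val numPartitions = 1 }
final case class HashPartitioning(
    keys: Seq[String], numPartitions: Int) extends Partitioning

// A shuffle is inserted only when the child's partitioning does NOT
// satisfy the operator's required distribution.
def satisfies(p: Partitioning, d: Distribution): Boolean = (p, d) match {
  case (SinglePartitioning, ClusteredDistribution(_, None)) =>
    // One partition trivially co-locates all keys: the buggy corner case,
    // where no shuffle runs and state-store partitions go missing.
    true
  case (SinglePartitioning, ClusteredDistribution(_, Some(n))) =>
    // With a pinned partition count, a single partition only works if n == 1.
    n == 1
  case (HashPartitioning(pk, pn), ClusteredDistribution(dk, req)) =>
    pk == dk && req.forall(_ == pn)
  case _ => false
}
```

Under this model, requiring the partition count (e.g. the number of state-store partitions) makes `satisfies` return false for a single-partition child, which forces the repartitioning shuffle and keeps every state store's delta files populated.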



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org

