spark-issues mailing list archives

From "Marcelo Vanzin (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (SPARK-24589) OutputCommitCoordinator may allow duplicate commits
Date Fri, 22 Jun 2018 21:09:00 GMT

     [ https://issues.apache.org/jira/browse/SPARK-24589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marcelo Vanzin updated SPARK-24589:
-----------------------------------
    Fix Version/s: 2.1.3

> OutputCommitCoordinator may allow duplicate commits
> ---------------------------------------------------
>
>                 Key: SPARK-24589
>                 URL: https://issues.apache.org/jira/browse/SPARK-24589
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.2.1, 2.3.1
>            Reporter: Marcelo Vanzin
>            Assignee: Marcelo Vanzin
>            Priority: Blocker
>             Fix For: 2.1.3, 2.2.2, 2.3.2, 2.4.0
>
>
> This is a sibling bug to SPARK-24552. While investigating the source of that bug, it was
> found that the output committer currently allows duplicate commits when there are stage
> retries and the tasks with the same task attempt number (one in each stage attempt that
> currently has running tasks) try to commit their output.
> This can lead to duplicate data in the output.
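
The failure mode described above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not Spark's actual OutputCommitCoordinator code: the class names, the `can_commit` method, and its parameters are all hypothetical. It assumes a coordinator that remembers the authorized committer for each (stage, partition) by task attempt number alone and treats a repeated request with the same number as an idempotent retry. The second class shows one possible way to avoid the collision, by also recording the stage attempt.

```python
from typing import Dict, Tuple


class NaiveCoordinator:
    """Hypothetical coordinator that identifies a committer by task attempt number only."""

    def __init__(self) -> None:
        # (stage, partition) -> authorized task attempt number
        self._authorized: Dict[Tuple[int, int], int] = {}

    def can_commit(self, stage: int, stage_attempt: int,
                   partition: int, attempt_number: int) -> bool:
        key = (stage, partition)
        if key not in self._authorized:
            self._authorized[key] = attempt_number
            return True
        # stage_attempt is ignored here, so a task from a retried stage that
        # has the same attempt number looks like a repeat of the same request.
        return self._authorized[key] == attempt_number


class StageAwareCoordinator:
    """Hypothetical variant that also records which stage attempt the task belongs to."""

    def __init__(self) -> None:
        # (stage, partition) -> (stage attempt, task attempt number)
        self._authorized: Dict[Tuple[int, int], Tuple[int, int]] = {}

    def can_commit(self, stage: int, stage_attempt: int,
                   partition: int, attempt_number: int) -> bool:
        key = (stage, partition)
        candidate = (stage_attempt, attempt_number)
        if key not in self._authorized:
            self._authorized[key] = candidate
            return True
        # Only the exact same task attempt of the same stage attempt may repeat.
        return self._authorized[key] == candidate


if __name__ == "__main__":
    naive = NaiveCoordinator()
    print(naive.can_commit(stage=1, stage_attempt=0, partition=5, attempt_number=0))  # True
    print(naive.can_commit(stage=1, stage_attempt=1, partition=5, attempt_number=0))  # True (duplicate commit)

    aware = StageAwareCoordinator()
    print(aware.can_commit(stage=1, stage_attempt=0, partition=5, attempt_number=0))  # True
    print(aware.can_commit(stage=1, stage_attempt=1, partition=5, attempt_number=0))  # False
```

In the naive model, two tasks for partition 5 that belong to different stage attempts but share attempt number 0 are both authorized, which corresponds to the duplicate output the issue describes.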



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org

