spark-issues mailing list archives

From "Steve Loughran (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (SPARK-21762) FileFormatWriter/BasicWriteTaskStatsTracker metrics collection fails if a new file isn't yet visible
Date Thu, 17 Aug 2017 19:42:01 GMT

    [ https://issues.apache.org/jira/browse/SPARK-21762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16131190#comment-16131190
] 

Steve Loughran edited comment on SPARK-21762 at 8/17/17 7:41 PM:
-----------------------------------------------------------------

SPARK-21669 simplifies this, especially testing, as it's isolated from FileFormatWriter. The same
problem exists, though: if you are getting any create inconsistency, the metrics probes trigger
failures which may no longer be present by the time the task commit actually takes place.


was (Author: stevel@apache.org):
SPARK-20703 simplifies this, especially testing, as it's isolated from FileFormatWriter. Same
problem exists though: if you are getting any Create inconsistency, metrics probes trigger
failures which may not be present by the time task commit actually takes place

> FileFormatWriter/BasicWriteTaskStatsTracker metrics collection fails if a new file isn't yet visible
> ----------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-21762
>                 URL: https://issues.apache.org/jira/browse/SPARK-21762
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.3.0
>         Environment: object stores without complete creation consistency (this includes AWS S3's caching of negative GET results)
>            Reporter: Steve Loughran
>            Priority: Minor
>
> The metrics collection of SPARK-20703 can trigger premature failure if the newly written object isn't actually visible yet, that is, if, after {{writer.close()}}, a {{getFileStatus(path)}} call raises a {{FileNotFoundException}}.
> Strictly speaking, not having a file immediately visible goes against the fundamental expectations of the Hadoop FS APIs, namely fully consistent data & metadata across all operations, with immediate global visibility of all changes. However, not all object stores make that guarantee, be it only for newly created data or for updated blobs. And so spurious FNFEs can get raised, ones which *should* have gone away by the time the actual task is committed. Or if they haven't, the job is in deep trouble anyway.
> What to do?
> # Leave as is: fail fast & so catch blobstores/blobstore clients which don't behave as required. One issue here: will that trigger retries, what happens there, etc, etc.
> # Swallow the FNFE and hope the file is observable later.
> # Swallow all IOEs and hope that whatever problem the FS has is transient.
> Options 2 & 3 aren't going to collect metrics in the event of an FNFE, or at least, not the counter of bytes written.
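
For illustration only, options 2 and 3 above could look roughly like the following sketch. It is not the Spark implementation: {{StatsProbeSketch}} and {{LengthProbe}} are hypothetical names, {{LengthProbe}} merely stands in for a call such as {{getFileStatus(path)}} followed by reading the file length, and the path in {{main}} is made up. An empty result means the bytes-written counter is simply not updated for that file.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.Optional;

public class StatsProbeSketch {

    /** Hypothetical stand-in for "getFileStatus(path), then read the length". */
    @FunctionalInterface
    public interface LengthProbe {
        long length(String path) throws IOException;
    }

    /**
     * Option 2: swallow only FileNotFoundException.
     * Any other IOException still propagates and fails the caller.
     */
    public static Optional<Long> lengthIfVisible(LengthProbe probe, String path)
            throws IOException {
        try {
            return Optional.of(probe.length(path));
        } catch (FileNotFoundException e) {
            // The object is not visible yet: skip the metric instead of failing.
            return Optional.empty();
        }
    }

    /** Option 3: swallow every IOException and hope the problem is transient. */
    public static Optional<Long> lengthOrNone(LengthProbe probe, String path) {
        try {
            return Optional.of(probe.length(path));
        } catch (IOException e) {
            // Covers FileNotFoundException too, since it is a subclass of IOException.
            return Optional.empty();
        }
    }

    public static void main(String[] args) throws IOException {
        // Both probes are fakes; the path is a made-up example.
        LengthProbe visible = p -> 42L;
        LengthProbe notYetVisible = p -> { throw new FileNotFoundException(p); };
        String path = "s3a://example-bucket/part-00000";

        System.out.println(lengthIfVisible(visible, path));
        System.out.println(lengthIfVisible(notYetVisible, path));
        System.out.println(lengthOrNone(notYetVisible, path));
    }
}
```

Under option 2 a non-FNFE failure is still surfaced, whereas option 3 hides it; in both cases the byte count for the affected file is lost, as the description notes.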




