spark-issues mailing list archives

From "Steve Loughran (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-23966) Refactoring all checkpoint file writing logic in a common interface
Date Fri, 13 Apr 2018 11:27:00 GMT

    [ https://issues.apache.org/jira/browse/SPARK-23966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16437201#comment-16437201 ]

Steve Loughran commented on SPARK-23966:
----------------------------------------

W.r.t. FileContext.rename() vs FileSystem.rename(): both are *meant* to be atomic, and they
are on HDFS, local filesystems, and POSIX-compliant DFSs. Whether any object store implements
atomic and/or O(1) rename is always ambiguous: it depends on the store and even on the path
under the store (e.g. wasb & locking-based exclusivity).
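
For illustration only, here is a minimal sketch of the write-to-temp-file-and-rename pattern
on a local filesystem, using java.nio rather than the Hadoop FileContext/FileSystem APIs. The
class and method names (AtomicRenameSketch, writeAtomically) and the file name are hypothetical.
It assumes the temp file and the target live in the same directory on a filesystem that
supports atomic moves; an object store offers no such guarantee.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class AtomicRenameSketch {

    // Hypothetical helper: write to a temp file next to the target, then rename it into place.
    static void writeAtomically(Path target, byte[] data) throws IOException {
        Path dir = target.toAbsolutePath().getParent();
        Path tmp = Files.createTempFile(dir, ".tmp-", ".part");
        try {
            Files.write(tmp, data);
            // ATOMIC_MOVE throws AtomicMoveNotSupportedException when the filesystem
            // cannot do the move atomically, instead of silently degrading to a copy.
            Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
        } finally {
            // No-op after a successful move; removes the partial file after a failure.
            Files.deleteIfExists(tmp);
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("ckpt");
        Path target = dir.resolve("1.delta"); // illustrative file name
        writeAtomically(target, "state".getBytes(StandardCharsets.UTF_8));
        System.out.println(new String(Files.readAllBytes(target), StandardCharsets.UTF_8));
    }
}
```

Readers either see no file at the target path or the complete file, which is the property the
checkpoint writers depend on.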

I would embrace FileContext for its better failure reporting of rename problems, but don't
expect anything better in terms of atomicity. For object stores, the strategy of "write in
place" is better. Of course, now you are left with the problem of knowing when to use which.
This plugin mechanism handles that, and when some variant of HADOOP-9565 gets in, there'll be
a probe for the semantics of an FS path which could be used by some adaptive connector.
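
As a rough sketch of what choosing between the two strategies could look like (this is not
the interface proposed in the issue), the following uses java.nio on a local filesystem. Every
name here is hypothetical: CheckpointWriter, RENAME_BASED, WRITE_IN_PLACE, forScheme, and the
hard-coded scheme list, which stands in for a real probe of path semantics.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class CheckpointWriterSketch {

    // Hypothetical common interface shared by all checkpoint writers.
    interface CheckpointWriter {
        void write(Path target, byte[] data) throws IOException;
    }

    // Strategy for filesystems with atomic rename: temp file, then rename into place.
    static final CheckpointWriter RENAME_BASED = (target, data) -> {
        Path tmp = Files.createTempFile(target.toAbsolutePath().getParent(), ".tmp-", ".part");
        try {
            Files.write(tmp, data);
            Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
        } finally {
            Files.deleteIfExists(tmp);
        }
    };

    // Strategy for stores where the object only becomes visible once the write completes.
    // On a local filesystem this is NOT atomic; it only marks where such a writer plugs in.
    static final CheckpointWriter WRITE_IN_PLACE = (target, data) -> Files.write(target, data);

    // Assumption: a fixed scheme list stands in for a real probe of path semantics.
    static final Set<String> OBJECT_STORE_SCHEMES = new HashSet<>(Arrays.asList("s3a", "wasb"));

    static CheckpointWriter forScheme(String scheme) {
        return OBJECT_STORE_SCHEMES.contains(scheme) ? WRITE_IN_PLACE : RENAME_BASED;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("ckpt");
        forScheme("file").write(dir.resolve("0"), new byte[] {42});
        System.out.println(Files.exists(dir.resolve("0")));
    }
}
```

Callers depend only on the interface, so a later probe-based selection could replace forScheme
without changing them.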

> Refactoring all checkpoint file writing logic in a common interface
> -------------------------------------------------------------------
>
>                 Key: SPARK-23966
>                 URL: https://issues.apache.org/jira/browse/SPARK-23966
>             Project: Spark
>          Issue Type: Improvement
>          Components: Structured Streaming
>    Affects Versions: 2.3.0
>            Reporter: Tathagata Das
>            Assignee: Tathagata Das
>            Priority: Major
>
> Checkpoint files (offset log files, state store files) in Structured Streaming must be
> written atomically such that no partial files are generated (would break fault-tolerance
> guarantees). Currently, there are 3 locations which try to do this individually, and in
> some cases, incorrectly.
>  # HDFSOffsetMetadataLog - This uses a FileManager interface to use any implementation
>    of `FileSystem` or `FileContext` APIs. It preferably loads `FileContext` implementation
>    as FileContext of HDFS has atomic renames.
>  # HDFSBackedStateStore (aka in-memory state store)
>  ## Writing a version.delta file - This uses FileSystem APIs only to perform a rename.
>     This is incorrect as rename is not atomic in HDFS FileSystem implementation.
>  ## Writing a snapshot file - Same as above.
> Current problems:
>  # State Store behavior is incorrect - 
>  # Inflexible - Some file systems provide mechanisms other than
>    write-to-temp-file-and-rename for writing atomically and more efficiently. For example,
>    with S3 you can write directly to the final file and it will be made visible only when
>    the entire file is written and closed correctly. Any failure can be made to terminate
>    the writing without making any partial files visible in S3. The current code does not
>    abstract out this mechanism enough that it can be customized.
> Solution:
>  # Introduce a common interface that all 3 cases above can use to write checkpoint files
>    atomically.
>  # This interface must provide the necessary interfaces that allow customization of the
>    write-and-rename mechanism.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org

