flink-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-5332) Non-thread safe FileSystem::initOutPathLocalFS() can cause lost files/directories in local execution
Date Tue, 13 Dec 2016 20:06:58 GMT

    [ https://issues.apache.org/jira/browse/FLINK-5332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15746126#comment-15746126 ]

ASF GitHub Bot commented on FLINK-5332:
---------------------------------------

Github user StephanEwen commented on the issue:

    https://github.com/apache/flink/pull/2999
  
    I think this should go into 1.2 - it is quite a bug for local testing.


> Non-thread safe FileSystem::initOutPathLocalFS() can cause lost files/directories in local execution
> ----------------------------------------------------------------------------------------------------
>
>                 Key: FLINK-5332
>                 URL: https://issues.apache.org/jira/browse/FLINK-5332
>             Project: Flink
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 1.2.0
>            Reporter: Stephan Ewen
>            Assignee: Stephan Ewen
>            Priority: Critical
>             Fix For: 1.2.0
>
>
> This is mainly relevant to tests and Local Mini Cluster executions.
> The {{FileOutputFormat}} and its subclasses rely on {{FileSystem::initOutPathLocalFS()}} to prepare the output directory. When multiple parallel output writers call that method, there is a slim chance that one parallel thread deletes another thread's directory. The checks that the method performs are not bulletproof.
> I believe that this is the cause of many of the Travis test instabilities that we have observed over time.
> Simply synchronizing that method per process should do the trick. Since it is a rarely called initialization method, and only relevant in tests and local mini cluster executions, this should be an acceptable price to pay. I see no other way, as we do not have simple access to an atomic "check and delete and recreate" file operation.
> The synchronization also makes many "re-try" code paths obsolete (there should be no re-tries needed on proper file systems).
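The proposed fix amounts to serializing the non-atomic "check, delete, recreate" sequence behind a single lock shared by every writer in the JVM. The sketch below illustrates that idea under stated assumptions: it uses `java.nio.file` on the local file system, and `OutputDirInit`, `initOutDir`, and `writeInParallel` are hypothetical names, not Flink's API. `initOutDir` is a simplified stand-in for `FileSystem::initOutPathLocalFS()`, whose real signature and overwrite semantics differ. The demo starts several "parallel writers" against one output path that initially holds a stale plain file, and counts how many part files survive.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.locks.ReentrantLock;
import java.util.stream.Stream;

public class OutputDirInit {

    // Hypothetical JVM-wide lock: only one writer at a time may run the
    // non-atomic "check, delete, recreate" sequence in initOutDir().
    private static final ReentrantLock INIT_LOCK = new ReentrantLock(true);

    /**
     * Hypothetical, simplified stand-in for initOutPathLocalFS(): ensures
     * outPath is a directory that parallel writers can share. A stale plain
     * file is deleted only if overwrite is true; an existing directory is
     * reused and never deleted, so no writer removes another writer's output.
     */
    public static boolean initOutDir(Path outPath, boolean overwrite)
            throws IOException, InterruptedException {
        INIT_LOCK.lockInterruptibly();
        try {
            if (Files.isDirectory(outPath)) {
                return true; // another writer already prepared it
            }
            if (Files.exists(outPath)) {
                if (!overwrite) {
                    return false; // refuse to clobber existing output
                }
                Files.delete(outPath); // remove stale single-file output
            }
            Files.createDirectories(outPath);
            return Files.isDirectory(outPath);
        } finally {
            INIT_LOCK.unlock();
        }
    }

    /** Runs `parallelism` simulated writers and returns the number of part files left. */
    public static long writeInParallel(Path outPath, int parallelism) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(parallelism);
        try {
            List<Future<?>> results = new ArrayList<>();
            for (int i = 0; i < parallelism; i++) {
                final int task = i;
                results.add(pool.submit(() -> {
                    if (!initOutDir(outPath, true)) {
                        throw new IOException("could not prepare " + outPath);
                    }
                    // Each writer creates its own part file: "1", "2", ...
                    Files.writeString(outPath.resolve(Integer.toString(task + 1)), "data");
                    return null;
                }));
            }
            for (Future<?> f : results) {
                f.get(); // rethrows any writer failure
            }
        } finally {
            pool.shutdown();
        }
        try (Stream<Path> files = Files.list(outPath)) {
            return files.count();
        }
    }

    public static void main(String[] args) throws Exception {
        Path out = Files.createTempDirectory("flink-5332").resolve("out");
        Files.writeString(out, "stale output from a previous run");
        System.out.println(writeInParallel(out, 8) + " part files");
    }
}
```

The sketch uses a `ReentrantLock` with `lockInterruptibly()` rather than a `synchronized` method, so that a cancelled task still waiting for the lock can be interrupted. A `synchronized` static method would serialize the calls just as well. In either form, the lock only coordinates threads within one JVM. That matches the scope described in the issue (tests and the local mini cluster), where all parallel writers share a process.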



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
