flink-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-1419) DistributedCache doesn't preserver files for subsequent operations
Date Tue, 27 Jan 2015 14:09:34 GMT

    [ https://issues.apache.org/jira/browse/FLINK-1419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14293556#comment-14293556 ]

ASF GitHub Bot commented on FLINK-1419:
---------------------------------------

Github user tillrohrmann commented on the pull request:

    https://github.com/apache/flink/pull/339#issuecomment-71652771
  
    I'm wondering whether the update of the count hash map should rather happen in the copy process. Otherwise, the following interleaving is possible:
    
    1. You register a new temp file "foobar" for task B --> this creates a copy process and increments the file counter
    2. You delete the temp file "foobar" for task A because it has finished --> this creates a delete process with the incremented counter
    3. The copy process is executed
    4. The delete process is executed
    
    The file "foobar" then no longer exists for task B.
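    
    The interleaving above can be replayed in a small, single-threaded toy model. This is only a sketch, not Flink's actual FileCache code: the class and method names (CacheInterleaving, register, unregister, survives) are hypothetical, and it assumes that a delete process captures the counter when it is created and deletes the file only if the counter is unchanged when it runs.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Toy model of the interleaving; all names are hypothetical, not Flink's API. */
public class CacheInterleaving {
    private final Map<String, Integer> count = new HashMap<>();
    private final Set<String> files = new HashSet<>();
    private final boolean countInCopy;

    CacheInterleaving(boolean countInCopy) {
        this.countInCopy = countInCopy;
    }

    /** Registers a file for a task and returns the deferred copy step. */
    Runnable register(String name) {
        if (!countInCopy) {
            // Variant under discussion: counter is bumped at registration time.
            count.merge(name, 1, Integer::sum);
        }
        return () -> {
            if (countInCopy) {
                // Suggested variant: counter is bumped when the copy runs.
                count.merge(name, 1, Integer::sum);
            }
            files.add(name);
        };
    }

    /** Returns the deferred delete step; it captures the counter at creation. */
    Runnable unregister(String name) {
        int seen = count.getOrDefault(name, 0);
        return () -> {
            // Assumption: delete only if nobody registered the file in between.
            if (count.getOrDefault(name, 0) == seen) {
                files.remove(name);
            }
        };
    }

    /** Replays steps 1 to 4 and reports whether "foobar" still exists for task B. */
    public static boolean survives(boolean countInCopy) {
        CacheInterleaving cache = new CacheInterleaving(countInCopy);
        cache.register("foobar").run();                // task A already has the file
        Runnable copyB = cache.register("foobar");     // step 1
        Runnable deleteA = cache.unregister("foobar"); // step 2
        copyB.run();                                   // step 3
        deleteA.run();                                 // step 4
        return cache.files.contains("foobar");
    }

    public static void main(String[] args) {
        System.out.println("count at registration: file survives = " + survives(false));
        System.out.println("count in copy process: file survives = " + survives(true));
    }
}
```

    Under these assumptions, the delete process created in step 2 sees the same counter value in step 4 when the increment happens at registration, so it removes the file. When the increment happens inside the copy process, the counter changes between steps 2 and 4 and the delete is skipped.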
    
    Another thing is that the DeleteProcess tries to delete the whole directory below the jobID even if only a single file is to be deleted. I don't know whether this is the right behaviour.


> DistributedCache doesn't preserver files for subsequent operations
> ------------------------------------------------------------------
>
>                 Key: FLINK-1419
>                 URL: https://issues.apache.org/jira/browse/FLINK-1419
>             Project: Flink
>          Issue Type: Bug
>    Affects Versions: 0.8, 0.9
>            Reporter: Chesnay Schepler
>            Assignee: Chesnay Schepler
>
> When subsequent operations want to access the same files in the DC, it frequently happens that the files are not created for the following operation.
> This is fairly odd, since the DC is supposed to either a) preserve files when another operation kicks in within a certain time window, or b) just recreate the deleted files. Neither happens.
> Increasing the time window had no effect.
> I'd like to use this issue as a starting point for a more general discussion about the DistributedCache.
> Currently:
> 1. all files reside in a common job-specific directory
> 2. files are deleted during the job.
>
> One thing that was brought up about Trait 1 is that it basically forbids modification of the files, concurrent access and all. Personally, I'm not sure if this is a problem. Changing it to a task-specific place solved the issue, though.
> I'm more concerned about Trait 2. Besides the mentioned issue, the deletion is realized with the scheduler, which adds a lot of complexity to the current code. (It really is a pain to work on...)
> If we moved the deletion to the end of the job, it could be done as a clean-up step in the TaskManager. With this we could reduce the DC to a cacheFile(String source) method and the delete method in the TM, and throw out everything else.
> Also, the current implementation implies that big files may be copied multiple times. This may be undesirable, depending on how big the files are.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
