flink-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Chesnay Schepler (JIRA)" <j...@apache.org>
Subject [jira] [Created] (FLINK-1419) DistributedCache doesn't preserver files for subsequent operations
Date Mon, 19 Jan 2015 09:52:35 GMT
Chesnay Schepler created FLINK-1419:

             Summary: DistributedCache doesn't preserver files for subsequent operations
                 Key: FLINK-1419
                 URL: https://issues.apache.org/jira/browse/FLINK-1419
             Project: Flink
          Issue Type: Bug
    Affects Versions: 0.8, 0.9
            Reporter: Chesnay Schepler

When subsequent operations want to access the same files in the DC it frequently happens that
the files are not created for the following operation.

This is fairly odd, since the DC is supposed to either a) preserve files when another operation
kicks in within a certain time window, or b) just recreate the deleted files. Both things
don't happen.
Increasing the time window had no effect.

I'd like to use this issue as a starting point for a more general discussion about the DistributedCache.

1. all files reside in a common job-specific directory
2. are deleted during the job.
One thing that was brought up about Trait 1 is that it basically forbids modification of the
files, concurrent access and all. Personally I'm not sure if this a problem. Changing it to
a task-specific place solved the issue though.

I'm more concerned about Trait #2. Besides the mentioned issue, the deletion is realized with
the scheduler, which adds a lot of complexity to the current code. (It really is a pain to
work on...) 
If we moved the deletion to the end of the job it could be done as a clean-up step in the
TaskManager, With this we could reduce the DC to a cacheFile(String source) method, the delete
method in the TM, and throw out everything else.
Also, the current implementation implies that big files may be copied multiple times. This
may be undesired, depending on how big the files are.

This message was sent by Atlassian JIRA

View raw message