beam-commits mailing list archives

From "Daniel Halperin (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (BEAM-1190) FileBasedSource should ignore files that matched the glob but don't exist
Date Wed, 21 Dec 2016 18:45:58 GMT

    [ https://issues.apache.org/jira/browse/BEAM-1190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15767818#comment-15767818 ]

Daniel Halperin commented on BEAM-1190:
---------------------------------------

Not for very long: the stat at open time is being removed, because we already get the
information we need from the list call but currently throw it away, which we shouldn't.

How would you feel about the ability to execute code in the worker when the glob is expanded?
I think checking which files actually exist at that point, deciding once in a single
centralized place which files to read, and committing to that decision for later, is
probably a simpler and safer solution.
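
A rough sketch of that idea, in plain Python rather than Beam code. The names
`expand_and_commit` and `read_committed` are hypothetical, and `list_files` and
`exists` stand in for whatever listing and stat calls the filesystem provides.
The point is only that the existence check happens once, at expansion time, and
later reads use the committed list instead of re-checking.

```python
import glob
import os
from typing import Callable, Iterable, Iterator, List


def expand_and_commit(
    pattern: str,
    list_files: Callable[[str], Iterable[str]] = glob.glob,
    exists: Callable[[str], bool] = os.path.isfile,
) -> List[str]:
    """Expand the glob once and keep only the files that exist right now.

    The returned list is the committed decision: callers read exactly
    these files later and do not repeat the listing or the stat.
    """
    candidates = sorted(list_files(pattern))
    # A listing may be stale (eventual consistency), so drop entries
    # that the existence check cannot confirm.
    return [path for path in candidates if exists(path)]


def read_committed(paths: Iterable[str]) -> Iterator[str]:
    """Yield lines from the committed files, in order, with no re-check."""
    for path in paths:
        with open(path, "r", encoding="utf-8") as handle:
            yield from handle
```

With a listing that returns `["b", "a"]` where only `"a"` exists, the committed
list would contain just `"a"`. A file deleted after the commit would still fail
at read time, which is a deliberate trade-off in this sketch.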

> FileBasedSource should ignore files that matched the glob but don't exist
> -------------------------------------------------------------------------
>
>                 Key: BEAM-1190
>                 URL: https://issues.apache.org/jira/browse/BEAM-1190
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-java-core
>            Reporter: Eugene Kirpichov
>            Assignee: Eugene Kirpichov
>
> See user issue:
> http://stackoverflow.com/questions/41251741/coping-with-eventual-consistency-of-gcs-bucket-listing
> We should, after globbing the files in FileBasedSource, individually stat every file
> and remove those that don't exist, to account for the possibility that glob yielded
> non-existing files due to eventual consistency.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
