beam-commits mailing list archives

From "Eugene Kirpichov (JIRA)" <j...@apache.org>
Subject [jira] [Created] (BEAM-1822) Improve handling of eventually-consistent filepatterns
Date Tue, 28 Mar 2017 22:51:41 GMT
Eugene Kirpichov created BEAM-1822:
--------------------------------------

             Summary: Improve handling of eventually-consistent filepatterns
                 Key: BEAM-1822
                 URL: https://issues.apache.org/jira/browse/BEAM-1822
             Project: Beam
          Issue Type: Bug
          Components: sdk-java-core
            Reporter: Eugene Kirpichov
            Assignee: Daniel Halperin


Reading from an eventually consistent filepattern (e.g. one located in a multi-regional Google
Cloud Storage bucket) using FileBasedSource is dangerous, because it may silently process
less data than the user expects if the match call does not return all of the files.

We should improve our handling of this case. I'd suggest aiming to minimize the chance
of silent data loss. Here are a couple of things we could do.

- Let the user supply an expected number of files to be matched, and fail the pipeline if
the actual number is different. For special filepatterns like XXX-of-YYY, we can autodetect
the expected number.
- Poll the filepattern for a while (perhaps for a period determined by the underlying IOChannelFactory
that knows the typical eventual consistency convergence times of its filesystem), and either
wait until it quiesces, or fail the pipeline if it doesn't.
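
To make the two suggestions above concrete, here is a rough, SDK-agnostic sketch in Python.
It is not Beam code and not a proposed implementation: the function names, the `match`
callable (standing in for whatever lists the files for a filepattern), the shard-name regex,
and the polling parameters are all assumptions made for illustration.

```python
import re
import time
from typing import Callable, List, Optional

# Assumed shard naming convention, e.g. "part-00003-of-00012" (hypothetical).
_SHARD_RE = re.compile(r"-(\d+)-of-(\d+)")


def expected_shard_count(filename: str) -> Optional[int]:
    """Return YYY from an XXX-of-YYY style name, or None if the name has no such suffix."""
    m = _SHARD_RE.search(filename)
    return int(m.group(2)) if m else None


def check_match_count(matched: List[str], expected: Optional[int] = None) -> None:
    """Fail if the number of matched files differs from the expected number.

    If `expected` is not supplied, try to autodetect it from XXX-of-YYY names.
    When autodetection is impossible or ambiguous, the check is skipped.
    """
    if expected is None:
        totals = {expected_shard_count(f) for f in matched}
        totals.discard(None)
        if len(totals) != 1:
            return  # nothing reliable to compare against
        expected = totals.pop()
    if len(matched) != expected:
        raise RuntimeError(
            f"matched {len(matched)} files, expected {expected}; "
            "the listing may not have converged yet"
        )


def match_until_quiescent(
    match: Callable[[], List[str]],
    stable_polls: int = 3,      # consecutive identical listings required (assumed default)
    max_polls: int = 20,        # give up after this many polls (assumed default)
    interval_s: float = 1.0,    # would ideally come from the filesystem implementation
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Poll `match` until it returns the same listing `stable_polls` times in a row."""
    previous: Optional[List[str]] = None
    stable = 0
    for _ in range(max_polls):
        current = sorted(match())
        stable = stable + 1 if current == previous else 1
        previous = current
        if stable >= stable_polls:
            return current
        sleep(interval_s)
    raise TimeoutError("filepattern did not quiesce within the polling budget")
```

The two checks compose: a caller could first poll until the listing is stable and then pass
the result to `check_match_count`, so that a listing which stabilizes on too few files still
fails loudly instead of being processed silently.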



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
