beam-commits mailing list archives

From "Daniel Halperin (JIRA)" <>
Subject [jira] [Assigned] (BEAM-1822) Improve handling of eventually-consistent filepatterns
Date Tue, 28 Mar 2017 22:53:41 GMT


Daniel Halperin reassigned BEAM-1822:

    Assignee:     (was: Daniel Halperin)

> Improve handling of eventually-consistent filepatterns
> ------------------------------------------------------
>                 Key: BEAM-1822
>                 URL:
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-java-core
>            Reporter: Eugene Kirpichov
> Reading from an eventually consistent filepattern (e.g. one located in a multi-regional Google
Cloud Storage bucket) using FileBasedSource is dangerous, because it may silently process
less data than the user expects if the match call does not return all of the files.
> We should improve our handling of this case. I'd suggest aiming to minimize the chance
of silent data loss. Here are a couple of things we could do.
> - Let the user supply an expected number of files to be matched, and fail the pipeline
if the actual number is different. For special filepatterns like XXX-of-YYY, we can autodetect
the expected number.
> - Poll the filepattern for a while (perhaps for a period determined by the underlying
IOChannelFactory, which knows the typical eventual-consistency convergence time of its filesystem),
and either wait until it quiesces or fail the pipeline if it doesn't.
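
The two suggestions above could be sketched roughly as follows. This is only an illustration in plain Java, not Beam code: the class FilepatternChecks and all of its methods are hypothetical names, the match call is stood in for by a Supplier, and "quiesced" is assumed to mean that two consecutive polls return the same set of names.

```java
import java.time.Duration;
import java.util.List;
import java.util.OptionalInt;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Supplier;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper; none of these names exist in the Beam SDK. */
final class FilepatternChecks {
  // Matches shard suffixes such as "part-00003-of-00010" (assumed naming scheme).
  private static final Pattern SHARD = Pattern.compile("(\\d+)-of-(\\d+)");

  /** Returns YYY if every name carries the same XXX-of-YYY total, else empty. */
  static OptionalInt expectedShardCount(List<String> names) {
    OptionalInt expected = OptionalInt.empty();
    for (String name : names) {
      Matcher m = SHARD.matcher(name);
      if (!m.find()) {
        return OptionalInt.empty(); // not a sharded name, nothing to autodetect
      }
      int total = Integer.parseInt(m.group(2));
      if (expected.isPresent() && expected.getAsInt() != total) {
        return OptionalInt.empty(); // names disagree about the total
      }
      expected = OptionalInt.of(total);
    }
    return expected;
  }

  /** Throws if the match returned a different number of files than expected. */
  static void checkCount(List<String> names, int expected) {
    if (names.size() != expected) {
      throw new IllegalStateException(
          "Expected " + expected + " files but matched " + names.size());
    }
  }

  /** Re-runs the match until two consecutive polls agree; fails after maxPolls. */
  static List<String> matchUntilStable(
      Supplier<List<String>> match, int maxPolls, Duration interval)
      throws InterruptedException {
    Set<String> previous = null;
    for (int i = 0; i < maxPolls; i++) {
      List<String> current = match.get();
      Set<String> currentSet = new TreeSet<>(current);
      if (currentSet.equals(previous)) {
        return current; // same set twice in a row: treat as quiesced
      }
      previous = currentSet;
      Thread.sleep(interval.toMillis());
    }
    throw new IllegalStateException(
        "Filepattern did not quiesce after " + maxPolls + " polls");
  }
}
```

A caller might run matchUntilStable first, then pass the result to checkCount with either a user-supplied count or the value from expectedShardCount. Note that two identical polls are only a heuristic: a listing can look stable and still be incomplete, which is why the explicit count check is the stronger guard against silent data loss.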

This message was sent by Atlassian JIRA
