beam-commits mailing list archives

From "Pei He (JIRA)" <j...@apache.org>
Subject [jira] [Created] (BEAM-1252) BigQueryIO.Read: validate exported files with GCS glob.
Date Tue, 10 Jan 2017 00:57:58 GMT
Pei He created BEAM-1252:
----------------------------

             Summary: BigQueryIO.Read: validate exported files with GCS glob.
                 Key: BEAM-1252
                 URL: https://issues.apache.org/jira/browse/BEAM-1252
             Project: Beam
          Issue Type: Bug
          Components: sdk-java-gcp
            Reporter: Pei He
            Assignee: Pei He


BigQuery has started creating user-visible temp files in the export location. We notice these
files and start reading from them, but BigQuery then moves them. This can cause job failures
and data duplication.

On the Beam side, we can add stronger validation:
1. When listing files, validate that they match the expected URI.
2. When the BQ job has finished, run an integrity check to verify that the number of files
read equals the number of files BQ claims exist.
3. If possible, add a prefix to the filename in the glob (*.avro becomes step*.avro). Should
the prefix be the step name, or something else? This might be as easy as dropping a '/' in
the middle of the path. A la #7.
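
A minimal sketch of checks 1 and 2, written in Python for brevity rather than in the Java SDK this issue targets. The function name, the bucket, the "step-" prefix, and the file names are all hypothetical, and the expected file count is assumed to come from the finished BigQuery export job. Note that Python's fnmatch lets '*' match '/', which is looser than GCS glob matching, so this only illustrates the idea.

```python
import fnmatch


def validate_exported_files(listed_uris, expected_glob, expected_count):
    """Keep only URIs matching the expected glob, then check the count.

    listed_uris:    URIs returned by listing the export location (assumed input).
    expected_glob:  the pattern the export was asked to write to.
    expected_count: number of files the export job reports (assumed input).
    """
    # Check 1: drop anything that does not match the expected URI pattern,
    # such as a temp file that happens to live in the same directory.
    matched = [u for u in listed_uris if fnmatch.fnmatchcase(u, expected_glob)]

    # Check 2: the number of files we will read must equal the number
    # of files the export job claims to have written.
    if len(matched) != expected_count:
        raise ValueError(
            "expected %d exported files, found %d matching %s"
            % (expected_count, len(matched), expected_glob)
        )
    return matched


# Hypothetical listing: two real shards plus one temp file.
listing = [
    "gs://example-bucket/export/step-000000000000.avro",
    "gs://example-bucket/export/step-000000000001.avro",
    "gs://example-bucket/export/tmp-000000000001.avro",
]
shards = validate_exported_files(
    listing, "gs://example-bucket/export/step-*.avro", expected_count=2
)
```

With a bare *.avro glob the temp file above would also match, and the count check would then fail rather than let the job read it. A filename prefix, as in item 3, keeps it out of the match in the first place.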



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
