beam-commits mailing list archives

From "Eugene Kirpichov (JIRA)" <>
Subject [jira] [Commented] (BEAM-2768) Fix bigquery.WriteTables generating non-unique job identifiers
Date Tue, 15 Aug 2017 18:36:00 GMT


Eugene Kirpichov commented on BEAM-2768:

Could you tell me more about how you're using BigQueryIO.Write? It has many modes, so it would
be best if you could show a code snippet where you apply BigQueryIO.write() in your pipeline,
with all personal data removed but with every BigQueryIO API method you call shown exactly.
Also, which exact version of the Beam SDK are you using? Your links point to the master
branch, but the bug description says 2.0.0, and these versions have very different implementations
of BigQueryIO.Write.

Looking at the current code, the job id *does* contain a random UUID that comes from

> Fix bigquery.WriteTables generating non-unique job identifiers
> --------------------------------------------------------------
>                 Key: BEAM-2768
>                 URL:
>             Project: Beam
>          Issue Type: Bug
>          Components: beam-model
>    Affects Versions: 2.0.0
>            Reporter: Matti Remes
>            Assignee: Reuven Lax
> This is a result of BigQueryIO not creating unique job ids for batch inserts, which causes the BigQuery
> API to respond with a 409 conflict error:
> {code:java}
> Request failed with code 409, will NOT retry:<project_id>/jobs
> {code}
> The jobs are initiated in the step BatchLoads/SinglePartitionWriteTables, by that step's
> WriteTables ParDo:
> It would probably be a good idea to append a UUID as part of the job id.
> Edit: This is a major bug blocking the use of BigQuery as a sink for bounded input.
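The failure mode in the report, and the reporter's UUID suggestion, can be illustrated with a minimal sketch using only JDK classes. It assumes that the jobs endpoint rejects any reused job id with 409, as the quoted log suggests. `FakeJobsEndpoint`, `newRunToken`, and `jobId` are hypothetical names, not Beam's BigQueryIO internals and not the BigQuery client API. The sketch generates one UUID per pipeline run rather than one per insert. One reason to prefer this is that a retried bundle then reuses the same id instead of submitting a second load job, while separate runs still get distinct ids.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.UUID;

public class JobIdSketch {

    /** Hypothetical stand-in for the jobs endpoint: a reused job id is answered with 409 Conflict. */
    static final class FakeJobsEndpoint {
        private final Set<String> existing = new HashSet<>();

        int insert(String jobId) {
            // Set.add returns false if the id was already present.
            return existing.add(jobId) ? 200 : 409;
        }
    }

    /** Hypothetical per-run token: one random UUID, generated once when a pipeline run starts. */
    static String newRunToken() {
        // Strip the dashes to keep the id compact (32 hex characters remain).
        return UUID.randomUUID().toString().replace("-", "");
    }

    /** Same inputs within one run give the same id; a new run token gives a new id. */
    static String jobId(String runToken, String destination, int partition) {
        // Replace characters such as ':' and '.' so the id uses only letters, digits, '_' and '-'.
        String safe = destination.replaceAll("[^A-Za-z0-9_-]", "_");
        return "beam_load_" + runToken + "_" + safe + "_" + partition;
    }

    public static void main(String[] args) {
        FakeJobsEndpoint endpoint = new FakeJobsEndpoint();

        // Without a per-run token, the second run reuses the first run's id and is rejected.
        String fixedId = "load_mytable_0";
        System.out.println("run 1, fixed id: " + endpoint.insert(fixedId));
        System.out.println("run 2, fixed id: " + endpoint.insert(fixedId));

        // With a fresh token per run, the two runs get distinct ids and both are accepted.
        String run1 = jobId(newRunToken(), "project:dataset.table", 0);
        String run2 = jobId(newRunToken(), "project:dataset.table", 0);
        System.out.println("run 1, tokenized id: " + endpoint.insert(run1));
        System.out.println("run 2, tokenized id: " + endpoint.insert(run2));
    }
}
```

Running `main` should print 200 then 409 for the fixed id, and 200 for both tokenized ids.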

This message was sent by Atlassian JIRA
