beam-commits mailing list archives

From "Reuven Lax (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (BEAM-3200) Streaming Pipeline throws RuntimeException when using DynamicDestinations and Method.FILE_LOADS
Date Sat, 18 Nov 2017 01:49:00 GMT

    [ https://issues.apache.org/jira/browse/BEAM-3200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16257862#comment-16257862 ]

Reuven Lax commented on BEAM-3200:
----------------------------------

We shuffle with Destination as the key before calling WriteTables. This means that each destination
should have its own independent trigger index, as triggers are per key.
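
To illustrate the per-key point, here is a minimal plain-Java sketch. It is not Beam code: PerKeyPaneCounter, dispositionFor and simulate are hypothetical names, and the rule "the first pane uses the configured disposition, later panes use CREATE_NEVER" is an assumption based on the reporter's reading of WriteTables. It only shows that when the pane counter is kept per destination, a table first seen in a later firing still starts at index 0.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PerKeyPaneIndexSketch {
    enum CreateDisposition { CREATE_IF_NEEDED, CREATE_NEVER }

    /** Hypothetical stand-in for per-key trigger state: one pane counter per destination. */
    static final class PerKeyPaneCounter {
        private final Map<String, Long> nextIndex = new HashMap<>();

        /** Returns the pane index for this firing of the given destination, then advances it. */
        long fire(String destination) {
            long index = nextIndex.getOrDefault(destination, 0L);
            nextIndex.put(destination, index + 1);
            return index;
        }
    }

    /** Assumed rule: only the first pane of a key uses the configured disposition. */
    static CreateDisposition dispositionFor(long paneIndex, CreateDisposition configured) {
        return paneIndex == 0 ? configured : CreateDisposition.CREATE_NEVER;
    }

    /** Each inner list holds the destinations that receive data in one trigger firing. */
    static List<String> simulate(List<List<String>> firings) {
        PerKeyPaneCounter counter = new PerKeyPaneCounter();
        List<String> log = new ArrayList<>();
        for (List<String> firing : firings) {
            for (String destination : firing) {
                long pane = counter.fire(destination);
                log.add(destination + ":" + pane + ":"
                        + dispositionFor(pane, CreateDisposition.CREATE_IF_NEEDED));
            }
        }
        return log;
    }

    public static void main(String[] args) {
        // table_b first appears in the second firing, yet its own pane index is 0.
        List<List<String>> firings = Arrays.asList(
                Arrays.asList("table_a"),
                Arrays.asList("table_a", "table_b"));
        simulate(firings).forEach(System.out::println);
    }
}
```

In this sketch table_b gets CREATE_IF_NEEDED on its first load even though table_a is already on pane 1. If the exception in the report still occurs, the cause would have to be something other than a pane index shared across destinations.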

> Streaming Pipeline throws RuntimeException when using DynamicDestinations and Method.FILE_LOADS
> -----------------------------------------------------------------------------------------------
>
>                 Key: BEAM-3200
>                 URL: https://issues.apache.org/jira/browse/BEAM-3200
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-java-gcp
>    Affects Versions: 2.2.0
>            Reporter: AJ
>            Assignee: Chamikara Jayalath
>            Priority: Critical
>
> I am trying to use Method.FILE_LOADS to load data into BQ in my streaming pipeline,
> using the RC3 release of 2.2.0. I am writing to around 500 tables using DynamicDestinations,
> and I am also using withCreateDisposition(CreateDisposition.CREATE_IF_NEEDED). Everything
> works fine the first time the BigQuery load jobs get triggered. But on subsequent triggers
> the pipeline throws a RuntimeException about a table not being found, even though I created
> the pipeline with CreateDisposition.CREATE_IF_NEEDED. The exact exception is:
> {code}
> java.lang.RuntimeException: Failed to create load job with id prefix 717aed9ed1ef4aa7a616e1132f8b7f6d_a0928cae3d670b32f01ab2d9fe5cc0ee_00001_00001, reached max retries: 3, last failed load job: {
>   "configuration" : {
>     "load" : {
>       "createDisposition" : "CREATE_NEVER",
>       "destinationTable" : {
>         "datasetId" : ...,
>         "projectId" : ...,
>         "tableId" : ....
>       },
>     "errors" : [ {
>       "message" : "Not found: Table ....,
>       "reason" : "notFound"
>     } ],
> {code}
> My theory is that all the subsequent load jobs get triggered using the CREATE_NEVER
> disposition, and this might be due to https://github.com/apache/beam/blob/release-2.2.0/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/WriteTables.java#L140
> When using DynamicDestinations, not all destination tables may be known during the
> first trigger, and hence the pipeline's create disposition should be respected.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
