Date: Tue, 15 Aug 2017 20:41:00 +0000 (UTC)
From: "Matti Remes (JIRA)"
To: commits@beam.apache.org
Subject: [jira] [Commented] (BEAM-2768) Fix bigquery.WriteTables generating non-unique job identifiers

[ https://issues.apache.org/jira/browse/BEAM-2768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16127852#comment-16127852 ]

Matti Remes commented on BEAM-2768:
-----------------------------------

{code:java}
// Writes rows to dynamically chosen tables; with bounded input this goes through batch load jobs.
public static void loadRowsToBigQuery(String name, PCollection<TableRow> rows,
    DynamicDestinations<TableRow, ?> destination) {
  rows.apply(name, BigQueryIO.<TableRow>write()
      .withFormatFunction(new TableRowFormatter())
      .to(destination)
      .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND)
      .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED));
}

// Identity format function: the elements are already TableRows.
// Declared static so the static method above can instantiate it without an enclosing instance.
public static class TableRowFormatter implements SerializableFunction<TableRow, TableRow> {
  @Override
  public TableRow apply(TableRow tableRow) {
    return tableRow;
  }
}
{code}

Apologies for the references; yes, I intended to point to the 2.0.0 source (I'm using 2.0.0). The problem may lie in how the UUID is created and stored.
The code states that the generated UUID "will be used as the base for all load jobs issued from this instance of the transform":
https://github.com/apache/beam/blob/v2.0.0/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BatchLoads.java#L184

I can confirm from the logs that the job ID is indeed the same.

> Fix bigquery.WriteTables generating non-unique job identifiers
> --------------------------------------------------------------
>
>                 Key: BEAM-2768
>                 URL: https://issues.apache.org/jira/browse/BEAM-2768
>             Project: Beam
>          Issue Type: Bug
>          Components: beam-model
>    Affects Versions: 2.0.0
>            Reporter: Matti Remes
>            Assignee: Reuven Lax
>
> BigQueryIO does not create unique job IDs for batch inserts, so the BigQuery API responds with a 409 Conflict error:
> {code:java}
> Request failed with code 409, will NOT retry: https://www.googleapis.com/bigquery/v2/projects//jobs
> {code}
> The jobs are initiated in the step BatchLoads/SinglePartitionWriteTables, by that step's WriteTables ParDo:
> https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BatchLoads.java#L511-L521
> https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/WriteTables.java#L148
> It would probably be a good idea to append a UUID as part of each job ID.
> Edit: This is a major bug that blocks using BigQuery as a sink for bounded input.
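A minimal sketch of the suggested fix, assuming job IDs are built from a per-transform base plus a partition number. The `JobIds` class, its two helpers and the `beam_load_` prefix are hypothetical and are not Beam's actual code. Reusing only the shared base repeats the same ID whenever the same partition is loaded again, while appending a fresh `UUID.randomUUID()` makes each ID distinct. The helper also checks BigQuery's job ID rules (letters, digits, underscores and dashes, at most 1,024 characters).

```java
import java.util.HashSet;
import java.util.Set;
import java.util.UUID;

// Hypothetical sketch of building BigQuery load job IDs; not Beam's implementation.
final class JobIds {

  // BigQuery job IDs may contain only letters, digits, underscores and dashes,
  // and may be at most 1,024 characters long.
  private static final int MAX_JOB_ID_LENGTH = 1024;

  // Collision-prone: the same base and partition always yield the same ID.
  public static String baseOnlyJobId(String baseUuid, int partition) {
    return baseUuid + "_" + partition;
  }

  // Appends a fresh random UUID so every call yields a distinct ID.
  public static String uniqueJobId(String baseUuid, int partition) {
    String id = baseUuid + "_" + partition + "_" + UUID.randomUUID();
    if (!id.matches("[A-Za-z0-9_-]+") || id.length() > MAX_JOB_ID_LENGTH) {
      throw new IllegalArgumentException("invalid BigQuery job id: " + id);
    }
    return id;
  }

  public static void main(String[] args) {
    // One base UUID per transform instance, reused for all of its load jobs.
    String base = "beam_load_" + UUID.randomUUID().toString().replace("-", "");
    Set<String> baseOnly = new HashSet<>();
    Set<String> unique = new HashSet<>();
    // Simulate the same partition being loaded twice.
    for (int attempt = 0; attempt < 2; attempt++) {
      baseOnly.add(baseOnlyJobId(base, 0));
      unique.add(uniqueJobId(base, 0));
    }
    System.out.println("distinct base-only IDs: " + baseOnly.size()); // expected: 1
    System.out.println("distinct unique IDs: " + unique.size()); // expected: 2
  }
}
```

A random suffix has a cost: a deterministic job ID lets a retried bundle recognize a load job it already submitted (BigQuery rejects the duplicate with the same 409), so a purely random ID could run the same load twice after a retry. Adding a deterministic per-job component, such as the destination table plus the partition, may be a safer way to make IDs unique.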