beam-commits mailing list archives

From "Steven Jon Anderson (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (BEAM-2870) BQ Partitioned Table Write Fails When Destination has Partition Decorator
Date Sat, 09 Sep 2017 07:11:00 GMT

     [ https://issues.apache.org/jira/browse/BEAM-2870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Steven Jon Anderson updated BEAM-2870:
--------------------------------------
    Description: 
Dataflow Job ID: https://console.cloud.google.com/dataflow/job/2017-09-08_23_03_14-14637186041605198816?project=firebase-lessthan3

Tagging [~reuvenlax] as I believe he built the time partitioning integration that was merged
into master.

*Background*
Our production pipeline ingests millions of events per day and routes them into our clients'
numerous tables. To keep costs down, all of our tables are partitioned. However, because 2.1.0
doesn't support creating partitioned tables, we have to create each table before we allow
events to process. We've been looking forward to [~reuvenlax]'s partitioned-table write
feature ([#3663|https://github.com/apache/beam/pull/3663]) being merged into master, as it'll
let us launch our client platforms much, much faster. Today we got around to testing the
2.2.0 nightly and discovered this bug.

*Issue*
Our pipeline writes to tables using partition decorators. When writing to an existing partitioned
table with a decorator, the write succeeds. When writing to a partitioned table that doesn't
exist, without a decorator, the write also succeeds. *However, when writing to a partitioned
table that doesn't exist, with a decorator, the write fails*.

*Example Implementation*
{code:java}
BigQueryIO.writeTableRows()
  .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED)
  .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND)
  .withFailedInsertRetryPolicy(InsertRetryPolicy.alwaysRetry())
  .to(new DynamicDestinations<TableRow, String>() {

    @Override
    public String getDestination(ValueInSingleWindow<TableRow> element) {
      // Destination table spec includes a partition decorator ($YYYYMMDD).
      return "PROJECT_ID:DATASET_ID.TABLE_ID$20170902";
    }

    @Override
    public TableDestination getTable(String destination) {
      // Day-partitioned table; the decorated spec is passed straight through
      // to the destination used for table creation.
      TimePartitioning DAY_PARTITION = new TimePartitioning().setType("DAY");
      return new TableDestination(destination, null, DAY_PARTITION);
    }

    @Override
    public TableSchema getSchema(String destination) {
      return TABLE_SCHEMA;
    }
  })
{code}
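A likely fix direction (sketched here as a hypothetical helper, not part of the Beam API) is to strip the {{$YYYYMMDD}} decorator from the table spec before the create call, so that only the base table ID reaches table creation, while the decorated spec is still used for the append:

{code:java}
// Hypothetical helper (not Beam API): strip a BigQuery partition
// decorator from a table spec so that only the base table ID is
// passed to the table-creation call.
public class DecoratorUtil {
  public static String stripDecorator(String tableSpec) {
    int dollar = tableSpec.indexOf('$');
    return dollar < 0 ? tableSpec : tableSpec.substring(0, dollar);
  }

  public static void main(String[] args) {
    // Base table that should be created (without the decorator):
    System.out.println(stripDecorator("PROJECT_ID:DATASET_ID.TABLE_ID$20170902"));
  }
}
{code}

Creating {{PROJECT_ID:DATASET_ID.TABLE_ID}} with the DAY {{TimePartitioning}} attached, then streaming to the decorated spec, would avoid the invalid-table-ID error below.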

*Relevant Logs & Errors in StackDriver*

{code:none}
23:06:26.790 
Trying to create BigQuery table: PROJECT_ID:DATASET_ID.TABLE_ID$20170902

23:06:26.873 
Invalid table ID "TABLE_ID$20170902". Table IDs must be alphanumeric (plus underscores)
and must be at most 1024 characters long. Also, Table decorators cannot be used.
{code}
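The rejection is consistent with the table-ID rule quoted in the error: only letters, digits, and underscores. A minimal check (illustrative only, approximating that rule with a regex) shows why the decorated spec fails validation when it reaches table creation:

{code:java}
import java.util.regex.Pattern;

public class TableIdCheck {
  // Per the error message: table IDs may contain only letters, digits,
  // and underscores, and be at most 1024 characters long.
  static final Pattern VALID = Pattern.compile("[A-Za-z0-9_]{1,1024}");

  public static void main(String[] args) {
    System.out.println(VALID.matcher("TABLE_ID").matches());           // true
    System.out.println(VALID.matcher("TABLE_ID$20170902").matches());  // false: '$' is not allowed
  }
}
{code}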

> BQ Partitioned Table Write Fails When Destination has Partition Decorator
> -------------------------------------------------------------------------
>
>                 Key: BEAM-2870
>                 URL: https://issues.apache.org/jira/browse/BEAM-2870
>             Project: Beam
>          Issue Type: Bug
>          Components: runner-dataflow
>    Affects Versions: 2.2.0
>         Environment: Dataflow Runner, Streaming, 10 x (n1-highmem-8 & 500gb SDD)
>            Reporter: Steven Jon Anderson
>            Assignee: Thomas Groh
>              Labels: bigquery, dataflow, google, google-cloud-bigquery, google-dataflow
>             Fix For: 2.2.0
>
>



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
