beam-commits mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (BEAM-2122) Writing to partitioned BigQuery tables from Dataflow is causing errors
Date Mon, 08 May 2017 16:12:04 GMT

    [ https://issues.apache.org/jira/browse/BEAM-2122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16001005#comment-16001005 ]

ASF GitHub Bot commented on BEAM-2122:
--------------------------------------

GitHub user reuvenlax opened a pull request:

    https://github.com/apache/beam/pull/2953

    [BEAM-2122] Allow table descriptions to be null

    Wrap the coder with a NullableCoder.
    
    R: @jkff 
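
For readers unfamiliar with the idea, a nullable wrapper around a coder typically writes a marker byte ahead of the value so that null can be encoded at all. The sketch below illustrates that idea with the Java standard library only. It is not Beam's NullableCoder and not the code in this pull request; the class name, method names, and marker values are hypothetical. In Beam itself the wrapping would presumably look something like NullableCoder.of(StringUtf8Coder.of()), assuming the description is encoded as a UTF-8 string.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical stand-in for a nullable string coder; not a Beam class.
final class NullableStringCodec {
    // Marker values are an assumption chosen for this sketch.
    private static final int NULL_MARKER = 0;
    private static final int PRESENT_MARKER = 1;

    // Writes a marker byte, then the value only if it is non-null.
    static byte[] encode(String value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            if (value == null) {
                out.writeByte(NULL_MARKER);
            } else {
                out.writeByte(PRESENT_MARKER);
                out.writeUTF(value);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Reads the marker byte and returns null without reading further if it marks null.
    static String decode(byte[] encoded) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(encoded))) {
            int marker = in.readUnsignedByte();
            if (marker == NULL_MARKER) {
                return null;
            }
            if (marker != PRESENT_MARKER) {
                throw new IOException("Unexpected marker byte: " + marker);
            }
            return in.readUTF();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(decode(encode(null)));           // a null description survives the round trip
        System.out.println(decode(encode("daily events"))); // so does a non-null one
    }
}
```

Without such a wrapper, a coder that assumes a non-null value has no way to represent a missing table description, which is the situation the pull request title describes.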


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/reuvenlax/incubator-beam allow_null_table_description

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/beam/pull/2953.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2953
    
----
commit df8cf750e62846531b0b0260e4c84d3bb6b8d2c7
Author: Reuven Lax <relax@google.com>
Date:   2017-05-08T16:06:55Z

    TableDescription is allowed to be null.

----


> Writing to partitioned BigQuery tables from Dataflow is causing errors
> ----------------------------------------------------------------------
>
>                 Key: BEAM-2122
>                 URL: https://issues.apache.org/jira/browse/BEAM-2122
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-java-gcp
>         Environment: Running with Beam 0.7.0-SNAPSHOT version 48 for beam-sdks-java-io-google-cloud-platform, 49 for beam-sdks-java-core and beam-runners-google-cloud-dataflow-java in Eclipse using Dataflow service.
>            Reporter: Matthias Baetens
>            Assignee: Reuven Lax
>
> Using the latest Beam SNAPSHOT, which has a new BigQuery connector, and trying to write to partitioned tables according to the docs (or this Stack Overflow question: http://stackoverflow.com/questions/43505534/writing-different-values-to-different-bigquery-tables-in-apache-beam/43655461#43655461):
> 	static class PartitionedTableGeneration
> 			implements SerializableFunction<ValueInSingleWindow<TableRow>, TableDestination> {
> 		@Override
> 		public TableDestination apply(ValueInSingleWindow<TableRow> value) {
> 			// Alternative pattern that was also tried:
> 			// DateTimeFormat.forPattern("yyyy_MM_dd").withZone(DateTimeZone.UTC)
> 			String dayString = DateTimeFormat.forPattern("yyyyMMdd").withZone(DateTimeZone.UTC)
> 					.print(((IntervalWindow) value.getWindow()).start());
> 			// Table spec with a '$' partition decorator; the table description is left empty.
> 			TableDestination td = new TableDestination(
> 					"project:dataset.table" + '$' + dayString, "");
> 			return td;
> 		}
> 	}
> causes the following issues when running (depending on the specification of the dayString):
> 1. "Invalid table ID \"partitioned_sample$20150905\". Table IDs must be alphanumeric (plus underscores) and must be at most 1024 characters long. Also, Table decorators cannot be used.",
> 2. java.lang.RuntimeException: org.apache.beam.sdk.util.UserCodeException: java.lang.RuntimeException: Failed to create load job with id prefix
> ...
>     "errorResult" : {
>       "message" : "Invalid date partitioned table suffix: 2015_11_26",
>       "reason" : "invalid"
>     }
> Writing to sharded tables (without the '$'-sign) is working fine.
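
The two error messages differ in the suffix that follows the '$': BigQuery date partition decorators take the form table$YYYYMMDD, so a suffix such as 2015_11_26 is rejected outright, while 20150905 is a well-formed suffix that fails for a different reason. The following is a minimal sketch of building a well-formed spec, using java.time instead of the Joda-Time calls in the report. The class name, method name, and table name are hypothetical, and the sketch only shows the string format; it does not work around error 1, which comes from how the connector handles the decorated table ID.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical helper for illustration; not part of Beam or BigQuery client libraries.
final class PartitionDecorator {
    // Date partition suffixes are eight digits (yyyyMMdd), with no separators.
    private static final DateTimeFormatter DAY =
            DateTimeFormatter.ofPattern("yyyyMMdd").withZone(ZoneOffset.UTC);

    // Appends the '$' decorator and the UTC day of the given instant to the base table spec.
    static String tableSpec(String baseTable, Instant windowStart) {
        return baseTable + "$" + DAY.format(windowStart);
    }

    public static void main(String[] args) {
        Instant start = Instant.parse("2015-09-05T00:00:00Z");
        // prints "project:dataset.partitioned_sample$20150905"
        System.out.println(tableSpec("project:dataset.partitioned_sample", start));
    }
}
```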



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
