beam-commits mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (BEAM-2122) Writing to partitioned BigQuery tables from Dataflow is causing errors
Date Mon, 08 May 2017 22:58:04 GMT

    [ https://issues.apache.org/jira/browse/BEAM-2122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16001727#comment-16001727 ]

ASF GitHub Bot commented on BEAM-2122:
--------------------------------------

GitHub user jkff opened a pull request:

    https://github.com/apache/beam/pull/2969

    Cherrypick #2953 into release branch

    https://github.com/apache/beam/pull/2953 BEAM-2122 Allow table descriptions to be null
    
    R: @davorbonaci 

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jkff/incubator-beam cp-2953

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/beam/pull/2969.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2969
    
----
commit 56262d8f3064031606e01801eb36322cb3a38b95
Author: Davor Bonaci <davor@google.com>
Date:   2017-05-05T22:53:46Z

    Update version number from 0.7.0-SNAPSHOT to 2.0.0-SNAPSHOT

commit 1a77e208e440ef34b7d0f7e1104d5c8c5ee04474
Author: Dan Halperin <dhalperi@google.com>
Date:   2017-05-05T23:32:54Z

    This cherry-picks #2926

commit b9c8cfe227d7a6bcb258b93717969a78a31dac07
Author: Dan Halperin <dhalperi@google.com>
Date:   2017-05-06T00:24:12Z

    This closes #2932

commit d2fa51b78892f7ebf13da1a5fc7bb45755440a5f
Author: Davor Bonaci <davor@google.com>
Date:   2017-05-05T23:17:43Z

    Cherry-pick pull request #2911 into release-2.0.0

commit 67ea7ae4d2144525582c7de03b17d06daa9f35bb
Author: Davor Bonaci <davor@google.com>
Date:   2017-05-05T23:20:28Z

    Cherry-pick pull request #2907 into release-2.0.0 branch

commit 96aeb97cc41b4a93bec7d72cff4887e9f358eef2
Author: Dan Halperin <dhalperi@google.com>
Date:   2017-05-06T00:26:41Z

    This closes #2931

commit 1ad3f84c68235eeae6927f283180829de2f0aa33
Author: Davor Bonaci <davor@google.com>
Date:   2017-05-06T00:36:09Z

    Set Dataflow runner's worker container image for version 2.0.0

commit f97e52b3b677a5a35bac7a2012366837b7bb15cb
Author: Davor Bonaci <davor@google.com>
Date:   2017-05-06T01:46:31Z

    [maven-release-plugin] prepare release v2.0.0-RC1

commit 3b7a62301fba55f3caf68d8876b39c53e80f171a
Author: Davor Bonaci <davor@google.com>
Date:   2017-05-07T02:16:30Z

    [maven-release-plugin] rollback changes from release preparation of v2.0.0-RC1

commit 6eab5c9465bda3da4d8a1ea9f73a74e9c8faec85
Author: chinmaykolhatkar <chinmay@apache.org>
Date:   2017-03-01T11:29:46Z

    [BEAM-831] ParDo Fusion of Apex Runner

commit 3f5282d515fa53516fda6d0376cc912560fd6d85
Author: Thomas Weise <thw@apache.org>
Date:   2017-05-05T13:45:34Z

    [BEAM-831] Fix chaining, add test.
    closes #2216

commit bec30b3beaa241483814e859d781f1e04479394b
Author: Ahmet Altay <altay@google.com>
Date:   2017-05-08T06:31:42Z

    Cherry-pick pull request #2946 in 2.0.0 release branch.
    Fix typo in datastore_wordcount.py.

commit 021468e03e5a5b0851e21f333ebc07060dc471cd
Author: Ahmet Altay <altay@google.com>
Date:   2017-05-08T17:25:40Z

    This closes #2955

commit 72241117cbf2d9682054a69ea895e4c6f6a93146
Author: Sourabh Bajaj <sourabhbajaj@google.com>
Date:   2017-05-07T20:55:20Z

    [BEAM-2206] Move pipelineOptions into options modules

commit c4f234c8cfb349d877eeb5c62eec7d80e844be07
Author: Sourabh Bajaj <sourabhbajaj@google.com>
Date:   2017-05-08T01:08:49Z

    Only cythonize files within apache_beam

commit 741bf7442d88a9e30064bd132046e5db55e7a740
Author: Ahmet Altay <altay@google.com>
Date:   2017-05-08T21:02:54Z

    This closes #2964

commit 25cda3abb0f6442f02ab6f31f0a4850be66d09d9
Author: Sourabh Bajaj <sourabhbajaj@google.com>
Date:   2017-05-06T02:15:48Z

    Update python dataflow worker

commit 265405bc85f9f705776e88680e3af26fab4e7de3
Author: Ahmet Altay <altay@google.com>
Date:   2017-05-08T21:06:17Z

    This closes #2943

commit e0faeeef80211ddbc632e622ecefc1e005c5ca29
Author: Dan Halperin <dhalperi@google.com>
Date:   2017-05-06T02:06:03Z

    [BEAM-2212] FileBasedSource: improve message when logging.
    
    ValueProvider should not be printed, rather the string instead.

commit bff819a9858c79c6c3232b4c03f262421d325c00
Author: Dan Halperin <dhalperi@google.com>
Date:   2017-05-08T16:59:16Z

    [BEAM-2212] FileBasedSource: refactor to remove uses of fileOrPatternSpec.get()
    
    Makes it less likely to have errors from printing ValueProviders instead of runtime values

commit 94d104064cc0e209fa54dd63f4cbe99cd6f2d591
Author: Dan Halperin <dhalperi@google.com>
Date:   2017-05-08T21:15:48Z

    This closes #2958

commit 4ec11de1a8b876a8263c95c21b7ce830fb4e962b
Author: Dan Halperin <dhalperi@google.com>
Date:   2017-05-06T00:16:34Z

    [BEAM-2190] pom.xml: do a better job of dependency management
    
    Even if Beam appears to have the correct dependencies, we cannot
    guarantee that modules that depend on us transitively get the right
    dependencies. For example, even though grpc-protobuf-lite has
    protobuf-lite excluded, and the Maven Enforcer banned-dependencies
    check passes... if a user happens to get a transitive dependency on
    grpc-all first, they may pull in grpc-protobuf from that other source
    without the exclusion. Thus we need to exclude protobuf-lite from
    grpc-all as well.
    
    While we're here, also add guava-jdk5 to the set of banned dependencies,
    though (as above) we cannot currently properly identify the places it
    might be transitively exposed in a users' pom.xml.

commit 58e2db06144abc463988284c06d553ad85dd43e2
Author: Dan Halperin <dhalperi@google.com>
Date:   2017-05-08T21:26:13Z

    This closes #2957

commit 469f177a52664ab4389f8929eb52f2ed04529ec9
Author: Dan Halperin <dhalperi@google.com>
Date:   2017-05-06T02:49:29Z

    Convert Executor Services to use Daemon Threads
    
    This will cause the DirectRunner to automatically shut down when the
    worker threads are shut down.

commit 20bfa9411b9b77c6678c17ed465249c3dc4b210d
Author: Thomas Groh <tgroh@google.com>
Date:   2017-05-08T21:36:35Z

    This closes #2961

commit 7dfc45563825b04f761af07d1cd5ae43ec38588b
Author: Kenneth Knowles <klk@google.com>
Date:   2017-05-08T22:35:15Z

    This closes #2952: Cherry pick #2927 (Apex ParDo chaining) to release-2.0.0
    
      [BEAM-831] Fix chaining, add test. closes #2216
      [BEAM-831] ParDo Fusion of Apex Runner

commit e8a364c68b16ca9f7b6a84f89d4da02cdade73e2
Author: Reuven Lax <relax@google.com>
Date:   2017-05-08T16:06:55Z

    TableDescription is allowed to be null.

----


> Writing to partitioned BigQuery tables from Dataflow is causing errors
> ----------------------------------------------------------------------
>
>                 Key: BEAM-2122
>                 URL: https://issues.apache.org/jira/browse/BEAM-2122
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-java-gcp
>         Environment: Running with Beam 0.7.0-SNAPSHOT version 48 for beam-sdks-java-io-google-cloud-platform, 49 for beam-sdks-java-core and beam-runners-google-cloud-dataflow-java in Eclipse using Dataflow service.
>            Reporter: Matthias Baetens
>            Assignee: Reuven Lax
>
> Using the latest Beam SNAPSHOT, which has a new BigQuery connector, and trying to write to partitioned tables according to the docs (or this Stack Overflow question: http://stackoverflow.com/questions/43505534/writing-different-values-to-different-bigquery-tables-in-apache-beam/43655461#43655461):
> 	static class PartitionedTableGeneration
> 			implements SerializableFunction<ValueInSingleWindow<TableRow>, TableDestination> {
> 		@Override
> 		public TableDestination apply(ValueInSingleWindow<TableRow> value) {
> 			// Alternative day format that was also tried:
> 			// String dayString =
> 			// DateTimeFormat.forPattern("yyyy_MM_dd").withZone(DateTimeZone.UTC)
> 			String dayString = DateTimeFormat.forPattern("yyyyMMdd").withZone(DateTimeZone.UTC)
> 					.print(((IntervalWindow) value.getWindow()).start());
> 			// Append the day as a partition decorator: table$yyyyMMdd
> 			TableDestination td = new TableDestination(
> 					"project:dataset.table" + '$' + dayString, "");
> 			return td;
> 		}
> 	}
> causes the following issues when running (depending on the specification of the dayString):
> 1. "Invalid table ID \"partitioned_sample$20150905\". Table IDs must be alphanumeric (plus underscores) and must be at most 1024 characters long. Also, Table decorators cannot be used.",
> 2. java.lang.RuntimeException: org.apache.beam.sdk.util.UserCodeException: java.lang.RuntimeException: Failed to create load job with id prefix 
> ...
>     "errorResult" : {
>       "message" : "Invalid date partitioned table suffix: 2015_11_26",
>       "reason" : "invalid"
>     }
> Writing to sharded tables (without the '$'-sign) is working fine.
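
For reference, the second error above is consistent with BigQuery expecting day-partition decorators in yyyyMMdd form with no separators, so a suffix such as 2015_11_26 is rejected. Below is a minimal sketch of building such a table spec using only java.time rather than the Joda-Time calls in the report. The class name PartitionDecorator and the method tableSpec are hypothetical names chosen for illustration, not part of Beam, and the sketch does not by itself address the first error, which depends on the connector accepting decorators at all.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical helper, not part of Beam or the BigQuery client.
public class PartitionDecorator {
    // Day-partition decorators are written as yyyyMMdd, with no separators.
    private static final DateTimeFormatter DAY =
            DateTimeFormatter.ofPattern("yyyyMMdd").withZone(ZoneOffset.UTC);

    /** Returns table + "$" + the UTC day (yyyyMMdd) that contains windowStart. */
    public static String tableSpec(String table, Instant windowStart) {
        return table + "$" + DAY.format(windowStart);
    }

    public static void main(String[] args) {
        // "project:dataset.table" is the placeholder name used in the report.
        Instant start = Instant.parse("2015-11-26T00:00:00Z");
        System.out.println(tableSpec("project:dataset.table", start));
        // prints project:dataset.table$20151126
    }
}
```

The resulting string would be passed as the table spec argument of TableDestination in place of the hand-built concatenation, assuming a connector version that accepts the '$' decorator.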



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
