beam-commits mailing list archives

From "ASF GitHub Bot (JIRA)" <>
Subject [jira] [Commented] (BEAM-2595) WriteToBigQuery does not work with nested json schema
Date Fri, 14 Jul 2017 22:33:00 GMT


ASF GitHub Bot commented on BEAM-2595:

GitHub user sb2nov opened a pull request:

    [BEAM-2595] Allow table schema objects in BQ DoFn

    Be sure to do all of the following to help us incorporate your contribution
    quickly and easily:
     - [ ] Make sure the PR title is formatted like:
       `[BEAM-<Jira issue #>] Description of pull request`
     - [ ] Make sure tests pass via `mvn clean verify`.
     - [ ] Replace `<Jira issue #>` in the title with the actual Jira issue
           number, if there is one.
     - [ ] If this contribution is large, please file an Apache
           [Individual Contributor License Agreement](
    Cherry pick from master for BEAM-2535
    R: @aaltay 
    cc @jbonofre 

You can merge this pull request into a Git repository by running:

    $ git pull BEAM-2595-cp

Alternatively you can review and apply these changes as the patch at:

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #3563

commit ada4733b02bc38b1ef619fb991c068822a917595
Author: Sourabh Bajaj <>
Date:   2017-07-13T19:02:31Z

    [BEAM-2595] Allow table schema objects in BQ DoFn


> WriteToBigQuery does not work with nested json schema
> -----------------------------------------------------
>                 Key: BEAM-2595
>                 URL:
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-py
>    Affects Versions: 2.1.0
>         Environment: mac os local runner, Python
>            Reporter: Andrea Pierleoni
>            Assignee: Sourabh Bajaj
>            Priority: Minor
>              Labels: gcp
>             Fix For: 2.1.0
> I am trying to use the new `WriteToBigQuery` PTransform added to ``
in version 2.1.0-RC1
> I need to write to a BigQuery table with nested fields.
> The only way to specify nested schemas in BigQuery is with the JSON schema.
> None of the classes in `` are able to parse the json schema,
but they accept a schema as an instance of the class ``
> I am composing the `TableFieldSchema` as suggested here [],
and it looks fine when passed to the PTransform `WriteToBigQuery`. 
> The problem is that the base class `PTransformWithSideInputs` tries to pickle and unpickle
the function []
 (that includes the TableFieldSchema instance) and for some reason when the class is unpickled
some `FieldList` instances are converted to simple lists, and the pickling validation fails.
> Would it be possible to extend the test coverage to nested json objects for bigquery?
> They are also relatively easy to parse into a TableFieldSchema.
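
To illustrate the last point of the report, here is a minimal sketch of how a nested JSON schema could be walked recursively. It uses only the standard library: `FieldSketch`, `parse_field`, `parse_schema`, and `EXAMPLE_SCHEMA` are hypothetical names introduced for this example and are not the Beam or BigQuery client classes. The JSON keys (`name`, `type`, `mode`, `fields`) follow the usual BigQuery schema layout, and the `NULLABLE` default is an assumption of the sketch.

```python
import json
from dataclasses import dataclass, field
from typing import List


@dataclass
class FieldSketch:
    """Hypothetical stand-in for a table field schema (NOT Beam's class)."""
    name: str
    type: str
    mode: str = "NULLABLE"  # assumed default for this sketch
    fields: List["FieldSketch"] = field(default_factory=list)


def parse_field(spec: dict) -> FieldSketch:
    """Convert one JSON field description, recursing into nested fields."""
    return FieldSketch(
        name=spec["name"],
        type=spec["type"].upper(),
        mode=spec.get("mode", "NULLABLE").upper(),
        fields=[parse_field(child) for child in spec.get("fields", [])],
    )


def parse_schema(schema_json: str) -> List[FieldSketch]:
    """Parse a JSON schema given as a list of fields or as {"fields": [...]}."""
    parsed = json.loads(schema_json)
    specs = parsed["fields"] if isinstance(parsed, dict) else parsed
    return [parse_field(spec) for spec in specs]


# Illustrative schema with one nested RECORD field (made up for this sketch).
EXAMPLE_SCHEMA = """
[
  {"name": "id", "type": "INTEGER", "mode": "REQUIRED"},
  {"name": "address", "type": "RECORD", "fields": [
    {"name": "city", "type": "STRING"},
    {"name": "zip", "type": "STRING"}
  ]}
]
"""

schema = parse_schema(EXAMPLE_SCHEMA)
```

The same recursion would apply if the stand-in class were swapped for the real schema message class, setting the nested `fields` on each record field. Whether that object then survives the pickling validation described above is a separate question that this sketch does not address.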

This message was sent by Atlassian JIRA
