beam-commits mailing list archives

From "ASF GitHub Bot (JIRA)" <>
Subject [jira] [Commented] (BEAM-618) Python SDKs writes non RFC compliant JSON files for BQ Export
Date Fri, 09 Sep 2016 00:26:20 GMT


ASF GitHub Bot commented on BEAM-618:

GitHub user ajamato opened a pull request:

    [BEAM-618] Disallow NAN, INF and -INF invalid JSON values in bigquery exporter


You can merge this pull request into a Git repository by running:

    $ git pull python-sdk

Alternatively you can review and apply these changes as the patch at:

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #935
commit 11d78a4c1791c1dfd88f0ac348c9c07cd48cafc8
Author: Ian Zhou <>
Date:   2016-06-09T21:17:14Z

    Modified range tracker to use first response seen as start key

commit ec6d88a787dfdab064bceb70d48b2ce1c5bfa9bb
Author: Thomas Groh <>
Date:   2016-06-14T01:34:49Z

    Reuse UnboundedReaders in the InProcessRunner
    Reuse up to a point, and then discard the reader to exercise resume from

commit d2ceaf5e5a778fad18472ab0d7c02a14259015d7
Author: Scott Wegner <>
Date:   2016-06-14T16:00:49Z

    Update DataflowPipelineRunner worker container version

commit 90bb20ee6738c57bc25f47e2d80690fb721b562e
Author: Thomas Groh <>
Date:   2016-06-14T22:49:34Z

    Explicitly set the Runner in TestFlinkPipelineRunner
    This ensures that the created PipelineOptions are valid if the
    DirectRunner is not on the classpath.

commit 45e57e0612ae692418e07d9c4483321f040cb4a7
Author: Thomas Groh <>
Date:   2016-06-15T00:51:48Z

    Remove DoFnRunner from GroupAlsoByWindowsProperties
    DoFnRunner is a runner implementation detail, and core SDK code should
    instead use DoFnTester.

commit 99654ca4bed6758d7128d0f0ad376e8b479d4eba
Author: Thomas Groh <>
Date:   2016-06-15T00:52:49Z

    Remove the DirectPipelineRunner from the Core SDK

commit d5e3dfaa864744ec9a011c51707d15f1ab68a734
Author: Scott Wegner <>
Date:   2016-06-15T16:51:59Z

    Fix NullPointerException in AfterWatermark display data
    Window transforms register display data for the associated trigger
    function by calling its .toString() method. The AfterWatermark
    trigger .toString() method was not properly handling cases where
    there are no late firings registered.

commit 340fe3ebcfef0b57b163483d7d7243ad5456ae72
Author: Scott Wegner <>
Date:   2016-06-15T17:17:01Z

    Package javadoc for org.apache.beam.sdk.transforms.display

commit 6ada1a635382fcddc42a7580e74e755839f7172e
Author: Thomas Groh <>
Date:   2016-06-15T19:01:56Z

    Run NeedsRunner tests in Runner Core on the DirectRunner
    This ensures that all runner tests in runners/core-java are executed in
    the standard maven build.

commit e90a1b9d74cbc06d7818bae8dfe2af81acd73222
Author: Kenneth Knowles <>
Date:   2016-06-08T22:07:52Z

    Roll-forwards: Base PAssert on GBK instead of side inputs
    Previously PAssert - hence all RunnableOnService/NeedsRunner
    tests - required side input support. This created a very steep
    on ramp for new runners.
    GroupByKey is a bit more fundamental and most backends will be
    able to group by key in the global window very quickly. So switching
    the primitive used to gather all the contents of a PCollection for
    assertions should make it a bit easier to get early feedback during
    runner development.

commit 0a7246d268969cb1b7f46149e38361802c95e70a
Author: Scott Wegner <>
Date:   2016-06-13T18:05:52Z

    Improve BigQueryIO validation for streaming WriteDisposition

commit 605833071a7034aa3b723776a0f9e24330f64c8b
Author: Pei He <>
Date:   2016-06-13T23:58:01Z

    Replace GcsPath by IOChannelFactory in WordCount.

commit 5bf732cd3e598321a5c51e1239eda0fe2877a65d
Author: Kenneth Knowles <>
Date:   2016-06-14T23:04:10Z

    Add test for ReduceFnRunner GC time overflow

commit cfa217a894575f392f1dfe1612e10e393df5c7ab
Author: Kenneth Knowles <>
Date:   2016-06-14T23:12:11Z

    Fix type error in Eclipse
    This type error occurs in my Eclipse installation. It apparently
    does not bother the various JDKs we test with. But this is an
    accurate typing, so it may help other Eclipse-using contributors,

commit 8278e5f78f36fb48fae994ee7abcc1485db84189
Author: Kenneth Knowles <>
Date:   2016-06-15T17:42:59Z

    [Spark] Elide assigning windows when WindowFn is null
    Previously, when translating a Window.Bound transform, the case
    where the WindowFn was null was missed, resulting in a

commit 9400fc9a699f218a7948c21639428f5f00134ec5
Author: Thomas Groh <>
Date:   2016-06-15T17:45:15Z

    Rename InProcessPipelineRunner to DirectRunner
    Completes BEAM-243

commit babddbbc8247bc7322c3fd519a5bf0fa23c57064
Author: Thomas Groh <>
Date:   2016-06-15T18:21:41Z

    Remove InProcess Prefixes
    These prefixes are out of date with the rename of the runner. Most of
    the prefixes are dropped in their entirety, as the classes are scoped
    to the direct runner module.

commit 6460df195240dac4d488fcf111642e8706008690
Author: Jesse Anderson <>
Date:   2016-05-09T17:05:15Z

    Added BigDecimal coder and tests.

commit 6491100a5d655cb9f6c702767d6354269208f650
Author: Kenneth Knowles <>
Date:   2016-06-09T20:24:28Z

    Touch up BigDecimalCoder and tests

commit 8268f1d7ffdd1205a1904037f7dd1e1887a52f8d
Author: Kenneth Knowles <>
Date:   2016-06-09T20:24:49Z

    Add BigIntegerCoder and tests

commit 4f7a2ab47c5fdd9b3de5f091a40128e68ddd11a3
Author: Kenneth Knowles <>
Date:   2016-06-14T23:10:09Z

    Fix overflow in ReduceFnRunner garbage collection times

commit 3d87f8b987e243c6b3d99ab67142301af7b65743
Author: manuzhang <>
Date:   2016-06-15T08:02:35Z

    [BEAM-342] Implement Filter#greaterThan,etc with Filter#byPredicate

commit 93f9ef92dcdcdec4f481e996b02f256cb18dc628
Author: Dan Halperin <>
Date:   2016-06-16T17:15:58Z

    CrashingRunner: cleanup some code
    make it final, fix an error message, remove unused code

commit e5812440ef985a44316e0dde7c5fa19d38f91aa0
Author: Pei He <>
Date:   2016-06-16T18:38:51Z

    Remove the beam.examples dependency from flink.

commit 6a41da853537e152613fb17bed782bc16d767c57
Author: Thomas Groh <>
Date:   2016-06-17T17:25:44Z

    Remove last vestige of the words DirectPipeline

commit 09bf9b3720f08acc9e94784461f2482ab371cd90
Author: Pei He <>
Date:   2016-06-17T20:02:58Z

    Remove references to javax.servlet.

commit 340d09845959340f73577512437ebe0939bdeff9
Author: Thomas Groh <>
Date:   2016-06-17T20:22:26Z

    Finish removing DirectPipelineRunner references

commit 30d226a3ae547c4a2d890d1d42487862323a4ae3
Author: Kenneth Knowles <>
Date:   2016-05-05T22:11:07Z

    Configure RunnableOnService tests for Spark runner, batch mode

commit 90d0bcfa74a0e99acb6721cc9c7623cf55e6626b
Author: Aljoscha Krettek <>
Date:   2016-06-01T09:56:18Z

    [BEAM-321] Fix Flink Comparators
    KvCoderComparator and CoderComparator were hashing the key directly
    while doing comparisons on the encoded form. This led to
    inconsistencies in GroupByKey results with large numbers of elements per
    This changes the comparators to hash on the encoded form and also adds
    tests to verify the correct behavior.

commit d285e675920cd790c68053291c9bf843c21fc493
Author: Dan Halperin <>
Date:   2016-06-16T15:57:18Z

    DataflowPipelineJob: Retry messages, metrics, and status polls
    At some point in the past, we decided to use a rawDataflowClient that
    does not do retries when checking job status, because it was best-effort
    reporting to users. The purported goal was to not clutter the log with
    networking errors.
    However, since that time, we have:
    * Added the ability to suppress logs (emit only at DEBUG level or not at
      all) when retrying.
    * Increased reliability of the job checking status so that these errors
      are less frequent and more indicative of quota or other issues.
    * Started using the metrics in tests, where we do need to retry
      transient issues (BEAM-350).
    So let's drop the raw transport client and just use the one that


> Python SDKs writes non RFC compliant JSON files for BQ Export
> -------------------------------------------------------------
>                 Key: BEAM-618
>                 URL:
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-py
>            Reporter: Alex Amato
>            Assignee: Frances Perry
> Python SDK uses the built-in json.dumps to write JSON files to GCS for the BQ Exporter.
> BigQuery can fail to parse these files when it tries to load them into a BQ table because
> json.dumps can emit JSON which does not conform to the IETF RFC.
> There are a few cases listed in that module which are not RFC compliant.
> The main issue we run into is the NAN, INF and -INF values.
> These fail with a confusing error (and we delete the GCS files, making it hard to debug):
> JSON table encountered too many errors, giving up. Rows JSON parsing error in row starting
> at position
> We can set the allow_nan argument to json.dumps to False to address these issues. So
> that when a user tries to write a file with INF, -INF or NAN
> Setting this argument will produce this type of error when json.dumps is called with
> NAN/INF values. We may want to catch this error to mention the fact that INF and NAN are not
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/usr/lib/python2.7/json/", line 250, in dumps
>     sort_keys=sort_keys, **kw).encode(obj)
>   File "/usr/lib/python2.7/json/", line 207, in encode
>     chunks = self.iterencode(o, _one_shot=True)
>   File "/usr/lib/python2.7/json/", line 270, in iterencode
>     return _iterencode(o, 0)
> ValueError: Out of range float values are not JSON compliant
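
For reference, the behaviour described in the issue can be reproduced with the standard json module alone. The following is a minimal sketch, not the code in the pull request: the helper name dumps_strict and the wording of its error message are assumptions made for illustration. It shows that json.dumps emits a bare NaN token by default, and that passing allow_nan=False turns this into a ValueError, which can be re-raised with a clearer message.

```python
import json


def dumps_strict(row):
    """Serialize a row to JSON, rejecting NaN, Infinity and -Infinity.

    Hypothetical helper for illustration only; not taken from the Beam SDK.
    """
    try:
        # allow_nan=False makes json.dumps raise ValueError instead of
        # emitting the non-standard tokens NaN, Infinity and -Infinity.
        return json.dumps(row, allow_nan=False)
    except ValueError as e:
        # json.dumps also raises ValueError for other problems (for example
        # circular references), so the original message is kept in the text.
        raise ValueError(
            'Could not write row as JSON; note that NAN, INF and -INF are '
            'not valid JSON values: %s (row: %r)' % (e, row))


# Default behaviour: the output contains a bare NaN token, which strict
# JSON parsers reject.
lenient = json.dumps({'x': float('nan')})

# Strict behaviour: the same kind of value now fails at write time.
try:
    dumps_strict({'x': float('inf')})
    error_message = None
except ValueError as e:
    error_message = str(e)
```

With this approach the failure surfaces when the row is serialized, before any file reaches BigQuery, rather than as a parsing error during the load job.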

This message was sent by Atlassian JIRA
