beam-commits mailing list archives

From "Daniel Halperin (JIRA)" <j...@apache.org>
Subject [jira] [Assigned] (BEAM-618) Python SDKs writes non RFC compliant JSON files for BQ Export
Date Mon, 10 Apr 2017 16:54:41 GMT

     [ https://issues.apache.org/jira/browse/BEAM-618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Halperin reassigned BEAM-618:
------------------------------------

    Assignee: Alex Amato  (was: Frances Perry)

> Python SDKs writes non RFC compliant JSON files for BQ Export
> -------------------------------------------------------------
>
>                 Key: BEAM-618
>                 URL: https://issues.apache.org/jira/browse/BEAM-618
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-py
>            Reporter: Alex Amato
>            Assignee: Alex Amato
>             Fix For: Not applicable
>
>
> The Python SDK uses the built-in json.dumps to write JSON files to GCS for the BQ Exporter.
> BigQuery can fail to parse these files when it tries to load them into a BQ table, because
> json.dumps can export JSON which does not conform to the JSON RFC.
> There are a few cases which are not RFC compliant, listed in the documentation for that module:
> https://docs.python.org/2/library/json.html#standard-compliance-and-interoperability
> The main issue we run into is the NAN, INF and -INF values.
> These fail with a confusing error (and we delete the GCS files, making it hard to debug):
> JSON table encountered too many errors, giving up. Rows JSON parsing error in row starting at position
> We can set the allow_nan argument of json.dumps to False to address these issues, so
> that json.dumps raises an error when a user tries to write a file with INF, -INF or NAN.
> Setting this argument will produce the following type of error when json.dumps is called with
> NAN/INF values. We may want to catch this error and mention the fact that INF and NAN are not
> allowed.
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/usr/lib/python2.7/json/__init__.py", line 250, in dumps
>     sort_keys=sort_keys, **kw).encode(obj)
>   File "/usr/lib/python2.7/json/encoder.py", line 207, in encode
>     chunks = self.iterencode(o, _one_shot=True)
>   File "/usr/lib/python2.7/json/encoder.py", line 270, in iterencode
>     return _iterencode(o, 0)
> ValueError: Out of range float values are not JSON compliant
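
A minimal sketch of the approach described above, assuming the fix is applied where each row is serialized. The helper name dumps_strict and the wording of its message are hypothetical and are not taken from the Beam SDK; only json.dumps and its allow_nan argument come from the standard library.

```python
import json


def dumps_strict(row):
    """Serialize a row to JSON, rejecting NAN/INF/-INF with a clearer error.

    Hypothetical helper for illustration; not part of the Beam SDK.
    """
    try:
        # allow_nan=False makes json.dumps raise ValueError instead of
        # emitting the non-compliant tokens NaN, Infinity and -Infinity.
        return json.dumps(row, allow_nan=False)
    except ValueError as e:
        raise ValueError(
            'Row contains NAN, INF or -INF, which are not allowed in JSON '
            'files: %r (%s)' % (row, e))


# Default behavior: these tokens are not valid JSON according to the RFC.
print(json.dumps(float('nan')))    # NaN
print(json.dumps(float('-inf')))   # -Infinity

# Strict behavior: ordinary floats still serialize as before.
print(dumps_strict({'x': 1.5}))    # {"x": 1.5}
```

Calling dumps_strict({'x': float('inf')}) raises ValueError with the message above, so the failure surfaces at write time rather than as a BigQuery load error.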



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
