Date: Mon, 12 Sep 2016 18:56:20 +0000 (UTC)
From: "ASF GitHub Bot (JIRA)"
To: commits@beam.incubator.apache.org
Reply-To: dev@beam.incubator.apache.org
Subject: [jira] [Commented] (BEAM-618) Python SDKs writes non RFC compliant JSON files for BQ Export

    [ https://issues.apache.org/jira/browse/BEAM-618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15484968#comment-15484968 ]

ASF GitHub Bot commented on BEAM-618:
-------------------------------------

GitHub user ajamato opened a pull request:

    https://github.com/apache/incubator-beam/pull/947

    [BEAM-618] Disallow NAN, INF and -INF invalid JSON values in bigquery exporter

    Be sure to do all of the following to help us incorporate your contribution
    quickly and easily:

     - [ ] Make sure the PR title is formatted like:
       `[BEAM-] Description of pull request`
     - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable
           Travis-CI on your fork and ensure the whole test matrix passes).
     - [ ] Replace `` in the title with the actual Jira issue
           number, if there is one.
     - [ ] If this contribution is large, please file an Apache
           [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.txt).

    ---

    Now exporting JSON will fail with invalid NAN, INF or -INF values.

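As a rough illustration of the behaviour this change relies on, the sketch below compares the default `json.dumps` output with the `allow_nan=False` output. The `row` dictionary and its field names are invented for the example and are not taken from the Beam code.

```python
import json

# Hypothetical row; the field names are illustrative only.
row = {"name": "sensor-1", "reading": float("nan"), "limit": float("inf")}

# Default behaviour (allow_nan=True): the encoder emits the bare tokens
# NaN and Infinity, which are not valid JSON.
lenient = json.dumps(row, sort_keys=True)
print(lenient)

# With allow_nan=False the encoder refuses to serialize such values
# and raises ValueError instead.
try:
    json.dumps(row, allow_nan=False)
    strict_error = None
except ValueError as exc:
    strict_error = str(exc)
print(strict_error)
```

The first call succeeds but produces text that a strict JSON parser may reject, while the second call fails at encoding time, before anything is written.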
You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ajamato/incubator-beam py_json

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-beam/pull/947.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #947

----
commit 442bed71e68524368408573ce0bcb22901d7f861
Author: Alex Amato
Date:   2016-09-09T00:57:28Z

    Set allow_nan=False on bigquery JSON encoding

----


> Python SDKs writes non RFC compliant JSON files for BQ Export
> -------------------------------------------------------------
>
>                 Key: BEAM-618
>                 URL: https://issues.apache.org/jira/browse/BEAM-618
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-py
>            Reporter: Alex Amato
>            Assignee: Frances Perry
>
> The Python SDK uses the built-in json.dumps to write JSON files to GCS for the BQ exporter. BigQuery can fail to parse these files when it loads them into a BQ table, because json.dumps can emit JSON that does not conform to the IETF JSON RFC.
> The documentation for that module lists a few cases that are not RFC compliant:
> https://docs.python.org/2/library/json.html#standard-compliance-and-interoperability
> The main issue we run into is the NAN, INF and -INF values.
> These fail with a confusing error (and we delete the GCS files, making it hard to debug):
> JSON table encountered too many errors, giving up. Rows JSON parsing error in row starting at position
> We can set the allow_nan argument of json.dumps to False to address this, so that the export fails at encoding time when a user tries to write a file with INF, -INF or NAN.
> Setting this argument produces the following type of error when json.dumps is called with NAN/INF values. We may want to catch this error and mention that INF and NAN are not allowed.
> Traceback (most recent call last):
>   File "", line 1, in
>   File "/usr/lib/python2.7/json/__init__.py", line 250, in dumps
>     sort_keys=sort_keys, **kw).encode(obj)
>   File "/usr/lib/python2.7/json/encoder.py", line 207, in encode
>     chunks = self.iterencode(o, _one_shot=True)
>   File "/usr/lib/python2.7/json/encoder.py", line 270, in iterencode
>     return _iterencode(o, 0)
> ValueError: Out of range float values are not JSON compliant



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
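
The issue's suggestion to catch this error and mention that INF and NAN are not allowed might look roughly like the sketch below. It is not the code from the pull request: `encode_row` and `find_non_finite_fields` are hypothetical helpers, rows are assumed to be flat dictionaries, and the `raise ... from` syntax assumes Python 3 rather than the Python 2.7 shown in the traceback.

```python
import json
import math


def find_non_finite_fields(row):
    """Return the sorted names of top-level fields holding NAN, INF or -INF."""
    return sorted(
        key for key, value in row.items()
        if isinstance(value, float) and not math.isfinite(value)
    )


def encode_row(row):
    """Encode a flat row as strict JSON, naming the offending fields on failure."""
    try:
        return json.dumps(row, allow_nan=False)
    except ValueError as exc:
        bad_fields = find_non_finite_fields(row)
        if not bad_fields:
            # Some other encoding problem (for example a circular reference,
            # or a non-finite value nested deeper): keep the original error.
            raise
        raise ValueError(
            "Cannot export row as JSON: NAN, INF and -INF are not allowed "
            "(fields: %s)" % ", ".join(bad_fields)
        ) from exc
```

Re-raising unchanged when no top-level field is at fault avoids blaming NAN/INF for unrelated `ValueError`s that `json.dumps` can also raise.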