Date: Thu, 6 Apr 2017 16:54:41 +0000 (UTC)
From: "Ahmet Altay (JIRA)"
To: commits@beam.apache.org
Reply-To: dev@beam.apache.org
Subject: [jira] [Commented] (BEAM-618) Python SDKs writes non RFC compliant JSON files for BQ Export

    [ https://issues.apache.org/jira/browse/BEAM-618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15959280#comment-15959280 ]

Ahmet Altay commented on BEAM-618:
----------------------------------

[~ajamato@google.com] is this fixed?

> Python SDKs writes non RFC compliant JSON files for BQ Export
> -------------------------------------------------------------
>
>                 Key: BEAM-618
>                 URL: https://issues.apache.org/jira/browse/BEAM-618
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-py
>            Reporter: Alex Amato
>            Assignee: Frances Perry
>
> The Python SDK uses the built-in json.dumps to write JSON files to GCS for the BQ Exporter. BigQuery can fail to parse these files when it tries to load them into a BQ table, because json.dumps can emit JSON that does not conform to the JSON RFC.
> The documentation for that module lists a few cases that are not RFC compliant:
> https://docs.python.org/2/library/json.html#standard-compliance-and-interoperability
> The main issue we run into is the NAN, INF and -INF values.
> These fail with a confusing error (and we delete the GCS files, making it hard to debug):
> JSON table encountered too many errors, giving up. Rows JSON parsing error in row starting at position
> We can set the allow_nan argument of json.dumps to False to address these issues, so that an error is raised when a user tries to write a file with INF, -INF or NAN.
> Setting this argument will produce this type of error when json.dumps is called with NAN/INF values. We may want to catch this error to mention the fact that INF and NAN are not allowed.
> Traceback (most recent call last):
>   File "", line 1, in
>   File "/usr/lib/python2.7/json/__init__.py", line 250, in dumps
>     sort_keys=sort_keys, **kw).encode(obj)
>   File "/usr/lib/python2.7/json/encoder.py", line 207, in encode
>     chunks = self.iterencode(o, _one_shot=True)
>   File "/usr/lib/python2.7/json/encoder.py", line 270, in iterencode
>     return _iterencode(o, 0)
> ValueError: Out of range float values are not JSON compliant

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
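
For illustration only, the approach proposed in the quoted issue (pass allow_nan=False to json.dumps and catch the resulting error to give a clearer message) might look roughly like the sketch below. The helper name dumps_strict_json and the wording of the error message are hypothetical and are not taken from the Beam SDK. The sketch assumes each row is a plain dict of JSON-serializable values.

```python
import json


def dumps_strict_json(row):
    """Hypothetical helper: serialize one row to RFC-compliant JSON.

    With allow_nan=False, json.dumps raises ValueError for NaN,
    Infinity and -Infinity instead of emitting those non-standard tokens.
    """
    try:
        return json.dumps(row, allow_nan=False)
    except ValueError as exc:
        # json.dumps can raise ValueError for other reasons too (for
        # example circular references), so keep the original message.
        raise ValueError(
            "Could not write row as JSON. Note that NaN, Infinity and "
            "-Infinity are not valid JSON values. Original error: %s" % exc)


if __name__ == "__main__":
    print(dumps_strict_json({"id": 1, "score": 0.5}))
    try:
        dumps_strict_json({"id": 2, "score": float("nan")})
    except ValueError as err:
        print(err)
```

By default json.dumps serializes float("nan") as NaN and float("inf") as Infinity, which is the output the issue describes as failing to load. Rejecting such values at write time moves the failure to the point where the offending row is still known.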