Mailing-List: contact issues-help@spark.apache.org; run by ezmlm
Precedence: bulk
Date: Fri, 22 Apr 2016 16:21:12 +0000 (UTC)
From: "Davies Liu (JIRA)" <jira@apache.org>
To: issues@spark.apache.org
Message-ID: <JIRA.12938193.1455115096000.9712.1461342072883@Atlassian.JIRA>
In-Reply-To: <JIRA.12938193.1455115096000@Atlassian.JIRA>
References: <JIRA.12938193.1455115096000@Atlassian.JIRA>
 <JIRA.12938193.1455115096322@arcas>
Subject: [jira] [Updated] (SPARK-13266) Python DataFrameReader converts None
 to "None" instead of null
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit


     [ https://issues.apache.org/jira/browse/SPARK-13266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Davies Liu updated SPARK-13266:
-------------------------------
    Assignee: Liang-Chi Hsieh

> Python DataFrameReader converts None to "None" instead of null
> --------------------------------------------------------------
>
>                 Key: SPARK-13266
>                 URL: https://issues.apache.org/jira/browse/SPARK-13266
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark, SQL
>    Affects Versions: 1.6.0
>         Environment: Linux standalone but probably applies to all
>            Reporter: mathieu longtin
>            Assignee: Liang-Chi Hsieh
>              Labels: easyfix, patch
>             Fix For: 2.0.0
>
>
> If you do something like this:
> {code:none}
> tsv_loader = sqlContext.read.format('com.databricks.spark.csv')
> tsv_loader.options(quote=None, escape=None)
> {code}
> The loader receives the string "None" as the value of the _quote_ and _escape_ options, when it should receive a _null_.
> An easy fix is to correct the _to_str_ function near the top of *python/pyspark/sql/readwriter.py*. Here's the patch:
> {code:none}
> diff --git a/python/pyspark/sql/readwriter.py b/python/pyspark/sql/readwriter.py
> index a3d7eca..ba18d13 100644
> --- a/python/pyspark/sql/readwriter.py
> +++ b/python/pyspark/sql/readwriter.py
> @@ -33,10 +33,12 @@ __all__ = ["DataFrameReader", "DataFrameWriter"]
>  def to_str(value):
>      """
> -    A wrapper over str(), but convert bool values to lower case string
> +    A wrapper over str(), but convert bool values to lower case string, and keep None
>      """
>      if isinstance(value, bool):
>          return str(value).lower()
> +    elif value is None:
> +        return value
>      else:
>          return str(value)
> {code}
> This has been tested and works great.
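
For anyone who wants to see the effect of the patch without a Spark installation, here is a minimal standalone sketch. The names to_str_before and to_str_after are hypothetical, used only to put the two versions of to_str from the diff side by side. The "header" key is an extra illustrative option added to show that bool handling is unchanged. The sketch only shows what value would be handed to the JVM side; it assumes, without verifying here, that the Python-to-Java bridge turns a Python None into a Java null.

```python
def to_str_before(value):
    """Pre-patch behavior as shown in the diff: None falls through to str()."""
    if isinstance(value, bool):
        return str(value).lower()
    else:
        return str(value)


def to_str_after(value):
    """Patched behavior from the diff: None is returned unchanged."""
    if isinstance(value, bool):
        return str(value).lower()
    elif value is None:
        return value
    else:
        return str(value)


# Options similar to the ones in the report, plus an illustrative bool option.
options = {"quote": None, "escape": None, "header": True}

# Apply each version of the conversion to every option value.
before = {key: to_str_before(val) for key, val in options.items()}
after = {key: to_str_after(val) for key, val in options.items()}

print(before["quote"], type(before["quote"]).__name__)
print(after["quote"], type(after["quote"]).__name__)
```

With the pre-patch function the quote option becomes the four-character string "None"; with the patched function it stays a Python None, while the bool option is still lowercased to "true" in both cases.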

