spark-reviews mailing list archives

From ashashwat <...@git.apache.org>
Subject [GitHub] spark pull request #20503: [SPARK-23299][SQL][PYSPARK] Fix __repr__ behaviou...
Date Sun, 04 Feb 2018 11:33:43 GMT
GitHub user ashashwat opened a pull request:

    https://github.com/apache/spark/pull/20503

    [SPARK-23299][SQL][PYSPARK] Fix __repr__ behaviour for Rows.

    ## What changes were proposed in this pull request?
    
    Fix \_\_repr\_\_ behaviour for Rows.
    
    Row's \_\_repr\_\_ assumes every value is a string when column names are missing.
    Examples:
    ```
    >>> from pyspark.sql.types import Row
    >>> Row ("Alice", "11")
    <Row(Alice, 11)>
    
    >>> Row (name="Alice", age=11)
    Row(age=11, name='Alice')
    
    >>> Row ("Alice", 11)
    <snip stack trace>
    TypeError: sequence item 1: expected string, int found
    ```
    
    This is because Row(), when called without column names, assumes
    that every value is a string.
    
    ## How was this patch tested?
    
    Tested manually; a unit test was added in `python/pyspark/sql/tests.py`.
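
The failure described above can be reproduced without Spark. The sketch below is an illustration only, not the actual pyspark source or the patch in this pull request: `SketchRow` and `buggy_repr` are hypothetical names, and the class models only a Row built from positional values. It assumes the error comes from passing non-string values to `str.join`, which the `TypeError` in the example suggests, and shows one possible remedy: formatting each value with `repr()` before joining.

```python
class SketchRow(tuple):
    """Hypothetical stand-in for a Row created from positional values only."""

    def __new__(cls, *values):
        return super().__new__(cls, values)

    def buggy_repr(self):
        # str.join accepts only strings, so a non-string value raises TypeError.
        return "<Row(%s)>" % ", ".join(self)

    def __repr__(self):
        # Format each value with repr() first, so values of any type are accepted.
        return "<Row(%s)>" % ", ".join(repr(v) for v in self)


row = SketchRow("Alice", 11)

try:
    row.buggy_repr()
    buggy_raised = False
except TypeError:
    buggy_raised = True

print(buggy_raised)  # True
print(repr(row))     # <Row('Alice', 11)>
```

With this approach, string values appear quoted in the output, so a string `'11'` can be told apart from an integer `11`.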

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ashashwat/spark SPARK-23299

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/20503.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #20503
    
----
commit 6604e9fdaa710cd894b4799390144e404667402e
Author: Shashwat Anand <me@...>
Date:   2018-02-04T10:27:31Z

    Fix __repr__ behaviour for Rows.
    
    Rows __repr__ assumes data is strings when column name is missing.
    
    Examples,
    >>> Row ("Alice", "11")
    <Row(Alice, 11)>
    
    >>> Row (name="Alice", age=11)
    Row(age=11, name='Alice')
    
    >>> Row ("Alice", 11)
    <snip stack trace>
    TypeError: sequence item 1: expected string, int found
    
    This is because Row () when called without column names assumes
    everything is string.

----

