spark-issues mailing list archives

From "Bryan Cutler (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (SPARK-25072) PySpark custom Row class can be given extra parameters
Date Thu, 06 Sep 2018 17:19:00 GMT

     [ https://issues.apache.org/jira/browse/SPARK-25072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bryan Cutler resolved SPARK-25072.
----------------------------------
       Resolution: Fixed
    Fix Version/s: 2.3.2
                   2.4.0
                   3.0.0

Issue resolved by pull request 22140
[https://github.com/apache/spark/pull/22140]

> PySpark custom Row class can be given extra parameters
> ------------------------------------------------------
>
>                 Key: SPARK-25072
>                 URL: https://issues.apache.org/jira/browse/SPARK-25072
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 2.2.0
>         Environment: {noformat}
> SPARK_MAJOR_VERSION is set to 2, using Spark2
> Python 3.4.5 (default, Dec 11 2017, 16:57:19)
> Type 'copyright', 'credits' or 'license' for more information
> IPython 6.2.1 -- An enhanced Interactive Python. Type '?' for help.
> Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
> Setting default log level to "WARN".
> To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
> 18/08/01 04:49:16 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> 18/08/01 04:49:17 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
> 18/08/01 04:49:27 WARN ObjectStore: Failed to get database global_temp, returning NoSuchObjectException
> Welcome to
>       ____              __
>      / __/__  ___ _____/ /__
>     _\ \/ _ \/ _ `/ __/  '_/
>    /__ / .__/\_,_/_/ /_/\_\   version 2.2.0
>       /_/
> Using Python version 3.4.5 (default, Dec 11 2017 16:57:19)
> SparkSession available as 'spark'.
> {noformat}
> {{CentOS release 6.9 (Final)}}
> {{Linux sandbox-hdp.hortonworks.com 4.14.0-1.el7.elrepo.x86_64 #1 SMP Sun Nov 12 20:21:04 EST 2017 x86_64 x86_64 x86_64 GNU/Linux}}
> {noformat}openjdk version "1.8.0_161"
> OpenJDK Runtime Environment (build 1.8.0_161-b14)
> OpenJDK 64-Bit Server VM (build 25.161-b14, mixed mode){noformat}
>            Reporter: Jan-Willem van der Sijp
>            Assignee: Li Yuanjian
>            Priority: Minor
>             Fix For: 3.0.0, 2.4.0, 2.3.2
>
>
> When a custom Row class is created in PySpark, its constructor can be given more parameters than there are columns. These extra parameters affect the value of the Row but do not appear in the {{repr}} or {{str}} output, which makes errors caused by these "invisible" values hard to debug. The hidden values can still be accessed through integer-based indexing, though.
> Some examples:
> {code:python}
> In [69]: RowClass = Row("column1", "column2")
> In [70]: RowClass(1, 2) == RowClass(1, 2)
> Out[70]: True
> In [71]: RowClass(1, 2) == RowClass(1, 2, 3)
> Out[71]: False
> In [75]: RowClass(1, 2, 3)
> Out[75]: Row(column1=1, column2=2)
> In [76]: RowClass(1, 2)
> Out[76]: Row(column1=1, column2=2)
> In [77]: RowClass(1, 2, 3).asDict()
> Out[77]: {'column1': 1, 'column2': 2}
> In [78]: RowClass(1, 2, 3)[2]
> Out[78]: 3
> In [79]: repr(RowClass(1, 2, 3))
> Out[79]: 'Row(column1=1, column2=2)'
> In [80]: str(RowClass(1, 2, 3))
> Out[80]: 'Row(column1=1, column2=2)'
> {code}
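The fix (pull request 22140) makes the Row factory reject extra positional values instead of silently carrying them. The following is a minimal pure-Python sketch of that kind of length check, not the actual PySpark implementation; the `RowFactory` class name and the dict return value are illustrative assumptions:

```python
# Hypothetical sketch of a Row-like factory that validates argument count.
# NOT the real pyspark.sql.Row code; it only illustrates the check that
# prevents "invisible" extra values like those shown above.

class RowFactory:
    def __init__(self, *fields):
        self.fields = fields

    def __call__(self, *values):
        # Reject extra positional values so nothing can hide behind
        # the named columns.
        if len(values) > len(self.fields):
            raise ValueError(
                "Can not create Row with fields %s, expected %d values "
                "but got %s" % (self.fields, len(self.fields), values))
        return dict(zip(self.fields, values))

RowClass = RowFactory("column1", "column2")
print(RowClass(1, 2))       # fine: both values map to named columns
try:
    RowClass(1, 2, 3)       # too many values: now rejected loudly
except ValueError as e:
    print(e)
```

With a check like this, `RowClass(1, 2, 3)` fails immediately rather than producing a Row whose third value is reachable only by integer index.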



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org

