From: "Darcy Shen (Jira)"
To: issues@spark.apache.org
Date: Sat, 24 Apr 2021 08:52:00 +0000 (UTC)
Subject: [jira] [Commented] (SPARK-35211) Support UDT for Pandas with Arrow Disabled

    [ https://issues.apache.org/jira/browse/SPARK-35211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17331170#comment-17331170 ]

Darcy Shen commented on SPARK-35211:
------------------------------------

With schema provided, it works fine.

{code}
(spark) ➜  spark git:(sadhen/SPARK-35211) ✗ bin/pyspark
Python 3.8.8 (default, Feb 24 2021, 13:46:16)
[Clang 10.0.0 ] :: Anaconda, Inc. on darwin
Type "help", "copyright", "credits" or "license" for more information.
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
21/04/24 16:49:38 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 3.2.0-SNAPSHOT
      /_/

Using Python version 3.8.8 (default, Feb 24 2021 13:46:16)
Spark context Web UI available at http://172.30.0.12:4040
Spark context available as 'sc' (master = local[*], app id = local-1619254179325).
SparkSession available as 'spark'.
>>> # spark.conf.set("spark.sql.execution.arrow.pyspark.enabled", True)
>>> spark.conf.set("spark.sql.execution.arrow.pyspark.enabled", False)
>>> from pyspark.testing.sqlutils import ExamplePoint, ExamplePointUDT
>>> from pyspark.sql.types import StructType, StructField
>>> import pandas as pd
>>> schema = StructType([StructField('point', ExamplePointUDT(), False)])
>>> pdf = pd.DataFrame({'point': pd.Series([ExamplePoint(1.0, 1.0), ExamplePoint(2.0, 2.0)])})
>>> df = spark.createDataFrame(pdf, schema)
>>>
>>> df.show()
+----------+
|     point|
+----------+
|(1.0, 1.0)|
|(2.0, 2.0)|
+----------+
{code}

> Support UDT for Pandas with Arrow Disabled
> ------------------------------------------
>
>                 Key: SPARK-35211
>                 URL: https://issues.apache.org/jira/browse/SPARK-35211
>             Project: Spark
>          Issue Type: Sub-task
>          Components: PySpark
>    Affects Versions: 3.1.1
>            Reporter: Darcy Shen
>            Priority: Major
>
> {code:java}
> $ pip freeze
> certifi==2020.12.5
> coverage==5.5
> flake8==3.9.0
> mccabe==0.6.1
> mypy==0.812
> mypy-extensions==0.4.3
> numpy==1.20.1
> pandas==1.2.3
> pyarrow==2.0.0
> pycodestyle==2.7.0
> pyflakes==2.3.0
> python-dateutil==2.8.1
> pytz==2021.1
> scipy==1.6.1
> six==1.15.0
> typed-ast==1.4.2
> typing-extensions==3.7.4.3
> xmlrunner==1.7.7
> {code}
> {code}
> (spark) ➜  spark git:(master) bin/pyspark
> Python 3.8.8 (default, Feb 24 2021, 13:46:16)
> [Clang 10.0.0 ] :: Anaconda, Inc. on darwin
> Type "help", "copyright", "credits" or "license" for more information.
> Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
> Setting default log level to "WARN".
> To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
> 21/04/24 15:51:29 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> Welcome to
>       ____              __
>      / __/__  ___ _____/ /__
>     _\ \/ _ \/ _ `/ __/  '_/
>    /__ / .__/\_,_/_/ /_/\_\   version 3.2.0-SNAPSHOT
>       /_/
> Using Python version 3.8.8 (default, Feb 24 2021 13:46:16)
> Spark context Web UI available at http://172.30.0.12:4040
> Spark context available as 'sc' (master = local[*], app id = local-1619250689842).
> SparkSession available as 'spark'.
> >>> spark.conf.set("spark.sql.execution.arrow.pyspark.enabled", "false")
> >>> from pyspark.testing.sqlutils import ExamplePoint
> >>>
> >>> import pandas as pd
> >>>
> >>> pdf = pd.DataFrame({'point': pd.Series([ExamplePoint(1, 1), ExamplePoint(2, 2)])})
> >>>
> >>> df = spark.createDataFrame(pdf)
> >>>
> >>> df.show()
> +----------+
> |     point|
> +----------+
> |(0.0, 0.0)|
> |(0.0, 0.0)|
> +----------+
> >>> df.toPandas()
>        point
> 0  (0.0,0.0)
> 1  (0.0,0.0)
> >>>
> >>>
> {code}
> The correct result should be:
> {code}
>        point
> 0  (1.0,1.0)
> 1  (2.0,2.0)
> {code}
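
To make the expected behavior concrete, here is a minimal pure-Python sketch of the round trip that the conversion should preserve. It does not use Spark at all: `Point`, `PointUDT`, and `round_trip` are hypothetical stand-ins introduced only for illustration, not the real `ExamplePoint` / `ExamplePointUDT` from `pyspark.testing.sqlutils`, and the sketch assumes a point is stored as a list of two doubles. It says nothing about where the (0.0, 0.0) values in the report come from; it only states the invariant that the result should satisfy.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Point:
    """Hypothetical stand-in for a user-defined point type (not ExamplePoint)."""
    x: float
    y: float


class PointUDT:
    """Hypothetical stand-in for a UDT: maps a Point to a storable list and back."""

    @staticmethod
    def serialize(point: Point) -> List[float]:
        # Assumption: the stored form is a plain list of two doubles.
        return [float(point.x), float(point.y)]

    @staticmethod
    def deserialize(datum: List[float]) -> Point:
        # Rebuild the point from its stored form.
        return Point(datum[0], datum[1])


def round_trip(points: List[Point]) -> List[Point]:
    """Serialize and then deserialize every point; a lossless conversion returns equal points."""
    return [PointUDT.deserialize(PointUDT.serialize(p)) for p in points]


original = [Point(1, 1), Point(2, 2)]
restored = round_trip(original)

# The coordinates must survive the round trip; the report shows (0.0, 0.0) instead.
assert restored == [Point(1.0, 1.0), Point(2.0, 2.0)]
print(restored)
```

The same invariant is what the report expects from `createDataFrame` followed by `toPandas()` when no schema is passed: the points that come back should equal the points that went in.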