spark-issues mailing list archives

From "Shea Parkes (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-10847) Pyspark - DataFrame - Optional Metadata with `None` triggers cryptic failure
Date Mon, 28 Sep 2015 20:30:04 GMT

    [ https://issues.apache.org/jira/browse/SPARK-10847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14933924#comment-14933924 ]

Shea Parkes commented on SPARK-10847:
-------------------------------------

This issue drove me to learn enough Scala to find out what a scala.Tuple2 was, only to
discover that the exception still wasn't helpful even then.
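
For whoever picks this up, a pointer that might save some time: per the traceback below,
createDataFrame ships the schema to the JVM as JSON via schema.json(), so the Pythonic None
becomes a JSON null inside the metadata map, and presumably that null is what the Scala-side
parser (Metadata.fromJObject in the trace) trips over. A minimal sketch to see it directly
(the field name and metadata key here are just placeholders):
{code:none}
# Sketch: inspect the JSON that createDataFrame sends to the JVM.
import pyspark.sql.types as types

field = types.StructField(
    'age',
    types.IntegerType(),
    nullable=True,
    metadata={'comment': None},
    )

# The None survives serialization as a JSON null:
print(field.json())
# {"metadata":{"comment":null},"name":"age","nullable":true,"type":"integer"}
{code}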

I'm not planning to do any further work on this, so if you were holding off to avoid
duplicating effort with me, feel free to go ahead and knock it out.  I'm not entirely
familiar with the contribution guidelines, but I'm sure you can work them out.

In case it wasn't clear above, the line that triggers the error is:
{code:none}
metadata={'comment': None}
{code}
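
If anyone needs to work around this in the meantime, the simplest dodge I can think of
(just a sketch; the key names are placeholders) is to strip None-valued entries out of the
metadata dict before building the StructField, so no JSON null ever reaches the JVM:
{code:none}
# Workaround sketch: drop None-valued keys so the metadata serializes cleanly.
raw_metadata = {'comment': None, 'source': 'accounting system'}  # placeholder keys
clean_metadata = {k: v for k, v in raw_metadata.items() if v is not None}
# clean_metadata == {'source': 'accounting system'}
{code}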

Thanks for the interest!

> Pyspark - DataFrame - Optional Metadata with `None` triggers cryptic failure
> ----------------------------------------------------------------------------
>
>                 Key: SPARK-10847
>                 URL: https://issues.apache.org/jira/browse/SPARK-10847
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark, SQL
>    Affects Versions: 1.5.0
>         Environment: Windows 7
> java version "1.8.0_60" (64bit)
> Python 3.4.x
> Standalone cluster mode (not local[n]; a full local cluster)
>            Reporter: Shea Parkes
>            Priority: Minor
>
> If the optional metadata passed to `pyspark.sql.types.StructField` includes a Pythonic
> `None`, then `pyspark.sql.SQLContext.createDataFrame` will fail with a very cryptic/unhelpful
> error.
> Here is a minimal reproducible example:
> {code:none}
> # Assumes a SparkContext already exists as `sc`
> from pyspark.sql import SQLContext
> import pyspark.sql.types as types
> sqlContext = SQLContext(sc)
> literal_metadata = types.StructType([
>     types.StructField(
>         'name',
>         types.StringType(),
>         nullable=True,
>         metadata={'comment': 'From accounting system.'}
>         ),
>     types.StructField(
>         'age',
>         types.IntegerType(),
>         nullable=True,
>         metadata={'comment': None}
>         ),
>     ])
> literal_rdd = sc.parallelize([
>     ['Bob', 34],
>     ['Dan', 42],
>     ])
> print(literal_rdd.take(2))
> failed_dataframe = sqlContext.createDataFrame(
>     literal_rdd,
>     literal_metadata,
>     )
> {code}
> This produces the following stack trace:
> {noformat}
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "<string>", line 28, in <module>
>   File "S:\ZQL\Software\Hotware\spark-1.5.0-bin-hadoop2.6\python\pyspark\sql\context.py",
line 408, in createDataFrame
>     jdf = self._ssql_ctx.applySchemaToPythonRDD(jrdd.rdd(), schema.json())
>   File "S:\ZQL\Software\Hotware\spark-1.5.0-bin-hadoop2.6\python\lib\py4j-0.8.2.1-src.zip\py4j\java_gateway.py",
line 538, in __call__
>   File "S:\ZQL\Software\Hotware\spark-1.5.0-bin-hadoop2.6\python\pyspark\sql\utils.py",
line 36, in deco
>     return f(*a, **kw)
>   File "S:\ZQL\Software\Hotware\spark-1.5.0-bin-hadoop2.6\python\lib\py4j-0.8.2.1-src.zip\py4j\protocol.py",
line 300, in get_return_value
> py4j.protocol.Py4JJavaError: An error occurred while calling o757.applySchemaToPythonRDD.
> : java.lang.RuntimeException: Do not support type class scala.Tuple2.
> 	at org.apache.spark.sql.types.Metadata$$anonfun$fromJObject$1.apply(Metadata.scala:160)
> 	at org.apache.spark.sql.types.Metadata$$anonfun$fromJObject$1.apply(Metadata.scala:127)
> 	at scala.collection.immutable.List.foreach(List.scala:318)
> 	at org.apache.spark.sql.types.Metadata$.fromJObject(Metadata.scala:127)
> 	at org.apache.spark.sql.types.DataType$.org$apache$spark$sql$types$DataType$$parseStructField(DataType.scala:173)
> 	at org.apache.spark.sql.types.DataType$$anonfun$parseDataType$1.apply(DataType.scala:148)
> 	at org.apache.spark.sql.types.DataType$$anonfun$parseDataType$1.apply(DataType.scala:148)
> 	at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
> 	at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
> 	at scala.collection.immutable.List.foreach(List.scala:318)
> 	at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
> 	at scala.collection.AbstractTraversable.map(Traversable.scala:105)
> 	at org.apache.spark.sql.types.DataType$.parseDataType(DataType.scala:148)
> 	at org.apache.spark.sql.types.DataType$.fromJson(DataType.scala:96)
> 	at org.apache.spark.sql.SQLContext.parseDataType(SQLContext.scala:961)
> 	at org.apache.spark.sql.SQLContext.applySchemaToPythonRDD(SQLContext.scala:970)
> 	at sun.reflect.GeneratedMethodAccessor38.invoke(Unknown Source)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
> 	at java.lang.reflect.Method.invoke(Unknown Source)
> 	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
> 	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
> 	at py4j.Gateway.invoke(Gateway.java:259)
> 	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
> 	at py4j.commands.CallCommand.execute(CallCommand.java:79)
> 	at py4j.GatewayConnection.run(GatewayConnection.java:207)
> 	at java.lang.Thread.run(Unknown Source)
> {noformat}
> I believe the most important lines of the traceback are these:
> {noformat}
> py4j.protocol.Py4JJavaError: An error occurred while calling o757.applySchemaToPythonRDD.
> : java.lang.RuntimeException: Do not support type class scala.Tuple2.
> {noformat}
> But it wasn't enough for me to figure out the problem; I had to steadily simplify my
> program until I could identify what caused the problem.
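
To spare others the same bisection exercise until the error message improves, a fail-fast
check on the schema before calling createDataFrame does the trick; check_metadata below is
a hypothetical helper written for illustration, not part of PySpark:
{code:none}
# Hypothetical helper (not part of PySpark): raise a readable error if any
# StructField carries a None metadata value, before the opaque JVM failure.
def check_metadata(schema):
    for field in schema.fields:
        for key, value in (field.metadata or {}).items():
            if value is None:
                raise ValueError(
                    'Metadata for field %r has None under key %r; '
                    'drop the key or supply a concrete value.' % (field.name, key))

check_metadata(literal_metadata)  # would raise here instead of inside py4j
{code}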


