spark-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Zuo Wang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-13531) Some DataFrame joins stopped working with UnsupportedOperationException: No size estimation available for objects
Date Thu, 03 Mar 2016 07:53:18 GMT

    [ https://issues.apache.org/jira/browse/SPARK-13531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15177441#comment-15177441
] 

Zuo Wang commented on SPARK-13531:
----------------------------------

Caused by the commit in https://issues.apache.org/jira/browse/SPARK-13329

> Some DataFrame joins stopped working with UnsupportedOperationException: No size estimation
available for objects
> -----------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-13531
>                 URL: https://issues.apache.org/jira/browse/SPARK-13531
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>            Reporter: koert kuipers
>            Priority: Minor
>
> this is using spark 2.0.0-SNAPSHOT
> dataframe df1:
> schema:
> {noformat}StructType(StructField(x,IntegerType,true)){noformat}
> explain:
> {noformat}== Physical Plan ==
> MapPartitions <function1>, obj#135: object, [if (input[0, object].isNullAt) null
else input[0, object].get AS x#128]
> +- MapPartitions <function1>, createexternalrow(if (isnull(x#9)) null else x#9),
[input[0, object] AS obj#135]
>    +- WholeStageCodegen
>       :  +- Project [_1#8 AS x#9]
>       :     +- Scan ExistingRDD[_1#8]{noformat}
> show:
> {noformat}+---+
> |  x|
> +---+
> |  2|
> |  3|
> +---+{noformat}
> dataframe df2:
> schema:
> {noformat}StructType(StructField(x,IntegerType,true), StructField(y,StringType,true)){noformat}
> explain:
> {noformat}== Physical Plan ==
> MapPartitions <function1>, createexternalrow(x#2, if (isnull(y#3)) null else y#3.toString),
[if (input[0, object].isNullAt) null else input[0, object].get AS x#130,if (input[0, object].isNullAt)
null else staticinvoke(class org.apache.spark.unsafe.types.UTF8String, StringType, fromString,
input[0, object].get, true) AS y#131]
> +- WholeStageCodegen
>    :  +- Project [_1#0 AS x#2,_2#1 AS y#3]
>    :     +- Scan ExistingRDD[_1#0,_2#1]{noformat}
> show:
> {noformat}+---+---+
> |  x|  y|
> +---+---+
> |  1|  1|
> |  2|  2|
> |  3|  3|
> +---+---+{noformat}
> i run:
> df1.join(df2, Seq("x")).show
> i get:
> {noformat}java.lang.UnsupportedOperationException: No size estimation available for objects.
> at org.apache.spark.sql.types.ObjectType.defaultSize(ObjectType.scala:41)
> at org.apache.spark.sql.catalyst.plans.logical.UnaryNode$$anonfun$6.apply(LogicalPlan.scala:323)
> at org.apache.spark.sql.catalyst.plans.logical.UnaryNode$$anonfun$6.apply(LogicalPlan.scala:323)
> at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
> at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
> at scala.collection.immutable.List.foreach(List.scala:381)
> at scala.collection.TraversableLike$class.map(TraversableLike.scala:245)
> at scala.collection.immutable.List.map(List.scala:285)
> at org.apache.spark.sql.catalyst.plans.logical.UnaryNode.statistics(LogicalPlan.scala:323)
> at org.apache.spark.sql.execution.SparkStrategies$CanBroadcast$.unapply(SparkStrategies.scala:87){noformat}
> now sure what changed, this ran about a week ago without issues (in our internal unit
tests). it is fully reproducible, however when i tried to minimize the issue i could not reproduce
it by just creating data frames in the repl with the same contents, so it probably has something
to do with way these are created (from Row objects and StructTypes).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org


Mime
View raw message