Date: Thu, 3 Mar 2016 07:53:18 +0000 (UTC)
From: "Zuo Wang (JIRA)"
To: issues@spark.apache.org
Subject: [jira] [Commented] (SPARK-13531) Some DataFrame joins stopped working with UnsupportedOperationException: No size estimation available for objects

    [ https://issues.apache.org/jira/browse/SPARK-13531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15177441#comment-15177441 ]

Zuo Wang commented on SPARK-13531:
----------------------------------

Caused by the commit in https://issues.apache.org/jira/browse/SPARK-13329

> Some DataFrame joins stopped working with UnsupportedOperationException: No size estimation available for objects
> -----------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-13531
>                 URL: https://issues.apache.org/jira/browse/SPARK-13531
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>            Reporter: koert kuipers
>            Priority: Minor
>
> this is using spark 2.0.0-SNAPSHOT
>
> dataframe df1:
>
> schema:
> {noformat}StructType(StructField(x,IntegerType,true)){noformat}
>
> explain:
> {noformat}== Physical Plan ==
> MapPartitions , obj#135: object, [if (input[0, object].isNullAt) null else input[0, object].get AS x#128]
> +- MapPartitions , createexternalrow(if (isnull(x#9)) null else x#9), [input[0, object] AS obj#135]
>    +- WholeStageCodegen
>       :  +- Project [_1#8 AS x#9]
>       :     +- Scan ExistingRDD[_1#8]{noformat}
>
> show:
> {noformat}+---+
> |  x|
> +---+
> |  2|
> |  3|
> +---+{noformat}
>
> dataframe df2:
>
> schema:
> {noformat}StructType(StructField(x,IntegerType,true), StructField(y,StringType,true)){noformat}
>
> explain:
> {noformat}== Physical Plan ==
> MapPartitions , createexternalrow(x#2, if (isnull(y#3)) null else y#3.toString), [if (input[0, object].isNullAt) null else input[0, object].get AS x#130,if (input[0, object].isNullAt) null else staticinvoke(class org.apache.spark.unsafe.types.UTF8String, StringType, fromString, input[0, object].get, true) AS y#131]
> +- WholeStageCodegen
>    :  +- Project [_1#0 AS x#2,_2#1 AS y#3]
>    :     +- Scan ExistingRDD[_1#0,_2#1]{noformat}
>
> show:
> {noformat}+---+---+
> |  x|  y|
> +---+---+
> |  1|  1|
> |  2|  2|
> |  3|  3|
> +---+---+{noformat}
>
> i run:
> df1.join(df2, Seq("x")).show
>
> i get:
> {noformat}java.lang.UnsupportedOperationException: No size estimation available for objects.
> 	at org.apache.spark.sql.types.ObjectType.defaultSize(ObjectType.scala:41)
> 	at org.apache.spark.sql.catalyst.plans.logical.UnaryNode$$anonfun$6.apply(LogicalPlan.scala:323)
> 	at org.apache.spark.sql.catalyst.plans.logical.UnaryNode$$anonfun$6.apply(LogicalPlan.scala:323)
> 	at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
> 	at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
> 	at scala.collection.immutable.List.foreach(List.scala:381)
> 	at scala.collection.TraversableLike$class.map(TraversableLike.scala:245)
> 	at scala.collection.immutable.List.map(List.scala:285)
> 	at org.apache.spark.sql.catalyst.plans.logical.UnaryNode.statistics(LogicalPlan.scala:323)
> 	at org.apache.spark.sql.execution.SparkStrategies$CanBroadcast$.unapply(SparkStrategies.scala:87){noformat}
>
> not sure what changed; this ran about a week ago without issues (in our internal unit tests). it is fully reproducible, but when i tried to minimize the issue i could not reproduce it by just creating data frames in the repl with the same contents, so it probably has something to do with the way these are created (from Row objects and StructTypes).

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org
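[Editor's note] The reporter suspects the failure is tied to how the DataFrames are built: from explicit Row objects plus a StructType schema, rather than via toDF on tuples or case classes. A minimal sketch of that construction style, assuming a local Spark 2.x session (the names spark, schema1, schema2 are illustrative, not from the report):

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

// Local session for illustration only; the report used internal unit tests.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("SPARK-13531-sketch")
  .getOrCreate()

// df1: single nullable Int column "x", built from Row objects + StructType,
// matching the reported schema StructType(StructField(x,IntegerType,true)).
val schema1 = StructType(Seq(StructField("x", IntegerType, nullable = true)))
val df1 = spark.createDataFrame(
  spark.sparkContext.parallelize(Seq(Row(2), Row(3))), schema1)

// df2: columns "x" (Int) and "y" (String), as in the reported second schema.
val schema2 = StructType(Seq(
  StructField("x", IntegerType, nullable = true),
  StructField("y", StringType, nullable = true)))
val df2 = spark.createDataFrame(
  spark.sparkContext.parallelize(Seq(Row(1, "1"), Row(2, "2"), Row(3, "3"))), schema2)

// The failing call from the report. Broadcast-join planning
// (SparkStrategies.CanBroadcast) asks each side for LogicalPlan.statistics;
// with an object-typed MapPartitions operator in the plan, as in the
// reporter's explain output, that walks into ObjectType.defaultSize and
// throws UnsupportedOperationException.
df1.join(df2, Seq("x")).show()

spark.stop()
```

Note that this sketch only mirrors the construction style; the reporter's explain output also contains MapPartitions/createexternalrow operators (from typed Dataset transformations), and the reporter explicitly could not reproduce the failure from bare data frames in the REPL, so this alone may not trigger the exception.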