spark-issues mailing list archives

From "Sam De Backer (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (SPARK-23959) UnresolvedException with DataSet created from Seq.empty since Spark 2.3.0
Date Wed, 11 Apr 2018 14:52:00 GMT

     [ https://issues.apache.org/jira/browse/SPARK-23959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sam De Backer updated SPARK-23959:
----------------------------------
    Description: 
The following snippet works fine in Spark 2.2.1 but gives a rather cryptic runtime exception
in Spark 2.3.0:
{code:java}
import sparkSession.implicits._
import org.apache.spark.sql.functions._

case class X(xid: Long, yid: Int)
case class Y(yid: Int, zid: Long)
case class Z(zid: Long, b: Boolean)

val xs = Seq(X(1L, 10)).toDS()
val ys = Seq(Y(10, 100L)).toDS()
val zs = Seq.empty[Z].toDS()

val j = xs
  .join(ys, "yid")
  .join(zs, Seq("zid"), "left")
  .withColumn("BAM", when('b, "B").otherwise("NB"))

j.show(){code}
In Spark 2.2.1 it prints the following to the console:
{noformat}
+---+---+---+----+---+
|zid|yid|xid|   b|BAM|
+---+---+---+----+---+
|100| 10|  1|null| NB|
+---+---+---+----+---+{noformat}
In Spark 2.3.0 it results in:
{noformat}
org.apache.spark.sql.catalyst.analysis.UnresolvedException: Invalid call to dataType on unresolved object, tree: 'BAM
at org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute.dataType(unresolved.scala:105)
at org.apache.spark.sql.types.StructType$$anonfun$fromAttributes$1.apply(StructType.scala:435)
at org.apache.spark.sql.types.StructType$$anonfun$fromAttributes$1.apply(StructType.scala:435)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.immutable.List.foreach(List.scala:392)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
at scala.collection.immutable.List.map(List.scala:296)
at org.apache.spark.sql.types.StructType$.fromAttributes(StructType.scala:435)
at org.apache.spark.sql.catalyst.plans.QueryPlan.schema$lzycompute(QueryPlan.scala:157)
...{noformat}
The culprit seems to be the Dataset being created from an empty Seq[Z]. If you replace that
with another expression that also yields an empty Dataset[Z], it works as in Spark 2.2.1,
e.g.
{code:java}
val zs = Seq(Z(10L, true)).toDS().filter('zid < Long.MinValue){code}

  was:
The following snippet works fine in Spark 2.2.1 but gives a rather cryptic runtime exception
in Spark 2.3.0:
{code:java}
import sparkSession.implicits._
import org.apache.spark.sql.functions._

case class X(xid: Long, yid: Int)
case class Y(yid: Int, zid: Long)
case class Z(zid: Long, b: Boolean)

val xs = Seq(X(1L, 10)).toDS()
val ys = Seq(Y(10, 100L)).toDS()
val zs = Seq.empty[Z].toDS()

val j = xs
  .join(ys, "yid")
  .join(zs, Seq("zid"), "left")
  .withColumn("BAM", when('b, "B").otherwise("NB"))

j.show(){code}
In Spark 2.2.1 it prints to the console
{noformat}
+---+---+---+----+---+
|zid|yid|xid|   b|BAM|
+---+---+---+----+---+
|100| 10|  1|null| NB|
+---+---+---+----+---+{noformat}
In Spark 2.3.0 it results in:
{noformat}
org.apache.spark.sql.catalyst.analysis.UnresolvedException: Invalid call to dataType on unresolved object, tree: 'BAM
at org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute.dataType(unresolved.scala:105)
at org.apache.spark.sql.types.StructType$$anonfun$fromAttributes$1.apply(StructType.scala:435)
at org.apache.spark.sql.types.StructType$$anonfun$fromAttributes$1.apply(StructType.scala:435)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.immutable.List.foreach(List.scala:392)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
at scala.collection.immutable.List.map(List.scala:296)
at org.apache.spark.sql.types.StructType$.fromAttributes(StructType.scala:435)
at org.apache.spark.sql.catalyst.plans.QueryPlan.schema$lzycompute(QueryPlan.scala:157)
...{noformat}
The culprit really seems to be DataSet being created from an empty Seq[Z]. When you change
that to something that will also result in an empty DataSet[Z] it works as in Spark 2.2.1,
e.g.
{code:java}
val zs = Seq(Z(10L, true)).toDS().filter('zid === Long.MinValue){code}


> UnresolvedException with DataSet created from Seq.empty since Spark 2.3.0
> -------------------------------------------------------------------------
>
>                 Key: SPARK-23959
>                 URL: https://issues.apache.org/jira/browse/SPARK-23959
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.3.0
>            Reporter: Sam De Backer
>            Priority: Major
>
> The following snippet works fine in Spark 2.2.1 but gives a rather cryptic runtime exception in Spark 2.3.0:
> {code:java}
> import sparkSession.implicits._
> import org.apache.spark.sql.functions._
> case class X(xid: Long, yid: Int)
> case class Y(yid: Int, zid: Long)
> case class Z(zid: Long, b: Boolean)
> val xs = Seq(X(1L, 10)).toDS()
> val ys = Seq(Y(10, 100L)).toDS()
> val zs = Seq.empty[Z].toDS()
> val j = xs
>   .join(ys, "yid")
>   .join(zs, Seq("zid"), "left")
>   .withColumn("BAM", when('b, "B").otherwise("NB"))
> j.show(){code}
> In Spark 2.2.1 it prints the following to the console:
> {noformat}
> +---+---+---+----+---+
> |zid|yid|xid|   b|BAM|
> +---+---+---+----+---+
> |100| 10|  1|null| NB|
> +---+---+---+----+---+{noformat}
> In Spark 2.3.0 it results in:
> {noformat}
> org.apache.spark.sql.catalyst.analysis.UnresolvedException: Invalid call to dataType on unresolved object, tree: 'BAM
> at org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute.dataType(unresolved.scala:105)
> at org.apache.spark.sql.types.StructType$$anonfun$fromAttributes$1.apply(StructType.scala:435)
> at org.apache.spark.sql.types.StructType$$anonfun$fromAttributes$1.apply(StructType.scala:435)
> at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
> at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
> at scala.collection.immutable.List.foreach(List.scala:392)
> at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
> at scala.collection.immutable.List.map(List.scala:296)
> at org.apache.spark.sql.types.StructType$.fromAttributes(StructType.scala:435)
> at org.apache.spark.sql.catalyst.plans.QueryPlan.schema$lzycompute(QueryPlan.scala:157)
> ...{noformat}
> The culprit seems to be the Dataset being created from an empty Seq[Z]. If you replace that with another expression that also yields an empty Dataset[Z], it works as in Spark 2.2.1, e.g.
> {code:java}
> val zs = Seq(Z(10L, true)).toDS().filter('zid < Long.MinValue){code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)