spark-issues mailing list archives

From "Sean Owen (JIRA)" <j...@apache.org>
Subject [jira] [Created] (SPARK-21837) UserDefinedTypeSuite local UDFs not actually testing what it intends
Date Fri, 25 Aug 2017 10:24:00 GMT
Sean Owen created SPARK-21837:
---------------------------------

             Summary: UserDefinedTypeSuite local UDFs not actually testing what it intends
                 Key: SPARK-21837
                 URL: https://issues.apache.org/jira/browse/SPARK-21837
             Project: Spark
          Issue Type: Bug
          Components: SQL, Tests
    Affects Versions: 2.2.0
            Reporter: Sean Owen
            Assignee: Sean Owen
            Priority: Minor


Consider this test in {{UserDefinedTypeSuite}}:

{code}
  test("Local UDTs") {
    val df = Seq((1, new UDT.MyDenseVector(Array(0.1, 1.0)))).toDF("int", "vec")
    df.collect()(0).getAs[UDT.MyDenseVector](1)
    df.take(1)(0).getAs[UDT.MyDenseVector](1)
    df.limit(1).groupBy('int).agg(first('vec)).collect()(0).getAs[UDT.MyDenseVector](0)
    df.orderBy('int).limit(1).groupBy('int).agg(first('vec)).collect()(0)
      .getAs[UDT.MyDenseVector](0)
  }
{code}

I claim the last two expressions can't be right: they read column 0 of the aggregated result
as the vector, but column 0 is the grouping key (int) and the vector is column 1. Yet the test passes!

It started failing, though, once I made seemingly unrelated changes in https://github.com/apache/spark/pull/18645,
with:

{code}
[info] - Local UDTs *** FAILED *** (144 milliseconds)
[info]   java.lang.ClassCastException: java.lang.Integer cannot be cast to org.apache.spark.sql.UDT$MyDenseVector
[info]   at org.apache.spark.sql.UserDefinedTypeSuite$$anonfun$10.apply(UserDefinedTypeSuite.scala:211)
[info]   at org.apache.spark.sql.UserDefinedTypeSuite$$anonfun$10.apply(UserDefinedTypeSuite.scala:205)
{code}

I modified the test to actually assert that the vector returned in each case is the expected
one, and it began failing with the same error on master. So I am fairly sure the test
is not doing what it intends: the results of these expressions just happened
not to be fully evaluated or checked.
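
To illustrate the point without Spark, here is a minimal sketch. The names {{LocalUdtSketch}}, {{MyDenseVector}}, {{getAs}} and {{matchesAt}} are hypothetical stand-ins rather than the suite's code, and the sketch assumes only that {{Row.getAs[T](i)}} amounts to an unchecked cast of the i-th value. It models the aggregated row as (grouping key, vector) and checks each index against the expected vector.

```scala
import scala.util.Try

object LocalUdtSketch {
  // Hypothetical stand-in for UDT.MyDenseVector (Seq gives value equality).
  final case class MyDenseVector(data: Seq[Double])

  // Hypothetical stand-in for Row.getAs[T](i): an unchecked cast of the i-th value.
  // Because T is erased, the cast is only enforced where the caller needs a T.
  def getAs[T](row: Seq[Any], i: Int): T = row(i).asInstanceOf[T]

  // True only if the value at index i really is the expected vector.
  // A ClassCastException, if one is thrown, is treated as "no match".
  def matchesAt(row: Seq[Any], i: Int, expected: MyDenseVector): Boolean =
    Try(getAs[MyDenseVector](row, i) == expected).getOrElse(false)

  def main(args: Array[String]): Unit = {
    val vec = MyDenseVector(Seq(0.1, 1.0))
    // Shape of a groupBy(key).agg(first(vec)) row: key first, then the aggregate.
    val aggregated: Seq[Any] = Seq(1, vec)

    assert(!matchesAt(aggregated, 0, vec)) // index 0 holds the Int key
    assert(matchesAt(aggregated, 1, vec))  // index 1 holds the vector
  }
}
```

Comparing against the expected value fails at index 0 whether or not a JVM cast is ever emitted for the discarded result, which is why asserting on the value exposes the problem where merely calling {{getAs}} may not.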

CC [~marmbrus] for the discussion at https://github.com/apache/spark/commit/3ae25f244bd471ef77002c703f2cc7ed6b524f11#commitcomment-23320234
and apologies if I'm still missing something here. I'll open a PR to show what
I mean.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
