spark-issues mailing list archives

From "Gheorghe Gheorghe (JIRA)" <>
Subject [jira] [Created] (SPARK-21390) Dataset filter api inconsistency
Date Wed, 12 Jul 2017 15:55:00 GMT
Gheorghe Gheorghe created SPARK-21390:

             Summary: Dataset filter api inconsistency
                 Key: SPARK-21390
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 2.0.1
            Reporter: Gheorghe Gheorghe
            Priority: Minor

Hello everybody, 

I've encountered a strange situation with Spark 2.0.1 in spark-shell.
When I run the code below in my IDE, the second test case returns 1 as expected. However,
when I run the same code in spark-shell, the second test case returns 0.
I've made sure that I'm running Scala 2.11.8 and Spark 2.0.1 in both my IDE and spark-shell.

  import org.apache.spark.sql.Dataset

  case class SomeClass(field1:String, field2:String)

  val filterCondition: Seq[SomeClass] = Seq( SomeClass("00", "01") )

  // Test 1
  val filterMe1: Dataset[SomeClass] = Seq( SomeClass("00", "01") ).toDS
  println("Works fine! " + filterMe1.filter(filterCondition.contains(_)).count)
  // Test 2
  case class OtherClass(field1:String, field2:String)
  val filterMe2 = Seq( OtherClass("00", "01"), OtherClass("00", "02")).toDS

  println("Fail, count should return 1: " + filterMe2.filter(x => filterCondition.contains(SomeClass(x.field1, x.field2))).count)

Note: if I do this instead, it prints 1 as expected.

  println(filterMe2.map(x => SomeClass(x.field1, x.field2)).filter(filterCondition.contains(_)).count)
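As a sanity check of the intended semantics, the same membership test can be run on plain Scala collections, with no Spark involved (the class names below simply mirror the ones above); case-class equality is structural, so exactly one element should match:

```scala
// Plain-Scala sketch (no Spark) of the filter the report expects.
case class SomeClass(field1: String, field2: String)
case class OtherClass(field1: String, field2: String)

val filterCondition: Seq[SomeClass] = Seq(SomeClass("00", "01"))
val data = Seq(OtherClass("00", "01"), OtherClass("00", "02"))

// Convert each OtherClass to SomeClass, then count members of filterCondition.
// Structural equality makes SomeClass("00", "01") == SomeClass("00", "01") true.
val count = data.map(x => SomeClass(x.field1, x.field2))
                .count(filterCondition.contains(_))
// count == 1
```

This is what both test cases above should agree with; only the spark-shell run of Test 2 deviates.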

Is this a bug? I can see that this filter function has been marked as experimental.

This message was sent by Atlassian JIRA
