spark-reviews mailing list archives

From SimonBin
Subject [GitHub] spark issue #18692: [SPARK-21417][SQL] Infer join conditions using propagate...
Date Mon, 20 Nov 2017 13:03:08 GMT
Github user SimonBin commented on the issue:
    Hi, we are very interested in this patch. I wonder whether it could handle the following case automatically, without our having to write the join condition explicitly:
    package net.sansa_stack.spark.playground

    import org.apache.spark.sql.{Row, SparkSession}
    import org.apache.spark.sql.types.{IntegerType, StructType}
    import org.scalatest._

    class TestSparkSqlJoin extends FlatSpec {
      "SPARK SQL processor" should "be capable of handling transitive join conditions" in {
        // Builder settings below are placeholders for a local run.
        val spark = SparkSession.builder()
          .master("local[*]")
          .appName("TestSparkSqlJoin")
          .getOrCreate()
        val schema = new StructType()
          .add("s", IntegerType, nullable = true)
          .add("p", IntegerType, nullable = true)
          .add("o", IntegerType, nullable = true)
        val data = List((1, 2, 3))
        val dataRDD = spark.sparkContext
          .parallelize(data)
          .map(attributes => Row(attributes._1, attributes._2, attributes._3))
        spark.createDataFrame(dataRDD, schema).createOrReplaceTempView("T")
        // A.s = 1 AND B.s = 1 together imply A.s = B.s.
        spark.sql("SELECT A.s FROM T A, T B WHERE A.s = 1 AND B.s = 1").explain(true)
      }
    }
    I built this pull request locally, but it still fails with the same error:
    == Physical Plan ==
    org.apache.spark.sql.AnalysisException: Detected cartesian product for INNER join between logical plans
    Project [s#3]
    +- Filter (isnotnull(s#3) && (s#3 = 1))
       +- LogicalRDD [s#3, p#4, o#5], false
    +- Filter (isnotnull(s#25) && (s#25 = 1))
       +- LogicalRDD [s#25, p#26, o#27], false
    Join condition is missing or trivial.
    Use the CROSS JOIN syntax to allow cartesian products between these relations.;
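
    To illustrate the inference being asked for, here is a minimal plain-Scala sketch. It is not Catalyst code and does not reflect how the pull request is implemented; `ConstEq`, `TransitiveJoinSketch`, and `inferJoinPairs` are hypothetical names, and the model assumes only predicates of the form `relation.column = integer constant`. Two predicates on different relations that are bound to the same constant imply an equality between the two columns, which could serve as a join condition.

```scala
// Hypothetical model of a filter predicate `relation.column = value`.
final case class ConstEq(relation: String, column: String, value: Int)

object TransitiveJoinSketch {
  // For each pair of predicates on different relations that are bound to the
  // same constant, emit the implied column equality as a string.
  def inferJoinPairs(preds: Seq[ConstEq]): Seq[String] =
    for {
      (a, i) <- preds.zipWithIndex
      (b, j) <- preds.zipWithIndex
      if i < j && a.relation != b.relation && a.value == b.value
    } yield s"${a.relation}.${a.column} = ${b.relation}.${b.column}"

  def main(args: Array[String]): Unit = {
    // The two filters from the query above: A.s = 1 AND B.s = 1.
    val preds = Seq(ConstEq("A", "s", 1), ConstEq("B", "s", 1))
    inferJoinPairs(preds).foreach(println) // prints "A.s = B.s"
  }
}
```

    For the query above, the sketch yields `A.s = B.s`, the condition that would otherwise have to be written by hand.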