spark-reviews mailing list archives

From SimonBin <...@git.apache.org>
Subject [GitHub] spark issue #18692: [SPARK-21417][SQL] Infer join conditions using propagate...
Date Mon, 20 Nov 2017 13:03:08 GMT
Github user SimonBin commented on the issue:

    https://github.com/apache/spark/pull/18692
  
    Hi, we are very interested in this patch. I wonder whether it could infer the join condition in the following test automatically, without us having to write the join explicitly:
    
    ```scala
    package net.sansa_stack.spark.playground
    
    import org.apache.spark.sql.{Row, SparkSession}
    import org.apache.spark.sql.types.{IntegerType, StructField, StructType}
    import org.scalatest._
    
    class TestSparkSqlJoin extends FlatSpec {
    
      "SPARK SQL processor" should "be capable of handling transitive join conditions" in {
    
        val spark = SparkSession
          .builder()
          .master("local[1]")
          .getOrCreate()
    
        val schema = new StructType()
          .add("s", IntegerType, nullable = true)
          .add("p", IntegerType, nullable = true)
          .add("o", IntegerType, nullable = true)
    
        val data = List((1, 2, 3))
        val dataRDD = spark.sparkContext.parallelize(data)
          .map(attributes => Row(attributes._1, attributes._2, attributes._3))
        spark.createDataFrame(dataRDD, schema).createOrReplaceTempView("T")
    
        spark.sql("SELECT A.s FROM T A, T B WHERE A.s = 1 AND B.s = 1").explain(true)
      }
    
    }
    ```
    
    
    I built this pull request locally, but it still gives me the same error:
    
    ```
    == Physical Plan ==
    org.apache.spark.sql.AnalysisException: Detected cartesian product for INNER join between logical plans
    Project [s#3]
    +- Filter (isnotnull(s#3) && (s#3 = 1))
       +- LogicalRDD [s#3, p#4, o#5], false
    and
    Project
    +- Filter (isnotnull(s#25) && (s#25 = 1))
       +- LogicalRDD [s#25, p#26, o#27], false
    Join condition is missing or trivial.
    Use the CROSS JOIN syntax to allow cartesian products between these relations.;
    ```
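
    To make the request concrete: the inference asked for here is that two filters equating columns of different relations to the same constant (`A.s = 1` and `B.s = 1`) imply an equality between those columns (`A.s = B.s`), which could then serve as the join condition. Below is a minimal sketch of that reasoning in plain Scala, with no Spark dependency. The names `Attr`, `ConstEq`, `AttrEq` and `inferJoinConditions` are hypothetical and chosen only for illustration; this is not how the pull request or Catalyst implements it.

    ```scala
    // Hypothetical model types for illustration only; these are NOT Catalyst classes.
    final case class Attr(relation: String, column: String)
    final case class ConstEq(attr: Attr, value: Int) // a filter such as A.s = 1
    final case class AttrEq(left: Attr, right: Attr) // an inferred condition such as A.s = B.s

    object TransitiveJoinSketch {

      // Group the constant filters by their constant value, then pair up the
      // attributes in each group that belong to different relations.
      def inferJoinConditions(filters: Seq[ConstEq]): Seq[AttrEq] =
        filters
          .groupBy(_.value)
          .values
          .toSeq
          .flatMap { group =>
            val attrs = group.map(_.attr).distinct
            for {
              (l, i) <- attrs.zipWithIndex
              r <- attrs.drop(i + 1)
              if l.relation != r.relation
            } yield AttrEq(l, r)
          }

      def main(args: Array[String]): Unit = {
        // The two filters from the query in the test above.
        val filters = Seq(ConstEq(Attr("A", "s"), 1), ConstEq(Attr("B", "s"), 1))
        inferJoinConditions(filters).foreach(println)
      }
    }
    ```

    If the two filters compared against different constants (say `A.s = 1` and `B.s = 2`), this sketch would infer nothing, and the query would remain a genuine cartesian product of the two filtered inputs.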

