spark-issues mailing list archives

From "Xiangrui Meng (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (SPARK-16164) Filter pushdown should keep the ordering in the logical plan
Date Thu, 23 Jun 2016 06:15:16 GMT

     [ https://issues.apache.org/jira/browse/SPARK-16164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiangrui Meng updated SPARK-16164:
----------------------------------
    Description: 
[~cmccubbin] reported a bug when he used StringIndexer in an ML pipeline with additional filters.
It seems that during filter pushdown, we changed the ordering in the logical plan. I'm not
sure whether we should treat this as a bug.

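{code}
import org.apache.spark.ml.feature.StringIndexer
// In spark-shell, spark.implicits._ is already imported (needed for toDF and 'idx).

// Fit the indexer on the labels "0", "1", "2".
val df1 = (0 until 3).map(_.toString).toDF
val indexer = new StringIndexer()
  .setInputCol("value")
  .setOutputCol("idx")
  .setHandleInvalid("skip")  // drop rows whose label was not seen during fit
  .fit(df1)
// df2 also contains the unseen labels "3" and "4".
val df2 = (0 until 5).map(_.toString).toDF
val predictions = indexer.transform(df2)
// Filter on the indexer's output column; show() is the action that triggers the error.
predictions.where('idx > 2).show()
{code}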

Please see the notebook at https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/1233855/2159162931615821/588180/latest.html
for error messages.
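
To illustrate why the ordering of filters can matter, here is a minimal plain-Scala sketch with no Spark dependency. It is not Spark's optimizer code. It assumes that the indexing step fails on labels not seen during fit, and that the "skip" behaviour acts as a filter that must run before any predicate on the index; this message does not state the exact error, so treat that as an assumption. The names FilterOrderSketch, labelToIndex, index, intendedOrder and reordered are hypothetical.

```scala
import scala.util.Try

object FilterOrderSketch {
  // Hypothetical stand-in for a fitted indexer: the labels seen during fit.
  val labelToIndex: Map[String, Double] = Map("0" -> 0.0, "1" -> 1.0, "2" -> 2.0)

  // Assumption: indexing throws on an unseen label, so it is only safe
  // to call after the unseen labels have been filtered out.
  def index(label: String): Double =
    labelToIndex.getOrElse(label, throw new NoSuchElementException(s"Unseen label: $label"))

  // Same input as df2 in the report: "0" to "4", where "3" and "4" are unseen.
  val input: Seq[String] = (0 until 5).map(_.toString)

  // Intended order: drop unseen labels first, then test the index.
  def intendedOrder(): Seq[String] =
    input.filter(labelToIndex.contains).filter(v => index(v) > 2)

  // Reordered: the predicate on the index runs first and meets an unseen label.
  def reordered(): Seq[String] =
    input.filter(v => index(v) > 2).filter(labelToIndex.contains)

  def main(args: Array[String]): Unit = {
    println(s"intended order: ${Try(intendedOrder())}")
    println(s"reordered:      ${Try(reordered())}")
  }
}
```

In this sketch the two filters are logically commutative only if both predicates are total. Once one predicate can throw on rows the other would have removed, swapping them changes the outcome from an empty result to a failure.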

  was:
[~cmccubbin] reported a bug when he used StringIndexer in an ML pipeline with additional filters.
It seems that during filter pushdown, we changed the ordering in the logical plan. I'm not
sure whether we should treat this as a bug.

{code}
val df1 = (0 until 3).map(_.toString).toDF
val indexer = new StringIndexer()
  .setInputCol("value")
  .setOutputCol("idx")
  .setHandleInvalid("skip")
  .fit(df1)
val df2 = (0 until 5).map(_.toString).toDF
val predictions = indexer.transform(df2)
predictions.where('idx > 2).show()
{code}


> Filter pushdown should keep the ordering in the logical plan
> ------------------------------------------------------------
>
>                 Key: SPARK-16164
>                 URL: https://issues.apache.org/jira/browse/SPARK-16164
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.0.0
>            Reporter: Xiangrui Meng



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org

