spark-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Reynold Xin <r...@databricks.com>
Subject Re: `Project` not preserving child partitioning ?
Date Wed, 12 Oct 2016 18:26:53 GMT
It actually does -- but do it through a really weird way.

UnaryNodeExec actually defines:

trait UnaryExecNode extends SparkPlan {
  def child: SparkPlan

  override final def children: Seq[SparkPlan] = child :: Nil

  override def outputPartitioning: Partitioning = child.outputPartitioning
}


I think this is very risky because preserving output partitioning should
not be a property of UnaryNodeExec (e.g. exchange). It would be better
(safer) to move the output partitioning definition into each of the
operator and remove it from UnaryExecNode.

Would you be interested in submitting the patch?



On Wed, Oct 12, 2016 at 10:26 AM, Tejas Patil <tejas.patil.cs@gmail.com>
wrote:

> See https://github.com/apache/spark/blob/master/sql/core/
> src/main/scala/org/apache/spark/sql/execution/
> basicPhysicalOperators.scala#L80
>
> Project operator preserves child's sort ordering but for output
> partitioning, it does not. I don't see any way projection would alter the
> partitioning of the child plan because rows are not passed across
> partitions when project happens (and if it does then it would also affect
> the sort ordering won't it ?). Am I missing something obvious here ?
>
> Thanks,
> Tejas
>

Mime
View raw message