spark-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Tejas Patil <tejas.patil...@gmail.com>
Subject Re: `Project` not preserving child partitioning ?
Date Wed, 12 Oct 2016 18:33:44 GMT
Sure :)

Thanks,
Tejas

On Wed, Oct 12, 2016 at 11:26 AM, Reynold Xin <rxin@databricks.com> wrote:

> It actually does -- but do it through a really weird way.
>
> UnaryNodeExec actually defines:
>
> trait UnaryExecNode extends SparkPlan {
>   def child: SparkPlan
>
>   override final def children: Seq[SparkPlan] = child :: Nil
>
>   override def outputPartitioning: Partitioning = child.outputPartitioning
> }
>
>
> I think this is very risky because preserving output partitioning should
> not be a property of UnaryNodeExec (e.g. exchange). It would be better
> (safer) to move the output partitioning definition into each of the
> operator and remove it from UnaryExecNode.
>
> Would you be interested in submitting the patch?
>
>
>
> On Wed, Oct 12, 2016 at 10:26 AM, Tejas Patil <tejas.patil.cs@gmail.com>
> wrote:
>
>> See https://github.com/apache/spark/blob/master/sql/core/src
>> /main/scala/org/apache/spark/sql/execution/basicPhysicalOpe
>> rators.scala#L80
>>
>> Project operator preserves child's sort ordering but for output
>> partitioning, it does not. I don't see any way projection would alter the
>> partitioning of the child plan because rows are not passed across
>> partitions when project happens (and if it does then it would also affect
>> the sort ordering won't it ?). Am I missing something obvious here ?
>>
>> Thanks,
>> Tejas
>>
>
>

Mime
View raw message