spark-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Michael Armbrust <mich...@databricks.com>
Subject Re: [Catalyst] RFC: Using PartialFunction literals instead of objects
Date Tue, 19 May 2015 18:35:57 GMT
Overall this seems like a reasonable proposal to me.  Here are a few
thoughts:

 - There is some debugging utility to the ruleName, so we would probably
want to at least make that an argument to the rule function.
 - We also have had rules that operate on SparkPlan, though since there is
only one ATM maybe we don't need sugar there.
 - I would not call the sugar for creating Strategies rule/seqrule, as I
think the one-to-one vs one-to-many distinction is useful.
 - I'm generally pro-refactoring to make the code nicer, especially when
its not official public API, but I do think its important to maintain
source compatibility (which I think you are) when possible as there are
other projects using catalyst.
 - Finally, we'll have to balance this with other code changes / conflicts.

You should probably open a JIRA and we can continue the discussion there.

On Tue, May 19, 2015 at 4:16 AM, Edoardo Vacchi <uncommonnonsense@gmail.com>
wrote:

> Hi everybody,
>
> At the moment, Catalyst rules are defined using two different types of
> rules:
> `Rule[LogicalPlan]` and `Strategy` (which in turn maps to
> `GenericStrategy[SparkPlan]`).
>
> I propose to introduce utility methods to
>
>   a) reduce the boilerplate to define rewrite rules
>   b) turning them back into what they essentially represent: function
> types.
>
> These changes would be backwards compatible, and would greatly help in
> understanding what the code does. Personally, I feel like the current
> use of objects is redundant and possibly confusing.
>
> ## `Rule[LogicalPlan]`
>
> The analyzer and optimizer use `Rule[LogicalPlan]`, which, besides
> defining a default `val ruleName`
> only defines the method `apply(plan: TreeType): TreeType`.
> Because the body of such method is always supposed to read `plan match
> pf`, with `pf`
> being some `PartialFunction[LogicalPlan, LogicalPlan]`, we can
> conclude that `Rule[LogicalPlan]`
> might be substituted by a PartialFunction.
>
> I propose the following:
>
> a) Introduce the utility method
>
>     def rule(pf: PartialFunction[LogicalPlan, LogicalPlan]):
> Rule[LogicalPlan] =
>       new Rule[LogicalPlan] {
>         def apply (plan: LogicalPlan): LogicalPlan = plan transform pf
>       }
>
> b) progressively replace the boilerplate-y object definitions; e.g.
>
>     object MyRewriteRule extends Rule[LogicalPlan] {
>       def apply(plan: LogicalPlan): LogicalPlan = plan transform {
>         case ... => ...
>     }
>
> with
>
>     // define a Rule[LogicalPlan]
>     val MyRewriteRule = rule {
>       case ... => ...
>     }
>
> it might also be possible to make rule method `implicit`, thereby
> further reducing MyRewriteRule to:
>
>     // define a PartialFunction[LogicalPlan, LogicalPlan]
>     // the implicit would convert it into a Rule[LogicalPlan] at the use
> sites
>     val MyRewriteRule = {
>       case ... => ...
>     }
>
>
> ## Strategies
>
> A similar solution could be applied to shorten the code for
> Strategies, which are total functions
> only because they are all supposed to manage the default case,
> possibly returning `Nil`. In this case
> we might introduce the following utility methods:
>
> /**
>  * Generate a Strategy from a PartialFunction[LogicalPlan, SparkPlan].
>  * The partial function must therefore return *one single* SparkPlan
> for each case.
>  * The method will automatically wrap them in a [[Seq]].
>  * Unhandled cases will automatically return Seq.empty
>  */
> protected def rule(pf: PartialFunction[LogicalPlan, SparkPlan]): Strategy =
>   new Strategy {
>     def apply(plan: LogicalPlan): Seq[SparkPlan] =
>       if (pf.isDefinedAt(plan)) Seq(pf.apply(plan)) else Seq.empty
>   }
>
> /**
>  * Generate a Strategy from a PartialFunction[ LogicalPlan, Seq[SparkPlan]
> ].
>  * The partial function must therefore return a Seq[SparkPlan] for each
> case.
>  * Unhandled cases will automatically return Seq.empty
>  */
> protected def seqrule(pf: PartialFunction[LogicalPlan,
> Seq[SparkPlan]]): Strategy =
>   new Strategy {
>     def apply(plan: LogicalPlan): Seq[SparkPlan] =
>       if (pf.isDefinedAt(plan)) pf.apply(plan) else Seq.empty[SparkPlan]
>   }
>
> Thanks in advance
> e.v.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
> For additional commands, e-mail: dev-help@spark.apache.org
>
>

Mime
View raw message