spark-dev mailing list archives

From Andrew Ash <and...@andrewash.com>
Subject Re: Join implementation in SparkSQL
Date Thu, 15 Jan 2015 20:52:08 GMT
What Reynold is describing is a performance optimization in the implementation;
the semantics of the join (a cartesian product followed by a relational-algebra
filter) are unchanged, so both forms produce the same results.
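A toy sketch of that equivalence (plain Python, not Spark's actual code): for an inner equi-join, a nested-loop cartesian product followed by the join predicate yields the same multiset of rows as a hash-based equi-join.

```python
# Toy illustration (not Spark's implementation): the same inner equi-join
# computed two ways must produce the same multiset of rows.

left = [(1, "a"), (2, "b"), (2, "c"), (3, "d")]   # (key, payload)
right = [(2, "x"), (3, "y"), (3, "z"), (4, "w")]  # (key, payload)

def cartesian_then_filter(l, r):
    """Relational-algebra definition: cross product, then filter on the predicate."""
    return [(lk, lv, rv) for lk, lv in l for rk, rv in r if lk == rk]

def hash_equi_join(l, r):
    """Optimized form: build a hash table on one side, probe with the other."""
    table = {}
    for rk, rv in r:
        table.setdefault(rk, []).append(rv)
    return [(lk, lv, rv) for lk, lv in l for rv in table.get(lk, [])]

assert sorted(cartesian_then_filter(left, right)) == sorted(hash_equi_join(left, right))
```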

On Thu, Jan 15, 2015 at 1:36 PM, Reynold Xin <rxin@databricks.com> wrote:

> It's a bunch of strategies defined here:
>
> https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala
>
> In most common use cases (e.g. inner equi join), filters are pushed below
> the join or into the join. Doing a cartesian product followed by a filter
> is too expensive.
>
>
> On Thu, Jan 15, 2015 at 7:39 AM, Alessandro Baretta <alexbaretta@gmail.com> wrote:
>
> > Hello,
> >
> > Where can I find docs about how joins are implemented in SparkSQL? In
> > particular, I'd like to know whether they are implemented according to
> > their relational algebra definition as filters on top of a cartesian
> > product.
> >
> > Thanks,
> >
> > Alex
> >
>
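To see why the naive form is too expensive, as Reynold notes: a cartesian product evaluates the predicate |left| x |right| times, while a hash join does roughly |left| + |right| work. A hypothetical counter (again plain Python, not Spark code) makes the gap concrete:

```python
# Hypothetical cost comparison (not Spark code): count how much work each
# strategy does for the same inner equi-join over unique keys.

left = [(k, f"l{k}") for k in range(100)]
right = [(k, f"r{k}") for k in range(100)]

# Cartesian product + filter: the predicate runs |left| * |right| times.
comparisons = 0
out_naive = []
for lk, lv in left:
    for rk, rv in right:
        comparisons += 1
        if lk == rk:
            out_naive.append((lk, lv, rv))

# Hash join: one pass to build the table, one pass to probe it.
probes = 0
table = {}
for rk, rv in right:
    table[rk] = rv
out_hash = []
for lk, lv in left:
    probes += 1
    if lk in table:
        out_hash.append((lk, lv, table[lk]))

assert out_naive == out_hash
assert comparisons == 100 * 100  # 10,000 predicate evaluations
assert probes == 100             # vs. 100 probes (plus 100 build inserts)
```

Pushing the equality condition into the join operator is what lets Spark pick the cheaper strategy; a filter left on top of a cross product would force the quadratic plan.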
