spark-dev mailing list archives

From Mark Hamstra <m...@clearstorydata.com>
Subject Re: Union in Spark context
Date Mon, 05 Feb 2018 16:47:57 GMT
First, the public API cannot be changed except when there is a major
version change, and there is no way that we are going to do Spark 3.0.0
just for this change.

Second, the change would be a mistake, since the two union methods are quite
different. The method in RDD only ever works on two RDDs at a time, whereas
the method in SparkContext can work on many RDDs in a single call. Unioning
many RDDs pairwise adds one level of lineage per call, so the method in
SparkContext is much preferred in that case to prevent a lengthy lineage
chain.
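
To make the lineage point concrete, here is a rough sketch in plain Python. It
is a toy model, not Spark code: Node, union_all and depth are hypothetical
names invented for this illustration, and the only thing modelled is how many
union layers end up stacked above the source datasets.

```python
from dataclasses import dataclass, field
from functools import reduce
from typing import List


@dataclass
class Node:
    """Toy stand-in for an RDD: it only records its parent datasets."""
    name: str
    parents: List["Node"] = field(default_factory=list)

    def union(self, other: "Node") -> "Node":
        # Models the two-at-a-time method: every call adds one new node.
        return Node("union", [self, other])


def union_all(nodes: List[Node]) -> Node:
    # Models the many-at-once method: a single node over all inputs.
    return Node("union", list(nodes))


def depth(node: Node) -> int:
    # Number of union layers between this node and its deepest source.
    if not node.parents:
        return 0
    return 1 + max(depth(p) for p in node.parents)


sources = [Node(f"rdd{i}") for i in range(100)]

# Pairwise chaining: ((rdd0 U rdd1) U rdd2) U ... one layer per call.
chained = reduce(lambda a, b: a.union(b), sources)

# Single call over the whole list: one layer in total.
flat = union_all(sources)

print(depth(chained), depth(flat))  # prints "99 1"
```

In this model, 100 inputs chained pairwise sit under 99 nested unions, while
the single call leaves them under one. That difference in depth is the
"lengthy lineage chain" referred to above.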

On Mon, Feb 5, 2018 at 8:04 AM, Suchith J N <suchithjn22@gmail.com> wrote:

> Hi,
>
> This seems like a simple cleanup: why do we have union() for RDDs in
> SparkContext? Shouldn't it reside in RDD? There is one in RDD, but it seems
> to be a wrapper around the one in SparkContext.
>
> Regards,
> Suchith
>
