spark-user mailing list archives

From Jerry Lam <chiling...@gmail.com>
Subject Re: UNION two RDDs
Date Mon, 22 Dec 2014 21:46:12 GMT
Hi Sean and Madhu,

Thank you for the explanation. I really appreciate it.

Best Regards,

Jerry


On Fri, Dec 19, 2014 at 4:50 AM, Sean Owen <sowen@cloudera.com> wrote:

> coalesce actually changes the number of partitions. Unless the
> original RDD had just 1 partition, coalesce(1) will make an RDD with 1
> partition that is larger than the original partitions, of course.
>
> I don't think the question is about ordering of things within an
> element of the RDD?
>
> If the original RDD was sorted, and so has a defined ordering, then it
> will be preserved. Otherwise I believe you do not have any guarantees
> about ordering. In practice, you may find that you still encounter the
> elements in the same order after coalesce(1), although I am not sure
> that is even true.
>
> union() is the same story; unless the RDDs are sorted I don't think
> there are guarantees. However I'm almost certain that in practice, as
> it happens now, A's elements would come before B's after a union, if
> you did traverse them.
>
> On Fri, Dec 19, 2014 at 5:41 AM, madhu phatak <phatak.dev@gmail.com>
> wrote:
> > Hi,
> > coalesce is an operation which changes the number of records in a
> > partition. It will not touch ordering within a row AFAIK.
> >
> > On Fri, Dec 19, 2014 at 2:22 AM, Jerry Lam <chilinglam@gmail.com> wrote:
> >>
> >> Hi Spark users,
> >>
> >> I wonder if val resultRDD = RDDA.union(RDDB) will always have records in
> >> RDDA before records in RDDB.
> >>
> >> Also, will resultRDD.coalesce(1) change this ordering?
> >>
> >> Best Regards,
> >>
> >> Jerry
> >
> >
> >
> > --
> > Regards,
> > Madhukara Phatak
> > http://www.madhukaraphatak.com
>
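
Since the thread concludes that neither union() nor coalesce(1) documents an ordering guarantee for unsorted RDDs, one way to avoid relying on observed behaviour is to make the order explicit. The following is a minimal sketch, not something proposed in the thread: it assumes a local Spark master, and the object name, sample data, and the (source, position) key scheme are all hypothetical choices made for illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext}
// Needed for the pair-RDD implicits on Spark releases before 1.3; harmless later.
import org.apache.spark.SparkContext._

// Hypothetical example object; names and data are illustrative only.
object UnionOrderSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("union-order-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val rddA = sc.parallelize(Seq("a1", "a2", "a3"), 2)
      val rddB = sc.parallelize(Seq("b1", "b2"), 2)

      // Tag every record with (source id, position within its source RDD),
      // so the final order is defined by the key and not by partition layout.
      val taggedA = rddA.zipWithIndex().map { case (v, i) => ((0, i), v) }
      val taggedB = rddB.zipWithIndex().map { case (v, i) => ((1, i), v) }

      // Sort on the explicit key: all of A (source 0) before all of B (source 1).
      val ordered = taggedA.union(taggedB).sortByKey().values

      ordered.collect().foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

The trade-off is an extra shuffle for sortByKey(), which may be acceptable when a defined order matters more than the cost. Note that zipWithIndex() itself numbers records by partition index and position within each partition, so the tags reflect whatever order each source RDD already has.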
