spark-user mailing list archives

From Chinchu Sup <chinchu....@gmail.com>
Subject Re: GroupBy Key and then sort values with the group
Date Thu, 09 Oct 2014 23:28:53 GMT
Thanks, Davies. I'll try it once it is released (I am on 1.1.0
currently). For now I am using a custom partitioner with ShuffledRDD
to keep each group's records together in one partition, so I don't have
to shuffle all the data to a single partition.
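
The workaround above can be modeled in a few lines of plain Python. This is only a sketch of the idea, not Spark code: it assumes composite string keys of the form "group::time" (as in the quoted message below), and the names `group_of`, `partition_for`, and `partition_records` are hypothetical. The point is that the partition is chosen from the group part alone, so records of one group never land in different partitions.

```python
import zlib


def group_of(key: str) -> str:
    """Return the group part of a composite 'group::time' key."""
    return key.split("::", 1)[0]


def partition_for(key: str, num_partitions: int) -> int:
    """Choose a partition from the group part only.

    crc32 is used instead of hash() so the result is stable across runs.
    """
    return zlib.crc32(group_of(key).encode("utf-8")) % num_partitions


def partition_records(records, num_partitions):
    """Distribute (key, value) records into num_partitions lists."""
    partitions = [[] for _ in range(num_partitions)]
    for key, value in records:
        partitions[partition_for(key, num_partitions)].append((key, value))
    return partitions


# Hypothetical sample data: two records for "a", two for "b", one for "c".
records = [("a::3", "x"), ("b::1", "y"), ("a::1", "z"), ("c::2", "w"), ("b::2", "v")]
partitions = partition_records(records, 4)
```

In Spark itself the same rule would live in a custom Partitioner whose partition lookup ignores the time part of the key.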

On Thu, Oct 9, 2014 at 2:34 PM, Davies Liu <davies@databricks.com> wrote:

> There is a new API called repartitionAndSortWithinPartitions() in
> master that may help in this case;
> you would then have to do the `groupBy()` yourself.
>
> On Wed, Oct 8, 2014 at 4:03 PM, chinchu <chinchu.sup@gmail.com> wrote:
> > Sean,
> >
> > I am having a similar issue, but I have a lot of data for a group and I
> > cannot materialize the iterable into a List or Seq in memory (I tried,
> > and it runs into OOM). Is there any other way to do this?
> >
> > I also tried a secondary sort, with the key of the form "group::time",
> > but the problem with that is that the same group name ends up in
> > multiple partitions, so I have to run sortByKey with one partition,
> > sortByKey(true, 1), which shuffles a lot of data.
> >
> > Thanks,
> > -C
> >
> >
> >
> > --
> > View this message in context:
> > http://apache-spark-user-list.1001560.n3.nabble.com/GroupBy-Key-and-then-sort-values-with-the-group-tp14455p15990.html
>
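
The "sort within each partition, then group yourself" approach suggested in the quoted reply can be sketched in plain Python as follows. This is a model of the idea under stated assumptions, not the Spark API: records are assumed to be ((group, time), value) pairs that already sit in one partition, `sorted_groups` is a hypothetical helper, and `sorted()` merely stands in for the sort Spark would perform during the shuffle. Once the records are ordered by (group, time), each group is a contiguous run, so its values can be consumed one at a time without building a List or Seq per group.

```python
from itertools import groupby


def sorted_groups(partition):
    """Yield (group, iterator of (time, value)) for one partition.

    partition: iterable of ((group, time), value) records.
    """
    # Stand-in for the shuffle-time sort; ordering by the full composite key
    # makes every group a contiguous run, sorted by time.
    ordered = sorted(partition, key=lambda rec: rec[0])
    # Walk the runs; each group's values are exposed lazily, not as a list.
    for group, recs in groupby(ordered, key=lambda rec: rec[0][0]):
        yield group, ((key[1], value) for key, value in recs)


# Hypothetical sample partition holding two groups.
partition = [(("a", 3), "x"), (("b", 1), "y"), (("a", 1), "z"), (("b", 2), "v")]

# Collecting into lists here is only for demonstration; a real consumer
# would process each value as it arrives.
result = {group: [value for _, value in values]
          for group, values in sorted_groups(partition)}
```

Each group's iterator has to be consumed before moving on to the next group, because `itertools.groupby` shares one underlying iterator across all groups.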
