spark-user mailing list archives

From Akhil Das <ak...@sigmoidanalytics.com>
Subject Re: Collect method in Spark
Date Fri, 07 Nov 2014 05:27:50 GMT
Once you call .collect, Spark brings all of the data from the worker machines
back to the driver program (which often runs on the master node). If the
dataset is too large to fit in the driver's memory, the driver will run out
of memory and fail.
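If you only need part of the result, some actions keep the driver's memory bounded. The following is a minimal sketch, assuming Spark 1.3 or later in local mode (on older versions, mapValues also needs import org.apache.spark.SparkContext._). The object CollectAlternatives, the helper summarize and the generated data are hypothetical stand-ins for your own RDD.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object CollectAlternatives {
  // Runs driver-friendly actions on a hypothetical key-value RDD of n records.
  // Returns (record count, first five pairs, pairs seen via toLocalIterator).
  def summarize(sc: SparkContext, n: Int): (Long, Array[(String, Int)], Long) = {
    // Hypothetical stand-in for rdd1 from the question.
    val rdd1 = sc.parallelize(1 to n, numSlices = 4).map(i => (s"key$i", i))
    val zeroed = rdd1.mapValues(_ => 0)

    // count() sends a single Long back to the driver, not the records.
    val count = zeroed.count()

    // take(5) ships at most five elements to the driver.
    val sample = zeroed.take(5)

    // toLocalIterator fetches one partition at a time, so the driver needs
    // memory for the largest partition rather than for the whole RDD.
    val seen = zeroed.toLocalIterator.foldLeft(0L)((acc, _) => acc + 1)

    (count, sample, seen)
  }

  def main(args: Array[String]): Unit = {
    // local[2] (two threads on this machine) is an assumption for experimenting.
    val sc = new SparkContext(
      new SparkConf().setAppName("collect-alternatives").setMaster("local[2]"))
    try {
      val (count, sample, seen) = summarize(sc, 1000000)
      println(s"count=$count, iterated=$seen")
      sample.foreach(println)
      // To keep every record, write it out from the executors (for example
      // with RDD.saveAsTextFile) instead of collecting it to the driver.
    } finally {
      sc.stop()
    }
  }
}
```

The design choice is to decide what actually has to reach the driver: a count, a sample, or a partition-at-a-time stream. Anything larger is better written to storage directly by the executors.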

This will return an array of (key, 0) pairs:

    val rdd2 = rdd1.mapValues(v => 0).collect()
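To check the element type, the hypothetical helper zeroAndCollect below runs the same mapValues(v => 0).collect() chain on a tiny RDD. It is a sketch assuming Spark 1.3 or later in local mode. The result is an ordinary Scala Array[(K, Int)] held on the driver, for whatever key type K the RDD has. This means the name rdd2 in the snippet is slightly misleading, because after collect it is no longer an RDD.

```scala
import scala.reflect.ClassTag
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object CollectTypeDemo {
  // Replaces every value with 0 and pulls all pairs back to the driver.
  // The result is a plain local Array whose size grows with the input RDD.
  def zeroAndCollect[K: ClassTag, V: ClassTag](rdd1: RDD[(K, V)]): Array[(K, Int)] =
    rdd1.mapValues(_ => 0).collect()

  def main(args: Array[String]): Unit = {
    // local[2] (two threads on this machine) is an assumption for experimenting.
    val sc = new SparkContext(
      new SparkConf().setAppName("collect-type-demo").setMaster("local[2]"))
    try {
      // Hypothetical input; the keys could be any type, not only String.
      val rdd1 = sc.parallelize(Seq(("a", 1), ("b", 2), ("c", 3)))
      val pairs: Array[(String, Int)] = zeroAndCollect(rdd1)
      pairs.foreach(println) // (a,0), (b,0), (c,0), one per line
    } finally {
      sc.stop()
    }
  }
}
```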

Thanks
Best Regards

On Fri, Nov 7, 2014 at 10:41 AM, Deep Pradhan <pradhandeep1991@gmail.com>
wrote:

> Hi,
>
> The collect method returns an Array. If I have a huge set of data and I do
> something like the following:
>
> val rdd2 = rdd1.mapValues(v => 0).collect() // where rdd1 is some key-value pair RDD
>
> As I understand it, this will return an Array[(String, Int)], and if my
> data is huge, this will return a huge array.
>
> Won't this cause memory problems? In other words, for a huge dataset,
> collect will create a huge array. Can using collect lead to memory issues?
>
> Thank You
>
