spark-user mailing list archives

From MEETHU MATHEW <meethu2...@yahoo.co.in>
Subject Re: OutOfMemory Error
Date Wed, 20 Aug 2014 08:48:12 GMT


Hi,

How do I increase the heap size?

What is the difference between Spark executor memory and heap size?

Thanks & Regards, 
Meethu M


On Monday, 18 August 2014 12:35 PM, Akhil Das <akhil@sigmoidanalytics.com> wrote:
 


I believe spark.shuffle.memoryFraction is the one you are looking for.

spark.shuffle.memoryFraction : Fraction of Java heap to use for aggregation and cogroups
during shuffles, if spark.shuffle.spill is true. At any given time, the collective size
of all in-memory maps used for shuffles is bounded by this limit, beyond which the contents
will begin to spill to disk. If spills occur often, consider increasing this value at the expense
of spark.storage.memoryFraction.


You can give it a try.
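
To make the trade-off above concrete, here is a rough sketch of how the two fractions divide one executor heap. The helper names are hypothetical and not part of Spark, and the default values (0.2 for shuffle, 0.6 for storage) are assumed to be the Spark 1.x defaults. The heap in question is the executor JVM heap, which spark.executor.memory sets. The real limits are probably somewhat lower, since Spark keeps some headroom internally.

```python
# Hypothetical helpers for illustration only; none of this is Spark API.
# Assumption: Spark 1.x defaults of 0.2 (shuffle) and 0.6 (storage).


def memory_budgets(heap_mb, shuffle_fraction=0.2, storage_fraction=0.6):
    """Return the approximate shuffle and storage budgets (in MB) for one heap."""
    for name, value in (("shuffle", shuffle_fraction),
                        ("storage", storage_fraction)):
        if not 0.0 <= value <= 1.0:
            raise ValueError("%s fraction must be between 0 and 1" % name)
    if shuffle_fraction + storage_fraction > 1.0:
        # Raising one fraction has to come at the expense of the other.
        raise ValueError("fractions together exceed the whole heap")
    return {
        "shuffle_mb": heap_mb * shuffle_fraction,
        "storage_mb": heap_mb * storage_fraction,
    }


def to_spark_defaults(shuffle_fraction, storage_fraction):
    """Render the two settings as lines for conf/spark-defaults.conf."""
    return "\n".join([
        "spark.shuffle.memoryFraction {}".format(shuffle_fraction),
        "spark.storage.memoryFraction {}".format(storage_fraction),
    ])


if __name__ == "__main__":
    # Example: give shuffles more room on a 4 GB heap, at storage's expense.
    print(memory_budgets(4096, shuffle_fraction=0.3, storage_fraction=0.5))
    print(to_spark_defaults(0.3, 0.5))
```

The check on the sum is only there to show that the two settings compete for the same heap; Spark itself may not enforce it this way.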



Thanks
Best Regards


On Mon, Aug 18, 2014 at 12:21 PM, Ghousia <ghousia.atheeq@gmail.com> wrote:

Thanks for the answer, Akhil. For now we are working around this issue by increasing the
number of partitions and persisting RDDs to DISK_ONLY. But the issue is with heavy
computations within an RDD. It would be better if we had the option of spilling the intermediate
transformation results to local disk (only when memory consumption is high). Do we
have any such option available in Spark? If increasing the number of partitions is the only way,
then one might still end up with OutOfMemory errors when working with certain algorithms where
the intermediate result is huge.
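
As a back-of-the-envelope illustration of the partitioning workaround described above, the sketch below estimates how many partitions are needed so that the intermediate data of one partition fits in a given per-task memory budget. The function name, the per-record size, and the budget are all hypothetical inputs; in practice they would have to be measured.

```python
# Hypothetical sizing helper for illustration; not part of Spark.
# All inputs are assumptions that would need to be measured on a real job.


def min_partitions(num_records, bytes_per_record, task_budget_bytes):
    """Smallest partition count keeping each partition's intermediate data within budget."""
    if task_budget_bytes <= 0:
        raise ValueError("task_budget_bytes must be positive")
    if num_records < 0 or bytes_per_record < 0:
        raise ValueError("record count and size must be non-negative")
    total_bytes = num_records * bytes_per_record
    # Integer ceiling division avoids float rounding on large totals.
    needed = -(-total_bytes // task_budget_bytes)
    return max(1, needed)


if __name__ == "__main__":
    # Example: 1,000,000 records, about 2 KB of intermediate data each,
    # and an assumed 256 MB budget per task.
    print(min_partitions(1000000, 2048, 256 * 1024 * 1024))
```

This only helps while the intermediate result of a single record fits in the budget. If one record alone expands beyond it, adding partitions cannot help, which is the limitation raised above.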
>
>
>
>
>On Mon, Aug 18, 2014 at 12:02 PM, Akhil Das <akhil@sigmoidanalytics.com> wrote:
>
>Hi Ghousia,
>>
>>
>>You can try the following:
>>
>>
>>1. Increase the heap size
>>2. Increase the number of partitions
>>3. You could try persisting the RDD to use DISK_ONLY
>>
>>
>>
>>
>>Thanks
>>Best Regards
>>
>>
>>
>>On Mon, Aug 18, 2014 at 10:40 AM, Ghousia Taj <ghousia.atheeq@gmail.com> wrote:
>>
>>Hi,
>>>
>>>I am trying to implement machine learning algorithms on Spark. I am working
>>>on a 3-node cluster, with each node having 5GB of memory. Whenever I
>>>work with a slightly larger number of records, I end up with an OutOfMemory
>>>error. The problem is that even if the number of records is only slightly higher, the
>>>intermediate result from a transformation is huge, and this results in
>>>an OutOfMemory error. To overcome this, we are partitioning the data such that
>>>each partition has only a few records.
>>>
>>>Is there a better way to fix this issue? Something like spilling the
>>>intermediate data to local disk?
>>>
>>>Thanks,
>>>Ghousia.
>>>
>>>
>>>
>>>--
>>>View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/OutOfMemory-Error-tp12275.html
>>>Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>>
>>>
>>>
>>
>