spark-dev mailing list archives

From Pralabh Kumar <pralabhku...@gmail.com>
Subject Re: Memory issue in pyspark for 1.6 mb file
Date Sun, 18 Jun 2017 02:54:44 GMT
Hi Naga

Is it failing because the driver memory is full or the executor memory is full?

Can you please try setting the spark.cleaner.ttl property, so that older
RDDs and metadata are cleared automatically?
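For example (a sketch; the 3600-second value is just an illustration, and "your_job.py" is a placeholder, so tune both to your setup), the property can be set at submit time or in spark-defaults.conf:

```
# At submit time (value is in seconds):
spark-submit --conf spark.cleaner.ttl=3600 your_job.py

# Or in spark-defaults.conf:
spark.cleaner.ttl    3600
```

Note that with a TTL set, any RDD older than that duration is forgotten, so make sure it is longer than the lifetime of any data you still need.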

Can you also provide the complete error stack trace and a code snippet?
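For reference, here is a minimal sketch (Spark 1.6 style, using the spark-csv package) of the per-iteration cleanup I would expect; the file list, table-write step, and names are hypothetical placeholders, not your actual code:

```python
# Sketch only: requires a running Spark 1.6 cluster and the
# com.databricks:spark-csv package; paths/tables are hypothetical.
from pyspark import SparkContext
from pyspark.sql import SQLContext

sc = SparkContext(appName="csv-loader")
sqlContext = SQLContext(sc)

for path in ["tab1.csv", "tab2.csv"]:  # hypothetical file list
    df = (sqlContext.read
          .format("com.databricks.spark.csv")
          .option("header", "true")
          .load(path))
    df.cache()
    # ... write df into the target table here ...
    df.unpersist()       # release this DataFrame's cached blocks

sqlContext.clearCache()  # drop any remaining cached tables
sc.stop()
```

If the unpersist calls happen only after the loop (rather than inside it, as above), every intermediate DataFrame stays cached at once, which would match the behavior you describe.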


Regards
Pralabh Kumar



On Sun, Jun 18, 2017 at 12:06 AM, Naga Guduru <gudurunaga@gmail.com> wrote:

> Hi,
>
> I am trying to load a 1.6 MB Excel file which has 16 tabs. We converted the
> Excel file to CSV and loaded the 16 CSV files into 8 tables. The job ran
> successfully on the 1st run in PySpark. When trying to run the same job a
> 2nd time, the container gets killed due to memory issues.
>
> I call unpersist and clearCache on all RDDs and DataFrames after each file
> is loaded into a table. The CSV files are loaded sequentially (in a for
> loop), as some of the files go into the same table. The job runs about 15
> minutes when it succeeds and 12-15 minutes before it fails. If I increase
> the driver memory and executor memory to more than 5 GB, it succeeds.
>
> My assumption is that the driver memory is filling up and that
> unpersist/clearCache is not working.
>
> Error: 2 GB of physical memory used and 4.6 GB of virtual memory used.
>
> We are running Spark 1.6 on Cloudera Enterprise.
>
> Please let me know, if you need any info.
>
>
> Thanks
>
>
