spark-user mailing list archives

From Cheng Lian <lian.cs....@gmail.com>
Subject Re: Parquet problems
Date Wed, 22 Jul 2015 15:28:47 GMT
How many columns are there in these Parquet files? Could you load a 
small portion of the original large dataset successfully?
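
For example, something along these lines should show the column count and
whether a small read of a single source goes through (just a sketch; the
path and row count are placeholders, not your actual data):

    // Spark 1.4, assuming an existing sqlContext
    val df = sqlContext.read.parquet("/path/to/one/source")
    println(df.columns.length)   // number of columns in that source
    df.limit(1000).count()       // force a small read of the data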

Cheng

On 6/25/15 5:52 PM, Anders Arpteg wrote:
>
> Yes, both the driver and the executors. It works a little better with 
> more space, but there is still a leak that causes failure after a 
> number of reads. There are about 700 different data sources that need 
> to be loaded, lots of data...
>
>
> On Thu, 25 Jun 2015 at 08:02, Sabarish Sasidharan 
> <sabarish.sasidharan@manthan.com> wrote:
>
>     Did you try increasing the perm gen for the driver?
>
>     Regards
>     Sab
>
>     On 24-Jun-2015 4:40 pm, "Anders Arpteg" <arpteg@spotify.com> wrote:
>
>         When reading large (and many) datasets with the Spark 1.4.0
>         DataFrames parquet reader (the org.apache.spark.sql.parquet
>         format), the following exceptions are thrown:
>
>         Exception in thread "sk-result-getter-0"
>         Exception: java.lang.OutOfMemoryError thrown from the
>         UncaughtExceptionHandler in thread "task-result-getter-0"
>         Exception in thread "task-result-getter-3"
>         java.lang.OutOfMemoryError: PermGen space
>         Exception in thread "task-result-getter-1"
>         java.lang.OutOfMemoryError: PermGen space
>         Exception in thread "task-result-getter-2"
>         java.lang.OutOfMemoryError: PermGen space
>
>         and many more like these from different threads. I've tried
>         increasing the PermGen space using the -XX:MaxPermSize VM
>         setting, but even after tripling the space, the same errors
>         occur. I've also tried storing intermediate results, and am
>         able to get the full job completed by running it multiple
>         times and restarting from the last successful intermediate
>         result. There seems to be some memory leak in the Parquet
>         reader. Any hints on how to fix this problem?
>
>         Thanks,
>         Anders
>
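
For reference, increasing the perm gen as Sabarish suggests is normally done 
through the driver and executor JVM options; the 512m value below is only 
illustrative:

    spark-submit \
      --driver-java-options "-XX:MaxPermSize=512m" \
      --conf "spark.executor.extraJavaOptions=-XX:MaxPermSize=512m" \
      ...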


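And the intermediate-result workaround Anders describes could look roughly 
like the sketch below, so that re-runs skip sources that already finished 
(sources, sqlContext and the output location are placeholders):

    import org.apache.hadoop.fs.{FileSystem, Path}

    val fs = FileSystem.get(sc.hadoopConfiguration)
    for (source <- sources) {                      // sources: Seq[String] of input paths
      val outPath = "/tmp/intermediate/" + source.replace('/', '_')
      if (!fs.exists(new Path(outPath))) {         // already done in a previous run?
        val df = sqlContext.read.parquet(source)
        // ... transformations on df ...
        df.write.parquet(outPath)                  // persist the intermediate result
      }
    }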