spark-user mailing list archives

From Yusuf Can Gürkan <yu...@useinsider.com>
Subject Heap Space Error
Date Tue, 22 Sep 2015 11:28:07 GMT
I run the code below and get the following error:

val dateUtil = new DateUtil()

val usersInputDF = sqlContext.sql(
  s"""
     |select userid,
     |       concat_ws(' ', collect_list(concat_ws(' ',
     |         if(productname is not NULL, lower(productname), ''),
     |         lower(regexp_replace(regexp_replace(substr(productcategory, 2, length(productcategory) - 2), '\"', ''), \",\", ' '))
     |       ))) inputlist
     |from landing
     |where dt = '${dateUtil.getYear}-${dateUtil.getMonth}'
     |  and userid != '' and userid is not null and userid is not NULL
     |  and pagetype = 'productDetail'
     |group by userid

   """.stripMargin)

usersInputDF.registerTempTable("users_product_visits")

sqlContext.sql("cache table users_product_visits")

ERROR:

java.lang.OutOfMemoryError: Requested array size exceeds VM limit
	at java.lang.StringCoding$StringEncoder.encode(StringCoding.java:300)



The shuffle read size of one task is always much larger than that of the others, as you can see below.
What could cause this? The table above is an external table whose source is S3.
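
One possible explanation (an assumption, not something confirmed here) is that a few userid values have far more rows than the rest, so a single task has to build one very large string for them. A minimal sketch of how that could be checked, reusing sqlContext and dateUtil from the code above; the alias visits and the limit of 20 are arbitrary choices for illustration:

```scala
// Count rows per userid for the same partition and filter as the failing query,
// then list the heaviest keys. A handful of very large counts would point to key skew.
val topUsersDF = sqlContext.sql(
  s"""
     |select userid, count(*) as visits
     |from landing
     |where dt = '${dateUtil.getYear}-${dateUtil.getMonth}'
     |  and userid != '' and userid is not null
     |  and pagetype = 'productDetail'
     |group by userid
     |order by visits desc
     |limit 20
   """.stripMargin)

// Print the 20 userids with the most rows
topUsersDF.show()
```

If one userid turns out to have orders of magnitude more rows than the others, that would be consistent with both the uneven shuffle read and the oversized array in the stack trace.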


