From: Yusuf Can Gürkan
To: user
Subject: Heap Space Error
Date: Tue, 22 Sep 2015 14:28:07 +0300
Message-Id: <393AC797-D24D-4354-B8AC-156DB5038481@useinsider.com>

I run the code below and get an error:

val dateUtil = new DateUtil()

val usersInputDF = sqlContext.sql(
  s"""
     |select userid,
     |       concat_ws(' ', collect_list(concat_ws(' ',
     |         if(productname is not NULL, lower(productname), ''),
     |         lower(regexp_replace(regexp_replace(substr(productcategory, 2, length(productcategory) - 2), '\"', ''), \",\", ' '))))) inputlist
     |from landing
     |where dt = '${dateUtil.getYear}-${dateUtil.getMonth}'
     |  and userid != ''
     |  and userid is not null
     |  and pagetype = 'productDetail'
     |group by userid
   """.stripMargin)

usersInputDF.registerTempTable("users_product_visits")

sqlContext.sql("cache table users_product_visits")

ERROR:

java.lang.OutOfMemoryError: Requested array size exceeds VM limit
    at java.lang.StringCoding$StringEncoder.encode(StringCoding.java:300)



One task's shuffle read size is always much larger than the others'. What can cause this? The table above is an external table whose source is S3.
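
One possible cause, offered as an assumption and not a diagnosis, is key skew: a group by on userid sends every row for a given userid to the same task, so a single very frequent userid would inflate one task's shuffle read and also produce one very long concatenated string. The sketch below models that in plain Scala, without Spark. The object name SkewCheck, the helper keyStats, and the sample rows (including the "crawler" userid) are all hypothetical and only illustrate the idea.

```scala
// Pure-Scala model of a per-user concatenation; no Spark required.
// All names and sample data here are hypothetical.
object SkewCheck {
  // One page-view row: (userid, text that would be concatenated for that user).
  type Row = (String, String)

  // For each userid, return (userid, row count, length of the space-joined text),
  // sorted so that the heaviest key comes first.
  def keyStats(rows: Seq[Row]): Seq[(String, Int, Long)] =
    rows
      .groupBy(_._1)
      .toSeq
      .map { case (userid, group) =>
        // Sum of the text lengths plus one separator between consecutive rows.
        val chars = group.map(_._2.length.toLong).sum + (group.size - 1)
        (userid, group.size, chars)
      }
      .sortBy { case (_, _, chars) => -chars }

  def main(args: Array[String]): Unit = {
    // One userid dominates the input, the others have only a few rows.
    val rows: Seq[Row] =
      Seq.fill(1000)(("crawler", "some product name")) ++
        Seq(("u1", "shoes"), ("u2", "red dress"), ("u2", "hat"))

    keyStats(rows).foreach { case (userid, n, chars) =>
      println(s"$userid rows=$n chars=$chars")
    }
  }
}
```

Against the real data, the analogous check would presumably be a count(*) per userid on landing with the same where clause, ordered by the count descending. If a handful of userids account for most rows, that would be consistent with the single oversized task.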

