Subject: How to serialize very large object in Hadoop Writable?
From: Yuriy
To: user@hadoop.apache.org
Date: Fri, 22 Aug 2014 13:41:24 -0700

The Hadoop Writable interface relies on the "public void write(DataOutput out)" method. It looks like, behind the DataOutput interface, Hadoop uses a DataOutputStream, which uses a simple byte array under the covers.

When I try to write a lot of data to the DataOutput in my reducer, I get:

Caused by: java.lang.OutOfMemoryError: Requested array size exceeds VM limit
    at java.util.Arrays.copyOf(Arrays.java:3230)
    at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:113)
    at java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)
    at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:140)
    at java.io.DataOutputStream.write(DataOutputStream.java:107)
    at java.io.FilterOutputStream.write(FilterOutputStream.java:97)

It looks like the system is unable to allocate a contiguous array of the requested size. Increasing the heap size available to the reducer does not help; it is already at 84 GB (-Xmx84G).

If I cannot reduce the size of the object that I need to serialize (the reducer constructs this object by combining the object data), what should I try to work around this problem?

Thanks,

Yuriy
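For context, a minimal sketch of the write(DataOutput)/read(DataInput) pattern being discussed, using only java.io (no Hadoop dependency). The class name ChunkedPayload and the 64 KiB chunk size are hypothetical, not from the original post. The sketch emits the payload in fixed-size chunks rather than one giant write call; note this only avoids the OutOfMemoryError above if the underlying stream spills to disk or the network rather than buffering everything in a ByteArrayOutputStream, which is the crux of the question.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.Arrays;

// Hypothetical value class illustrating the Writable-style contract:
// write(DataOutput) serializes, a matching read(DataInput) deserializes.
public class ChunkedPayload {
    static final int CHUNK = 64 * 1024; // emit at most 64 KiB per write call

    final byte[] data; // stands in for the large combined object

    ChunkedPayload(byte[] data) { this.data = data; }

    // Length-prefixed, chunked serialization: the sink sees many small
    // writes instead of one array-sized write.
    void write(DataOutput out) throws IOException {
        out.writeLong(data.length);
        for (int off = 0; off < data.length; off += CHUNK) {
            int len = Math.min(CHUNK, data.length - off);
            out.write(data, off, len);
        }
    }

    static ChunkedPayload read(DataInput in) throws IOException {
        long total = in.readLong();
        byte[] buf = new byte[(int) total]; // demo only; still one array here
        in.readFully(buf);
        return new ChunkedPayload(buf);
    }

    public static void main(String[] args) throws IOException {
        byte[] sample = new byte[200_000];
        for (int i = 0; i < sample.length; i++) sample[i] = (byte) (i % 251);

        // Round-trip through in-memory streams (small data, demo only).
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        new ChunkedPayload(sample).write(new DataOutputStream(bos));
        ChunkedPayload back = ChunkedPayload.read(
                new DataInputStream(new ByteArrayInputStream(bos.toByteArray())));

        System.out.println(Arrays.equals(back.data, sample)); // prints true
    }
}
```

The in-memory round trip is just for demonstration; in a real job the DataOutput would be backed by a stream that does not hold the whole serialization in one contiguous array.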