hadoop-hdfs-user mailing list archives

From Yuriy <yuriythe...@gmail.com>
Subject How to serialize very large object in Hadoop Writable?
Date Fri, 22 Aug 2014 20:41:24 GMT
The Hadoop Writable interface relies on the "public void write(DataOutput out)" method.
It looks like, behind the DataOutput interface, Hadoop uses a DataOutputStream,
which writes into a simple byte array under the covers.
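
To make the setup concrete, here is a stripped-down sketch of the kind of Writable
I am implementing (the class and field names are made up; the real object is much
larger and is assembled in the reducer):

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.Writable;

// Hypothetical example only - the real class wraps a much bigger structure.
public class LargeObjectWritable implements Writable {

    private byte[] payload;  // grows to many gigabytes in the reducer

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeInt(payload.length);  // length prefix
        out.write(payload);            // this is the write where the OOM surfaces
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        int length = in.readInt();
        payload = new byte[length];
        in.readFully(payload);
    }
}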

When I try to write a lot of data to the DataOutput in my reducer, I get:

Caused by: java.lang.OutOfMemoryError: Requested array size exceeds VM limit
        at java.util.Arrays.copyOf(Arrays.java:3230)
        at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:113)
        at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:140)
        at java.io.DataOutputStream.write(DataOutputStream.java:107)
        ...

It looks like the JVM is unable to allocate a contiguous array of the
requested size. Apparently, increasing the heap size available to the
reducer does not help - it is already at 84 GB (-Xmx84G).
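
As far as I understand, a single Java array is capped at roughly Integer.MAX_VALUE
elements regardless of heap size, which would explain why the larger heap makes no
difference. A minimal standalone check (hypothetical, not Hadoop-specific) should
hit the same limit:

public class ArrayLimitCheck {
    public static void main(String[] args) {
        // A single Java array cannot exceed roughly Integer.MAX_VALUE elements,
        // so a ByteArrayOutputStream can never buffer more than about 2 GB,
        // no matter how large -Xmx is.
        byte[] huge = new byte[Integer.MAX_VALUE];
        System.out.println(huge.length);
    }
}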

If I cannot reduce the size of the object that I need to serialize (the
reducer constructs this object by combining its input data), what should I
try in order to work around this problem?


