hadoop-hdfs-user mailing list archives

From Yuriy <yuriythe...@gmail.com>
Subject Re: How to serialize very large object in Hadoop Writable?
Date Fri, 22 Aug 2014 22:30:09 GMT
Thank you, Alexander. That, at least, explains the problem. And what should
be the workaround if the combined set of data is larger than 2 GB?

On Fri, Aug 22, 2014 at 1:50 PM, Alexander Pivovarov <apivovarov@gmail.com>

> The maximum array size in Java is the maximum int value, so a byte array cannot be larger than 2 GB.
> On Aug 22, 2014 1:41 PM, "Yuriy" <yuriythedev@gmail.com> wrote:
>> The Hadoop Writable interface relies on the "public void write(DataOutput out)" method.
>> It looks like, behind the DataOutput interface, Hadoop uses a DataOutputStream,
>> which is backed by a simple byte array under the covers.
>> When I try to write a lot of data in DataOutput in my reducer, I get:
>> Caused by: java.lang.OutOfMemoryError: Requested array size exceeds VM limit
>>     at java.util.Arrays.copyOf(Arrays.java:3230)
>>     at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:113)
>>     at java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)
>>     at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:140)
>>     at java.io.DataOutputStream.write(DataOutputStream.java:107)
>>     at java.io.FilterOutputStream.write(FilterOutputStream.java:97)
>> It looks like the system is unable to allocate a contiguous array of the
>> requested size. Increasing the heap available to the reducer does not
>> help - it is already at 84 GB (-Xmx84G).
>> If I cannot reduce the size of the object that I need to serialize (as
>> the reducer constructs this object by combining the object data), what
>> should I try to work around this problem?
>> Thanks,
>> Yuriy
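Since the 2 GB ceiling applies to any single byte[], including the buffer behind ByteArrayOutputStream, a common workaround is to avoid ever materializing the object as one array: split the payload into bounded-size chunks and stream each chunk straight to the DataOutput, or emit the object as several smaller records (or write the bulk data to an HDFS side file and serialize only its path). A minimal sketch of the chunking idea follows; the class name ChunkedWritable and the 64 MB chunk size are hypothetical, not part of the thread, and the Hadoop Writable implements-clause is commented out so the sketch is self-contained:

```java
import java.io.*;
import java.util.*;

// Hypothetical sketch: keep the payload as a list of chunks so that no
// single byte[] approaches the JVM's 2 GB array limit, and stream each
// chunk directly to the DataOutput instead of buffering the whole object.
public class ChunkedWritable /* implements org.apache.hadoop.io.Writable */ {
    static final int CHUNK_SIZE = 64 * 1024 * 1024; // 64 MB per chunk (tunable)

    private final List<byte[]> chunks = new ArrayList<>();

    public void addChunk(byte[] data) {
        chunks.add(data);
    }

    public List<byte[]> getChunks() {
        return chunks;
    }

    public void write(DataOutput out) throws IOException {
        out.writeInt(chunks.size());       // number of chunks
        for (byte[] chunk : chunks) {
            out.writeInt(chunk.length);    // length prefix per chunk
            out.write(chunk);              // written directly, never re-buffered
        }
    }

    public void readFields(DataInput in) throws IOException {
        chunks.clear();
        int n = in.readInt();
        for (int i = 0; i < n; i++) {
            byte[] chunk = new byte[in.readInt()];
            in.readFully(chunk);
            chunks.add(chunk);
        }
    }
}
```

Note that this only helps if the DataOutput the framework hands you is not itself backed by a single growing byte array; if it is (as in the reducer's case above), the serialized form still has to be split across multiple emitted records so that each record stays well under 2 GB.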
