hadoop-mapreduce-user mailing list archives

From Alexander Pivovarov <apivova...@gmail.com>
Subject Re: How to serialize very large object in Hadoop Writable?
Date Fri, 22 Aug 2014 23:00:53 GMT
Hadoop MapReduce usually deals with row-based data, passed as key/value pairs:
ReduceContext<KEYIN,VALUEIN,KEYOUT,VALUEOUT>

If you need to write a lot of data to an HDFS file, you can open an
OutputStream to that file and write the bytes to it directly.
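
A minimal sketch of that idea, assuming the large object can be produced piece by piece: each piece is written to the stream as a length-prefixed chunk, so no single byte array ever has to hold the whole object. The class name ChunkedStreamWriter and the method writeInChunks are hypothetical names for illustration. In a real job the OutputStream would probably come from something like FileSystem.get(conf).create(path); here a ByteArrayOutputStream stands in for it so the example runs without Hadoop.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class ChunkedStreamWriter {

    // Writes each chunk as [int length][bytes] and returns the total number
    // of bytes written. Only one chunk is held in memory at a time, so the
    // total can exceed 2 GB as long as each chunk fits in a byte[].
    public static long writeInChunks(Iterator<byte[]> chunks, OutputStream out)
            throws IOException {
        DataOutputStream data = new DataOutputStream(out);
        long total = 0L; // long, because the total may not fit in an int
        while (chunks.hasNext()) {
            byte[] chunk = chunks.next();
            data.writeInt(chunk.length); // 4-byte length prefix
            data.write(chunk);
            total += 4L + chunk.length;
        }
        data.flush();
        return total;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for an HDFS stream (assumption: in a job this would be
        // the stream returned by opening a file on HDFS for writing).
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        List<byte[]> parts = Arrays.asList(new byte[10], new byte[20], new byte[30]);
        long written = writeInChunks(parts.iterator(), sink);
        System.out.println(written + " bytes written");
    }
}
```

A reader can recover the chunks with DataInputStream.readInt() followed by readFully() on a buffer of that length.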


On Fri, Aug 22, 2014 at 3:30 PM, Yuriy <yuriythedev@gmail.com> wrote:

> Thank you, Alexander. That, at least, explains the problem. And what
> should be the workaround if the combined set of data is larger than 2 GB?
>
>
> On Fri, Aug 22, 2014 at 1:50 PM, Alexander Pivovarov <apivovarov@gmail.com> wrote:
>
>> The maximum array length in Java is the maximum int value (Integer.MAX_VALUE), so a byte array cannot be larger than 2 GB.
>> On Aug 22, 2014 1:41 PM, "Yuriy" <yuriythedev@gmail.com> wrote:
>>
>>> The Hadoop Writable interface relies on the "public void write(DataOutput out)" method.
>>> It looks like, behind the DataOutput interface, Hadoop uses DataOutputStream,
>>> which uses a simple array under the covers.
>>>
>>> When I try to write a lot of data to the DataOutput in my reducer, I get:
>>>
>>> Caused by: java.lang.OutOfMemoryError: Requested array size exceeds VM limit
>>>     at java.util.Arrays.copyOf(Arrays.java:3230)
>>>     at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:113)
>>>     at java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)
>>>     at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:140)
>>>     at java.io.DataOutputStream.write(DataOutputStream.java:107)
>>>     at java.io.FilterOutputStream.write(FilterOutputStream.java:97)
>>>
>>> It looks like the system is unable to allocate a contiguous array of the
>>> requested size. Apparently, increasing the heap size available to the
>>> reducer does not help: it is already at 84 GB (-Xmx84G).
>>>
>>> If I cannot reduce the size of the object that I need to serialize (as
>>> the reducer constructs this object by combining the object data), what
>>> should I try to work around this problem?
>>>
>>> Thanks,
>>>
>>> Yuriy
>>>
>>
>
