hadoop-common-user mailing list archives

From "Matt Pouttu-Clarke" <Matt.Pouttu-Cla...@icrossing.com>
Subject Re: load a serialized object in hadoop
Date Wed, 13 Oct 2010 15:48:56 GMT
Also, Java serialization keeps references to previously written and read
objects in memory (the stream's handle table).  Better to use Thrift or
Avro to serialize the object.

In my experience serialization is inefficient for large object graphs,  
but works fine for smaller graphs (depending on how much memory you  
have to work with).

Also, for data that small, memcached and MongoDB may be overkill (unless
the data changes frequently).
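
Matt's point about retained references can be seen with the plain JDK
streams: ObjectOutputStream (and ObjectInputStream on the read side)
keeps a handle table of every object it has processed, so a repeated
write emits only a small back-reference and the stream pins the objects
in memory until reset() is called. A minimal sketch (the class name is
illustrative):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;

public class SerializationRetention {
    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        ObjectOutputStream oos = new ObjectOutputStream(bos);

        // ~1000 characters of payload.
        String payload = new String(new char[1000]).replace('\0', 'x');

        oos.writeObject(payload);
        oos.flush();
        int afterFirst = bos.size();

        // Second write of the SAME object: the stream finds it in its
        // handle table and emits only a back-reference (a few bytes).
        oos.writeObject(payload);
        oos.flush();
        int afterSecond = bos.size();

        System.out.println("first write bytes:  " + afterFirst);
        System.out.println("second write bytes: " + (afterSecond - afterFirst));

        // The same handle table keeps every written object reachable
        // until reset() clears it -- the memory growth Matt describes.
        oos.reset();
        oos.close();
    }
}
```

The second write is only a handful of bytes precisely because the first
object is still being held; for a long-lived stream over a large graph
that retention is what exhausts the heap.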

Cheers,
Matt

On Oct 13, 2010, at 11:04 AM, "Shi Yu" <shiyu@uchicago.edu> wrote:

> As a follow-up to my own question: invoking a JVM under Hadoop seems
> to require much more memory than an ordinary JVM. Instead of
> serializing the object, maybe I could create a MapFile as an index to
> permit lookups by key in Hadoop. I have also compared the performance
> of MongoDB and Memcache. I will let you know the result after I try
> the MapFile approach.
>
> Shi
>
> On 2010-10-12 21:59, M. C. Srivas wrote:
>> On Tue, Oct 12, 2010 at 4:50 AM, Shi Yu <shiyu@uchicago.edu> wrote:
>>
>>> Hi,
>>>
>>> I want to load a serialized HashMap object in hadoop. The file of
>>> the stored object is 200M. I could read that object efficiently in
>>> Java by setting -Xmx to 1000M. However, in hadoop I could never
>>> load it into memory. The code is very simple (just read the
>>> ObjectInputStream) and there is no map/reduce implemented yet. I
>>> set mapred.child.java.opts=-Xmx3000M and still get
>>> "java.lang.OutOfMemoryError: Java heap space". Could anyone explain
>>> a little bit how memory is allocated to the JVM in hadoop? Why does
>>> hadoop take up so much memory? If a program requires 1G of memory
>>> on a single node, how much memory does it require (generally) in
>>> Hadoop?
>>>
>>
>> The JVM reserves swap space in advance, at the time of launching the
>> process. If your swap is too low (or you do not have any swap
>> configured), you will hit this.
>>
>> Or, you are on a 32-bit machine, in which case 3G is not possible in
>> the JVM.
>>
>> -Srivas.
>>
>>
>>> Thanks.
>>>
>>> Shi
>
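
Shi's MapFile idea upthread maps naturally onto the 0.20-era Hadoop
API: a MapFile is a sorted SequenceFile plus an index file, so a single
key can be looked up without deserializing the whole 200M object into
the heap. A rough sketch, assuming hadoop-core on the classpath; the
class name and the path /tmp/lookup.map are illustrative:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.io.MapFile;
import org.apache.hadoop.io.Text;

public class MapFileLookup {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.getLocal(conf);
        String dir = "/tmp/lookup.map";  // illustrative output directory

        // Write: MapFile requires keys to be appended in sorted order.
        MapFile.Writer writer =
            new MapFile.Writer(conf, fs, dir, Text.class, Text.class);
        writer.append(new Text("apple"), new Text("1"));
        writer.append(new Text("banana"), new Text("2"));
        writer.close();

        // Read back a single key; the reader seeks via the index
        // instead of loading the entire file into memory.
        MapFile.Reader reader = new MapFile.Reader(fs, dir, conf);
        Text value = new Text();
        reader.get(new Text("banana"), value);
        System.out.println(value);
        reader.close();
    }
}
```

Compared with a serialized HashMap, only the index (a sampled subset of
keys) lives in memory, which is why this approach sidesteps the heap
problem in the original question.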

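For reference, the heap flag discussed in the quoted thread is the
0.20-era per-task property: it goes in mapred-site.xml (or on the
JobConf) and applies to each spawned map/reduce child JVM, not to the
TaskTracker or other daemons. A typical fragment (the 1024m value is
just an example):

```xml
<!-- mapred-site.xml: JVM options for each spawned map/reduce child -->
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx1024m</value>
</property>
```
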