avro-user mailing list archives

From ey-chih chow <eyc...@hotmail.com>
Subject RE: avro object reuse
Date Wed, 01 Jun 2011 18:34:26 GMT

We ran jmap on one of our mappers and found the top usage as follows:
 num     #instances   #bytes      Class description
 --------------------------------------------------------------------------
   1:    24405        291733256   byte[]
   2:    6056         40228984    int[]
   3:    388799       19966776    char[]
   4:    101779       16284640    org.codehaus.jackson.impl.ReaderBasedParser
   5:    369623       11827936    java.lang.String
   6:    111059       8769424     java.util.HashMap$Entry[]
   7:    204083       8163320     org.codehaus.jackson.impl.JsonReadContext
   8:    211374       6763968     java.util.HashMap$Entry
   9:    102551       5742856     org.codehaus.jackson.util.TextBuffer
  10:    105854       5080992     java.nio.HeapByteBuffer
  11:    105821       5079408     java.nio.HeapCharBuffer
  12:    104578       5019744     java.util.HashMap
  13:    102551       4922448     org.codehaus.jackson.io.IOContext
  14:    101782       4885536     org.codehaus.jackson.map.DeserializationConfig
  15:    101783       4071320     org.codehaus.jackson.sym.CharsToNameCanonicalizer
  16:    101779       4071160     org.codehaus.jackson.map.deser.StdDeserializationContext
  17:    101779       4071160     java.io.StringReader
  18:    101754       4070160     java.util.HashMap$KeyIterator
It looks like Jackson eats up a lot of memory.  Our mapper reads files in the Avro format.
Does Avro use Jackson heavily when reading Avro files?  Is there any way to improve this?
Ey-Chih Chow
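For context on the question above: Avro uses Jackson to parse JSON schemas, so the roughly 101,000 ReaderBasedParser and DeserializationConfig instances in the histogram suggest some JSON is being parsed per record rather than once up front. A stdlib-only sketch of the "parse once, reuse" pattern follows; `Pattern` here is just a hypothetical stand-in for any expensive parsed object, such as an Avro Schema:

```java
import java.util.regex.Pattern;

// Illustration only (plain Java, no Avro dependency): re-parsing an
// expensive immutable object per record vs. parsing it once and sharing it.
public class ParseOnce {

    // Anti-pattern: builds a fresh Pattern (think: re-parses a schema)
    // on every call, creating garbage proportional to the record count.
    static boolean matchSlow(String record) {
        return Pattern.compile("^[0-9]+$").matcher(record).matches();
    }

    // Parse once, reuse for every record.
    private static final Pattern DIGITS = Pattern.compile("^[0-9]+$");

    static boolean matchFast(String record) {
        return DIGITS.matcher(record).matches();
    }

    public static void main(String[] args) {
        // Both give the same answer; only the allocation behavior differs.
        System.out.println(matchFast("12345")); // prints "true"
    }
}
```

If the per-record JSON parsing is coming from inside Avro itself rather than application code, the same principle applies: make sure the Schema is constructed once per task, not once per record.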
From: scott@richrelevance.com
To: user@avro.apache.org
Date: Tue, 31 May 2011 18:26:23 -0700
Subject: Re: avro object reuse

All of those instances are short-lived.  If you are running out of memory, it is not likely
due to a lack of object reuse.  That tends to cost more CPU time in the garbage collector, but
does not cause out-of-memory conditions.  This can be hard to do on a cluster, but grabbing
'jmap -histo' output from a JVM with larger-than-expected heap usage can often quickly
identify the cause of memory consumption issues.
I'm not sure if AvroUtf8InputFormat can safely re-use its instances of Utf8 or not.

On 5/31/11 5:40 PM, "ey-chih chow" <eychih@hotmail.com> wrote:

I actually looked into the Avro code to find out how Avro does object reuse.  I looked at AvroUtf8InputFormat
and have the following question.  Why does a new Utf8 object have to be created each time the method
next(AvroWrapper<Utf8> key, NullWritable value) is called?  Won't this eat up too much
memory when we call next(key, value) many times?  Since Utf8 is mutable, can we just create
one Utf8 object for all the calls to next(key, value)?  Would this save memory?  Thanks.
Ey-Chih Chow 
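The reuse pattern being asked about can be sketched in plain Java (stdlib only, no Avro dependency). The `ReusingReader` below is hypothetical; it mirrors a record reader's next(key, value) contract by refilling one mutable holder per call instead of allocating a fresh object per record, which is the same idea a mutable Utf8 would enable:

```java
// Sketch of per-call object reuse: one mutable key holder is refilled on
// each next() call, so the steady-state allocation rate stays flat no
// matter how many records are read.
public class ReusingReader {
    private final StringBuilder key = new StringBuilder(); // reused holder
    private final String[] input;
    private int pos = 0;

    public ReusingReader(String[] input) {
        this.input = input;
    }

    // Analogous to next(key, value): returns false at end of input,
    // otherwise overwrites the existing holder in place.
    public boolean next() {
        if (pos >= input.length) {
            return false;
        }
        key.setLength(0);          // reset, keeping the backing buffer
        key.append(input[pos++]);  // refill in place, no new holder object
        return true;
    }

    public String current() {
        return key.toString();
    }
}
```

The caveat Scott raises still applies: reuse is only safe if nothing downstream holds a reference to the key across calls, since each next() overwrites its contents.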

From: eychih@hotmail.com
To: user@avro.apache.org
Subject: avro object reuse
Date: Tue, 31 May 2011 10:38:39 -0700

We have several mapreduce jobs using avro.  They take too much memory when running on production.
 Can anybody suggest some object reuse techniques to cut down memory usage?  Thanks.
Ey-Chih Chow