From: Scott Carey
To: "user@avro.apache.org"
Date: Wed, 1 Jun 2011 15:27:08 -0700
Subject: Re: avro object reuse

No, and even GenericData.Record simply writes using a StringBuilder; I doubt this is the culprit.

On 6/1/11 3:14 PM, "ey-chih chow" <eychih@hotmail.com> wrote:

We make a lot of toString() calls on Avro Utf8 objects. Will this cause Jackson calls? Thanks.

Ey-Chih 

From: scott@richrelevance.com
To: user@avro.apache.org
Date: Wed, 1 Jun 2011 13:38:39 -0700
Subject: Re: avro object reuse

This is great info.

Jackson should only be used once when the file is opened, so this is confusing from that point of view.
Is something else using Jackson or initializing an Avro JsonDecoder frequently? There are over 100,000 Jackson DeserializationConfig objects.

Another place that parses the schema is AvroSerialization.java. Does the Hadoop getDeserializer() API method get called once per job, or once per record? If it is called more than once per map job, that might explain this.

In principle, Jackson is only used by a mapper during initialization. The histogram below indicates that this may not be the case, or that something outside of Avro is causing a lot of Jackson JSON parsing.

Are you using something that converts the Avro data to JSON form? For example, toString() on most Avro datum objects does a lot of work with Jackson, but the objects below are deserializer objects, not serializer objects, so that is not likely the issue.
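
To illustrate why the "once per job, or per record?" question matters, here is a minimal sketch of parsing once and caching the result. It does not use Avro, Hadoop, or Jackson: ParsedSchemaCache, SchemaCacheDemo, and the counting parser are hypothetical stand-ins, and the real cause of the Jackson objects in this thread has not been established.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** Caches the result of an expensive parse, keyed by the schema's JSON text. */
final class ParsedSchemaCache<S> {
    private final Map<String, S> cache = new ConcurrentHashMap<>();
    private final Function<String, S> parser;

    ParsedSchemaCache(Function<String, S> parser) {
        this.parser = parser;
    }

    /** Runs the parser on the first request for a given JSON text only. */
    S get(String schemaJson) {
        return cache.computeIfAbsent(schemaJson, parser);
    }
}

final class SchemaCacheDemo {
    /** Simulates needing the schema for every record; returns how many parses ran. */
    static int parsesFor(int records, boolean useCache) {
        AtomicInteger parseCount = new AtomicInteger();
        // Hypothetical stand-in for a JSON schema parse: it only counts invocations.
        Function<String, String> parser = json -> {
            parseCount.incrementAndGet();
            return json.trim();
        };
        ParsedSchemaCache<String> cache = new ParsedSchemaCache<>(parser);
        String schemaJson = "{\"type\": \"string\"}";
        for (int i = 0; i < records; i++) {
            String schema = useCache ? cache.get(schemaJson) : parser.apply(schemaJson);
            if (schema.isEmpty()) {
                throw new IllegalStateException("unexpected empty schema");
            }
        }
        return parseCount.get();
    }

    public static void main(String[] args) {
        System.out.println("parses without cache: " + parsesFor(1000, false));
        System.out.println("parses with cache:    " + parsesFor(1000, true));
    }
}
```

If a per-record code path turned out to be re-parsing the schema, caching along these lines would keep the parse count independent of the record count.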

On 6/1/11 11:34 AM, "ey-chih chow" <eychih@hotmail.com> wrote:
We ran jmap on one of our mappers and found the top usage as follows:
num   #instances      #bytes  Class description
--------------------------------------------------------------------------
  1:       24405   291733256  byte[]
  2:        6056    40228984  int[]
  3:      388799    19966776  char[]
  4:      101779    16284640  org.codehaus.jackson.impl.ReaderBasedParser
  5:      369623    11827936  java.lang.String
  6:      111059     8769424  java.util.HashMap$Entry[]
  7:      204083     8163320  org.codehaus.jackson.impl.JsonReadContext
  8:      211374     6763968  java.util.HashMap$Entry
  9:      102551     5742856  org.codehaus.jackson.util.TextBuffer
 10:      105854     5080992  java.nio.HeapByteBuffer
 11:      105821     5079408  java.nio.HeapCharBuffer
 12:      104578     5019744  java.util.HashMap
 13:      102551     4922448  org.codehaus.jackson.io.IOContext
 14:      101782     4885536  org.codehaus.jackson.map.DeserializationConfig
 15:      101783     4071320  org.codehaus.jackson.sym.CharsToNameCanonicalizer
 16:      101779     4071160  org.codehaus.jackson.map.deser.StdDeserializationContext
 17:      101779     4071160  java.io.StringReader
 18:      101754     4070160  java.util.HashMap$KeyIterator

It looks like Jackson uses a lot of memory. Our mapper reads files in the Avro format. Does Avro use Jackson heavily when reading Avro files? Is there any way to improve this? Thanks.

Ey-Chih Chow


From: scott@richrelevance.com
To: user@avro.apache.org
Date: Tue, 31 May 2011 18:26:23 -0700
Subject: Re: avro object reuse

All of those instances are short-lived. If you are running out of memory, it's not likely due to a lack of object reuse. Creating many short-lived objects tends to cost more CPU time in the garbage collector, but does not cause out-of-memory conditions. It can be hard to do on a cluster, but grabbing 'jmap -histo' output from a JVM whose heap usage is larger than expected can often quickly identify the cause of memory consumption issues.
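
As a rough aid for reading 'jmap -histo' output, the sketch below sums the #bytes column for all classes under a given package prefix. HistoSummary and bytesForPrefix are hypothetical names, and the parsing assumes rows shaped like "rank: instances bytes class", as in the histogram quoted in this thread; other JDK versions may print extra columns.

```java
import java.util.Arrays;
import java.util.List;

final class HistoSummary {
    /**
     * Sums the #bytes column of histogram rows whose class name starts with prefix.
     * Rows are expected to look like "4: 101779 16284640 org.codehaus...".
     */
    static long bytesForPrefix(List<String> lines, String prefix) {
        long total = 0;
        for (String line : lines) {
            String[] cols = line.trim().split("\\s+");
            // Skip headers, separators, and anything not shaped like a data row.
            if (cols.length < 4 || !cols[0].endsWith(":")) {
                continue;
            }
            if (cols[3].startsWith(prefix)) {
                try {
                    total += Long.parseLong(cols[2]);
                } catch (NumberFormatException e) {
                    // Ignore rows whose byte count is not a plain number.
                }
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // A few rows copied from the histogram in this thread.
        List<String> sample = Arrays.asList(
            "num  #instances  #bytes  Class description",
            "1: 24405 291733256 byte[]",
            "4: 101779 16284640 org.codehaus.jackson.impl.ReaderBasedParser",
            "7: 204083 8163320 org.codehaus.jackson.impl.JsonReadContext");
        System.out.println(bytesForPrefix(sample, "org.codehaus.jackson."));
    }
}
```

Comparing such a per-package total against the largest single rows (here byte[]) helps show which allocations actually dominate the heap.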

I'm not sure if AvroUtf8InputFormat can safely re-use its instances of Utf8 or not.


On 5/31/11 5:40 PM, "ey-chih chow" <eychih@hotmail.com> wrote:

I actually looked into the Avro code to find out how Avro does object reuse. I looked at AvroUtf8InputFormat and have the following question: why does a new Utf8 object have to be created each time the method next(AvroWrapper<Utf8> key, NullWritable value) is called? Will this use too much memory when we call next(key, value) many times? Since Utf8 is mutable, can we just create one Utf8 object for all the calls to next(key, value)? Will this save memory? Thanks.

Ey-Chih Chow
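
The reuse pattern asked about above can be sketched without Avro as follows. MutableText is a hypothetical stand-in for a mutable UTF-8 holder such as Utf8, and ReusingReader only mimics the shape of a next(key) call; neither is the actual AvroUtf8InputFormat code, and whether that class can safely reuse its Utf8 instances is left open in this thread.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical stand-in for a mutable UTF-8 holder. */
final class MutableText {
    private byte[] bytes = new byte[0];
    private int length;

    /** Copies src into the internal buffer, growing it only when it is too small. */
    MutableText set(byte[] src) {
        if (bytes.length < src.length) {
            bytes = new byte[src.length];
        }
        System.arraycopy(src, 0, bytes, 0, src.length);
        length = src.length;
        return this;
    }

    @Override
    public String toString() {
        return new String(bytes, 0, length, StandardCharsets.UTF_8);
    }
}

/** Hypothetical reader that fills a caller-supplied key on each call. */
final class ReusingReader {
    private final String[] lines;
    private int pos;

    ReusingReader(String... lines) {
        this.lines = lines;
    }

    /** Overwrites key with the next line; returns false when no lines remain. */
    boolean next(MutableText key) {
        if (pos >= lines.length) {
            return false;
        }
        // getBytes still allocates here; a real reader would decode into the buffer.
        key.set(lines[pos++].getBytes(StandardCharsets.UTF_8));
        return true;
    }
}

final class ReuseDemo {
    public static void main(String[] args) {
        ReusingReader reader = new ReusingReader("first", "second");
        MutableText key = new MutableText(); // one holder for all records
        while (reader.next(key)) {
            // Copy the value if it must outlive this iteration.
            String copy = key.toString();
            System.out.println(copy);
        }
    }
}
```

Reuse like this is only safe when nothing downstream keeps a reference to the key between calls. As noted earlier in the thread, it mainly reduces garbage-collector work rather than peak heap usage.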


From: eychih@hotmail.com
To: user@avro.apache.org
Subject: avro object reuse
Date: Tue, 31 May 2011 10:38:39 -0700

Hi, 

We have several MapReduce jobs using Avro. They take too much memory when running in production. Can anybody suggest some object reuse techniques to cut down memory usage? Thanks.

Ey-Chih Chow