xmlgraphics-batik-users mailing list archives

From DeWeese Thomas <thomas.dewe...@gmail.com>
Subject Re: Apache batik memory issues
Date Mon, 14 May 2012 22:59:56 GMT
Hi Hilbert,

On May 14, 2012, at 2:33 AM, Hilbert Mostert wrote:

> Thanks for paying attention to this. I've measured the values using Runtime.getRuntime().freeMemory()
> and friends. Indeed, the process size is very misleading: in one case we had a process
> size of 5 GB while the free memory was more than 3 GB.
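For reference, here is how the `Runtime.getRuntime()` figures are usually combined. `totalMemory()` is the heap the JVM has currently reserved, `freeMemory()` is the unused part of that reservation, and `maxMemory()` is the most it will try to use. "Used" therefore has to be derived from the first two. This is a minimal sketch for any Java 5+ JVM, and the class name `HeapSnapshot` is only illustrative.

```java
/** Prints a snapshot of JVM heap figures as reported by java.lang.Runtime. */
final class HeapSnapshot {
    private static final long MB = 1024L * 1024L;

    /** Heap currently occupied by objects (live, or dead but not yet collected), in bytes. */
    static long usedHeap() {
        Runtime rt = Runtime.getRuntime();
        // totalMemory(): heap reserved right now; freeMemory(): unused part of it.
        return rt.totalMemory() - rt.freeMemory();
    }

    /** One-line summary; maxMemory() may be Long.MAX_VALUE if the JVM has no limit. */
    static String describe() {
        Runtime rt = Runtime.getRuntime();
        return String.format("used=%d MB, committed=%d MB, max=%d MB",
                usedHeap() / MB, rt.totalMemory() / MB, rt.maxMemory() / MB);
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

Note that this figure includes garbage that has not been collected yet. That is why it is only meaningful right after a GC.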

	Have you tried explicitly triggering a garbage collection?  Often this does nothing, but if
you call it a few dozen times you can sometimes get it to do something :)
A more useful way to tell whether you really have a leak is to graph memory usage over
time.  There is one we use in Squiggle that can be useful to watch.
Basically, any time you do anything, memory usage will grow, so what you look for is where the
memory lands after a full GC, which will occur every now and then.
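You can look at "where the memory lands after a GC" without a GUI by using the standard `java.lang.management` API. `MemoryPoolMXBean.getCollectionUsage()` reports a pool's usage right after the JVM last collected that pool. This is a minimal sketch, not Squiggle's monitor. `PostGcSampler`, the sampling interval, and the choice to sum all heap pools into one rough figure are assumptions made for illustration. Plotting the samples over a long run gives the kind of graph described above.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

/** Samples the heap "floor": memory still in use right after each pool's last collection. */
final class PostGcSampler {

    /** Sum of post-collection usage over all heap pools that report it, in bytes (rough). */
    static long postGcHeapUsed() {
        long total = 0;
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.HEAP) {
                continue;
            }
            // Usage right after the JVM last collected this pool; null if unsupported.
            MemoryUsage afterGc = pool.getCollectionUsage();
            if (afterGc != null) {
                total += afterGc.getUsed();
            }
        }
        return total;
    }

    /** Takes up to `count` samples, `intervalMs` apart (fewer if interrupted). */
    static List<Long> sample(int count, long intervalMs) {
        List<Long> samples = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            samples.add(postGcHeapUsed());
            if (i + 1 == count) {
                break;
            }
            try {
                Thread.sleep(intervalMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return samples;
    }

    public static void main(String[] args) {
        for (long bytes : sample(10, 1000)) {
            System.out.println(bytes / (1024 * 1024) + " MB in use after the last GC");
        }
    }
}
```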

> I do have large images, but not many of them: about three, plus a QR code for every student,
> each about 150 bytes (PNG). These are embedded as base64-encoded data.
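Embedding a small PNG such as a QR code as base64 data usually means putting an RFC 2397 `data:` URI in an `<image>` element's `xlink:href`. This sketch uses only the JDK's DOM and assumes Java 8+ for `java.util.Base64`. The JRE 1.6 mentioned in this thread lacks that class; `javax.xml.bind.DatatypeConverter.printBase64Binary` is one alternative there. `EmbedPng` and its methods are hypothetical names.

```java
import java.util.Base64;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

final class EmbedPng {
    static final String SVG_NS = "http://www.w3.org/2000/svg";
    static final String XLINK_NS = "http://www.w3.org/1999/xlink";

    /** Builds an RFC 2397 data: URI for raw PNG bytes. */
    static String pngDataUri(byte[] png) {
        return "data:image/png;base64," + Base64.getEncoder().encodeToString(png);
    }

    /** Creates an SVG <image> element that carries the PNG inline. */
    static Element imageElement(Document doc, byte[] png, int x, int y, int size) {
        Element img = doc.createElementNS(SVG_NS, "image");
        img.setAttributeNS(null, "x", Integer.toString(x));
        img.setAttributeNS(null, "y", Integer.toString(y));
        img.setAttributeNS(null, "width", Integer.toString(size));
        img.setAttributeNS(null, "height", Integer.toString(size));
        // SVG 1.1, which Batik 1.7 implements, takes href in the XLink namespace.
        img.setAttributeNS(XLINK_NS, "xlink:href", pngDataUri(png));
        return img;
    }

    public static void main(String[] args) throws ParserConfigurationException {
        DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
        f.setNamespaceAware(true);
        Document doc = f.newDocumentBuilder().newDocument();
        Element svg = doc.createElementNS(SVG_NS, "svg");
        doc.appendChild(svg);
        byte[] notReallyAPng = {(byte) 0x89, 'P', 'N', 'G'}; // PNG signature prefix only
        svg.appendChild(imageElement(doc, notReallyAPng, 10, 10, 40));
        Element img = (Element) svg.getFirstChild();
        System.out.println(img.getAttributeNS(XLINK_NS, "href"));
    }
}
```

Each such URI is a separate string in each document's DOM. With hundreds of copies in memory at once, they add up even when the images are small.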

	Do you have any idea whether the images can be shared between the documents or not?  One thing
about the PDF transcoder is that it often ends up rasterizing the images, which, depending on
what it thinks it can get away with, can cause the images to grow quite considerably.  Are
the PDFs you generate on the large side?

> I am not using many features of Batik; it is mainly replacing text content in specific
> SVG elements and then converting them into PDF using the PDFTranscoder provided in the Apache
> Batik 1.7 package.
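The text-replacement step described here can be done with plain W3C DOM calls. This sketch is JDK-only. `FillTemplate`, the `student-name` id, and the template string are made up, and the PDFTranscoder call is left as a comment so the example runs without Batik on the classpath. The code matches on the `id` attribute by walking the tree. A DOM built by a generic JAXP parser, without a DTD or schema, does not know that `id` is an ID attribute, so `getElementById` may return null.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class FillTemplate {

    /** Parses an SVG string into a namespace-aware DOM. */
    static Document parse(String svg) {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setNamespaceAware(true);
            return f.newDocumentBuilder().parse(new InputSource(new StringReader(svg)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse template", e);
        }
    }

    /** Returns the first element whose id attribute equals id, or null. */
    static Element findById(Document doc, String id) {
        NodeList all = doc.getElementsByTagNameNS("*", "*");
        for (int i = 0; i < all.getLength(); i++) {
            Element e = (Element) all.item(i);
            if (id.equals(e.getAttribute("id"))) {
                return e;
            }
        }
        return null;
    }

    /** Replaces all children of the element with a single text node. */
    static void setText(Document doc, String id, String text) {
        Element e = findById(doc, id);
        if (e == null) {
            throw new IllegalArgumentException("no element with id " + id);
        }
        e.setTextContent(text);
    }

    public static void main(String[] args) {
        Document doc = parse("<svg xmlns='http://www.w3.org/2000/svg'>"
                + "<text id='student-name'>NAME</text></svg>");
        setText(doc, "student-name", "Jane Doe");
        // Hypothetical next step (not executed here), with Batik and the
        // PDF transcoder on the classpath:
        //   new PDFTranscoder().transcode(new TranscoderInput(doc),
        //                                 new TranscoderOutput(outputStream));
        System.out.println(findById(doc, "student-name").getTextContent());
    }
}
```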
> 
> What I have noticed is that a CleanerThread is created when I start generating
> PDF files, but it never runs; it is always in the waiting state. Is there a command that triggers
> this thread?

	The cleaner thread is used to clear out caches when soft-referenced objects are cleared by
a run of the garbage collector.  It may just mean that your images aren't generating much
cached stuff, or it may mean that the garbage collector hasn't felt the need to be particularly
aggressive about clearing out memory (although I would have thought that by the time you reach
5 GB it would have felt the need a few times).
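The mechanism described here is a cache held through soft references, plus a thread that tidies up after the GC clears them. The toy `SoftCache` below illustrates that general pattern with `java.lang.ref`. It is an assumption-laden sketch, not Batik's `CleanerThread`. Its daemon thread blocks in `ReferenceQueue.remove()`, so a thread dump shows such a thread as waiting until the GC actually clears something. `simulateGcClear` is a hypothetical hook that mimics the GC by clearing and enqueueing a reference by hand. The sketch requires Java 8+.

```java
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.SoftReference;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Toy cache whose values are softly reachable; a daemon thread evicts cleared entries. */
final class SoftCache<K, V> {

    /** A SoftReference that remembers the cache key it was stored under. */
    private static final class Entry<K, V> extends SoftReference<V> {
        final K key;

        Entry(K key, V value, ReferenceQueue<? super V> queue) {
            super(value, queue);
            this.key = key;
        }
    }

    private final Map<K, Entry<K, V>> map = new ConcurrentHashMap<>();
    private final ReferenceQueue<V> queue = new ReferenceQueue<>();

    SoftCache() {
        Thread cleaner = new Thread(() -> {
            try {
                while (true) {
                    // Blocks (thread state WAITING) until a cleared reference is enqueued.
                    Reference<? extends V> ref = queue.remove();
                    Entry<?, ?> e = (Entry<?, ?>) ref;
                    map.remove(e.key, e); // drop it only if it was not replaced meanwhile
                }
            } catch (InterruptedException stop) {
                // Interrupted: let the thread end.
            }
        }, "soft-cache-cleaner");
        cleaner.setDaemon(true);
        cleaner.start();
    }

    void put(K key, V value) {
        map.put(key, new Entry<>(key, value, queue));
    }

    V get(K key) {
        Entry<K, V> e = map.get(key);
        return e == null ? null : e.get();
    }

    int size() {
        return map.size();
    }

    /** Hypothetical test hook: does by hand what the GC does when it clears a soft reference. */
    boolean simulateGcClear(K key) {
        Entry<K, V> e = map.get(key);
        if (e == null) {
            return false;
        }
        e.clear();
        return e.enqueue();
    }

    public static void main(String[] args) throws InterruptedException {
        SoftCache<String, byte[]> cache = new SoftCache<>();
        cache.put("qr-0001", new byte[150]);
        System.out.println("before: " + cache.size());
        cache.simulateGcClear("qr-0001");
        Thread.sleep(100); // give the cleaner thread a moment to run
        System.out.println("after: " + cache.size());
    }
}
```

Soft references are typically cleared only under memory pressure. A roomy heap can therefore leave such a thread idle for a long time.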

	Also, does it just use a lot of memory, or does the memory usage grow consistently over time,
so that no matter how large you set the heap, it eventually runs out of memory?
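To answer that question with numbers, you can record used heap at the end of each batch of students, after settling the GC, and fit a line through the samples. This is a minimal sketch. `LeakTrend` and the per-batch framing are assumptions, and what slope counts as "near zero" is a judgment call. Treat it as a heuristic rather than a leak detector.

```java
/**
 * Fits an ordinary least-squares line through post-GC used-heap samples
 * (one per batch). A slope near zero suggests a high but stable footprint;
 * a steadily positive slope is what a leak tends to look like.
 */
final class LeakTrend {

    /** Bytes of growth per batch, using x = 0, 1, ..., n-1 as the batch index. */
    static double slopePerBatch(long[] postGcBytes) {
        int n = postGcBytes.length;
        if (n < 2) {
            throw new IllegalArgumentException("need at least two samples");
        }
        double meanX = (n - 1) / 2.0;
        double meanY = 0;
        for (long y : postGcBytes) {
            meanY += y;
        }
        meanY /= n;
        double num = 0;
        double den = 0;
        for (int i = 0; i < n; i++) {
            double dx = i - meanX;
            num += dx * (postGcBytes[i] - meanY);
            den += dx * dx;
        }
        return num / den;
    }

    public static void main(String[] args) {
        long mb = 1024 * 1024;
        long[] stable = {300 * mb, 310 * mb, 295 * mb, 305 * mb};
        long[] growing = {300 * mb, 400 * mb, 500 * mb, 600 * mb};
        System.out.printf("stable:  %.1f MB/batch%n", slopePerBatch(stable) / mb);
        System.out.printf("growing: %.1f MB/batch%n", slopePerBatch(growing) / mb);
    }
}
```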

	Thomas

> 
> On 05/12/2012 07:17 PM, DeWeese Thomas wrote:
>> Hi Hilbert,
>> 
>> 	How are you measuring how much memory you are using after step 4?  If you are just looking at
>> process size, that can be very misleading, since typically the JVM will grow, and even if the JVM
>> has freed most of the memory it will hold onto the larger memory block, partially since it may be
>> fragmented and partially since it may need the memory again shortly.
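You can see the gap between process size and actual heap use from inside the JVM with `MemoryMXBean`. `getCommitted()` is memory the JVM has reserved from the operating system, while `getUsed()` is what objects actually occupy. OS tools report something closer to committed memory, plus thread stacks, native buffers, and the JVM's own code. `CommittedVsUsed` is an illustrative name, and the figures vary by JVM, collector, and flags.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

/** Contrasts memory reserved from the OS ("committed") with memory in use ("used"). */
final class CommittedVsUsed {

    static String line(String label, MemoryUsage u) {
        long mb = 1024 * 1024;
        return String.format("%-8s used=%d MB committed=%d MB",
                label, u.getUsed() / mb, u.getCommitted() / mb);
    }

    public static void main(String[] args) {
        MemoryMXBean mem = ManagementFactory.getMemoryMXBean();
        System.out.println(line("heap", mem.getHeapMemoryUsage()));
        System.out.println(line("non-heap", mem.getNonHeapMemoryUsage()));
    }
}
```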
>> 
>> 	There are caches in Batik for documents and images and other assets, but unless you have a lot
>> of large images it is unlikely they would reach 1 GB.  Filter effects may also cache some
>> intermediate results, but typically those will be cleaned when the filter is disposed of (which,
>> given the lazy nature of the JVM, may not happen for a while).  A general outline of the SVG
>> features you are using might help identify areas that could be responsible for the memory bloat.
>> 
>> 	Which, by the way, raises the other issue: are you forcing a GC?  If not, lots of currently
>> unused stuff will hang out until the memory is needed for something else.  Finally, remember
>> that just calling for a single GC doesn't typically do much to clear out memory.
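`System.gc()` is only a request. One pragmatic way to "force" a GC before taking a measurement is to call it repeatedly and keep the lowest reading. This is a sketch under that assumption. `SettleHeap`, the 50 ms pause, and the rule of stopping after three unproductive calls are arbitrary choices, not anything the JVM prescribes.

```java
/** Calls System.gc() repeatedly and reports the lowest used-heap figure observed. */
final class SettleHeap {

    static long usedHeap() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    /**
     * Requests up to maxCalls collections. Any single System.gc() may do little
     * or nothing, so we stop early after three calls in a row that fail to
     * lower the figure.
     */
    static long settle(int maxCalls) {
        long lowest = usedHeap();
        int noProgress = 0;
        for (int i = 0; i < maxCalls && noProgress < 3; i++) {
            System.gc();
            try {
                Thread.sleep(50); // let reference processing and cleanup threads catch up
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
            long now = usedHeap();
            if (now < lowest) {
                lowest = now;
                noProgress = 0;
            } else {
                noProgress++;
            }
        }
        return lowest;
    }

    public static void main(String[] args) {
        System.out.println("settled used heap: " + settle(30) / (1024 * 1024) + " MB");
    }
}
```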
>> 
>> 	Thomas
>> 
>> On May 11, 2012, at 8:39 AM, Hilbert Mostert wrote:
>> 
>>> Recently I started using Apache Batik to create PDF files from SVG templates. The application
>>> is used to generate exam pages for students. It works great, but it uses a huge amount of
>>> memory. This is sometimes annoying because I have to increase the memory limit to over 4 GB to
>>> have it complete the task. There are in general lots of students (500+), and in one case 2000+
>>> students. This will, of course, eat memory like an elephant; I accept that.
>>> 
>>> I want to reduce this memory footprint and have found one issue in my program that I need
>>> help with.
>>> 
>>> I am using the Java JRE 1.6.0_32, Batik 1.7 and PDFBox 1.6.0.
>>> 
>>> The program has the following flow:
>>> 
>>> 1.    fetch students from source (Excel file)
>>> 2.    create workers to generate pdf from svg
>>> 3.    while not all students have been processed do
>>> 3.1      replace information in the SVG document (using W3C DOM functions from the Document class; this is done by the worker)
>>> 3.2      generate PDF from the SVG document (using PDFTranscoder)
>>> 3.3      check if there are more students; true: go to 3.1; false: continue with step 4
>>> 4.   clean up workers
>>> 5.   generate single pdf from all generated pdfs using PDFBox
>>> 6.   done
>>> 
>>> It is a multithreaded environment and each worker runs in its own thread; each worker has a copy
>>> of the SVG document, and they don't share anything (for obvious reasons).
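The flow above (steps 2 to 4) maps naturally onto an `ExecutorService`. In this sketch, each task parses its own copy of the template, so no DOM is shared between threads. The pool is shut down in a `finally` block, which corresponds to step 4. `ExamWorkers`, `TEMPLATE`, `renderOne`, and `renderAll` are hypothetical names. The transcoding step is stubbed out so the sketch runs with the JDK alone (Java 8+).

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

final class ExamWorkers {

    /** Stand-in for the real SVG exam template (hypothetical). */
    static final String TEMPLATE =
            "<svg xmlns='http://www.w3.org/2000/svg'><text id='name'>NAME</text></svg>";

    /** Steps 3.1-3.2 for one student; each call parses its own DOM, so nothing is shared. */
    static String renderOne(String student) throws Exception {
        DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
        f.setNamespaceAware(true);
        Document doc = f.newDocumentBuilder().parse(new InputSource(new StringReader(TEMPLATE)));
        Element text = (Element) doc.getElementsByTagNameNS("*", "text").item(0);
        text.setTextContent(student);
        // Step 3.2 would transcode `doc` to PDF here (e.g. with PDFTranscoder) and write it out;
        // this stub returns the filled-in text so the sketch runs without Batik.
        return text.getTextContent();
    }

    /** Steps 2-4: fan students out to a fixed pool, collect results in order, then shut down. */
    static List<String> renderAll(List<String> students, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (String s : students) {
                Callable<String> task = () -> renderOne(s);
                futures.add(pool.submit(task));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> fu : futures) {
                results.add(fu.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown(); // step 4: let the worker threads exit
        }
    }

    public static void main(String[] args) {
        System.out.println(renderAll(Arrays.asList("Ann", "Bob", "Cy"), 2));
    }
}
```

Because each document is created inside the task and dropped when the task returns, it becomes unreachable as soon as its result has been produced. No long-lived worker object keeps it reachable.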
>>> 
>>> What I have found is what happens after step 4: after cleaning up the workers I am still using
>>> 1 GB of memory, which is much more than when I start (around 128 MB). I suspect there is some
>>> caching here and there, but I do not know enough about Batik to fix this problem.
>>> 
>>> Who can help me or has the answer for me?
>>> 
>>> 
>>> Thanks in advance,
>>> 
>>> Hilbert Mostert
>>> 
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: batik-users-unsubscribe@xmlgraphics.apache.org
>>> For additional commands, e-mail: batik-users-help@xmlgraphics.apache.org
>>> 
>> 
>> 
> 
> 
> 



