lucene-solr-dev mailing list archives

From Mark Miller <markrmil...@gmail.com>
Subject Re: [jira] Updated: (SOLR-857) Memory Leak during the indexing of large xml files
Date Fri, 21 Nov 2008 11:41:57 GMT
How many unique fields do all of the xml files contain (even approx)?
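
One rough way to get that number is sketched below. It is only an illustration and rests on assumptions: that the files use the usual Solr update format (<add><doc><field name="...">), and that a JDK with StAX and try-with-resources is available, which is newer than the Java 5 SDK mentioned in the report. The class name UniqueFieldCounter and the method collectFieldNames are made up for this example. It streams each file, so the 30MB documents never have to fit in memory at once.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Set;
import java.util.TreeSet;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper, not part of Solr: counts distinct field names in update XML files.
public class UniqueFieldCounter {

    /** Adds the value of every name attribute on a field element in the stream to names. */
    public static void collectFieldNames(InputStream in, Set<String> names)
            throws XMLStreamException {
        XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(in);
        try {
            while (reader.hasNext()) {
                // Only start tags named "field" are of interest; everything else is skipped.
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && "field".equals(reader.getLocalName())) {
                    String name = reader.getAttributeValue(null, "name");
                    if (name != null) {
                        names.add(name);
                    }
                }
            }
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) throws IOException, XMLStreamException {
        // One shared set across all files given on the command line.
        Set<String> names = new TreeSet<>();
        for (String arg : args) {
            try (InputStream in = Files.newInputStream(Paths.get(arg))) {
                collectFieldNames(in, names);
            }
        }
        System.out.println(names.size() + " unique field names");
    }
}
```

Run it with the xml files as arguments (for example java UniqueFieldCounter *.xml); an approximate count from a handful of files would already help.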


Ruben Jimenez (JIRA) wrote:
>      [ https://issues.apache.org/jira/browse/SOLR-857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
>
> Ruben Jimenez updated SOLR-857:
> -------------------------------
>
>     Attachment: solr.zip
>
> I did some poking around and think I pinpointed the source of the memory problems.  I
> took Bill's advice to look for a large HashMap, so I walked through each of them in the heap dump
> and found that there were 16 rather large HashMaps.  Each of these had a size of over 120,000.
> Upon further inspection I also found that these 16 Maps seem to be three distinct Maps.
> I came to this conclusion by looking at the first item found in each to group them initially,
> and then confirmed this by choosing two additional random locations per group to verify that
> each map location contained the same object.  See FieldInfoSample.PNG for a sample.
>
> This may not be related, but I did a Google search for "lucene fieldinfos leak 2008" and
> the following came up:  http://mail-archives.apache.org/mod_mbox/lucene-java-commits/200809.mbox/%3C20080914103317.D6AFB2388A0F@eris.apache.org%3E
>
> Assuming I'm unable to find a way to reproduce this error without a rather large number
> of these files, should I just start zipping them and uploading one at a time?
>
>   
>> Memory Leak during the indexing of large xml files
>> --------------------------------------------------
>>
>>                 Key: SOLR-857
>>                 URL: https://issues.apache.org/jira/browse/SOLR-857
>>             Project: Solr
>>          Issue Type: Bug
>>    Affects Versions: 1.3
>>         Environment: Verified on Ubuntu 8.04 (1.7GB RAM, 2.4GHz dual core) and Windows
>> XP (2GB RAM, 2GHz Pentium), both with a Java5 SDK
>>            Reporter: Ruben Jimenez
>>         Attachments: OQ_SOLR_00001.xml.zip, schema.xml, solr.zip, solr256MBHeap.jpg
>>
>>
>> While indexing a set of SOLR xml files that contain 5000 document adds within them
>> and are about 30MB each, SOLR 1.3 seems to continually use more and more memory until the
>> heap is exhausted, while the same files are indexed without issue with SOLR 1.2.
>> Steps used to reproduce:
>> 1 - Download SOLR 1.3
>> 2 - Modify the example schema.xml to match the fields required
>> 3 - Start the example server with the following command:
>>     java -Xms512m -Xmx1024m -XX:MaxPermSize=128m -jar start.jar
>> 4 - Index the files as follows:
>>     java -Xmx128m -jar .../examples/exampledocs/post.jar *.xml
>> The directory with the xml files contains about 100 xml files of about 30MB each.  While
>> indexing, after about the 25th file SOLR 1.3 runs out of memory, while SOLR 1.2 is able to
>> index the entire set of files without any problems.
>>     
>
>   

