lucene-solr-user mailing list archives

From Ahmet Arslan <iori...@yahoo.com>
Subject Re: Full Indexing is Causing a Java Heap Out of Memory Exception
Date Fri, 04 Apr 2014 21:34:40 GMT
Hi,

Can you remove auto commit for the bulk import and commit only once, at the very end?
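
A minimal sketch of what that could look like, assuming the autoCommit block quoted further down in this thread is the only commit trigger configured: comment it out in solrconfig.xml for the duration of the bulk import and let the import itself issue the single commit. The updateHandler wrapper shown here follows the stock Solr 4.x example config and is an assumption about the poster's file.

```xml
<!-- solrconfig.xml: updateHandler section during the bulk import (sketch) -->
<updateHandler class="solr.DirectUpdateHandler2">
  <!-- autoCommit disabled while the full import runs; restore afterwards
  <autoCommit>
    <maxDocs>15000</maxDocs>
    <openSearcher>false</openSearcher>
  </autoCommit>
  -->
</updateHandler>
```

With autoCommit disabled, the existing request (command=full-import&clean=false) should still commit once when the import finishes, since DataImportHandler's commit parameter defaults to true; adding commit=true makes that explicit. If the update log is enabled, expect the transaction log to grow on disk until that final commit.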

Ahmet



On Saturday, April 5, 2014 12:16 AM, Candygram For Mongo <candygram.for.mongo@gmail.com>
wrote:
In case the attached database.xml file didn't show up, I have pasted in the
contents below:

<dataConfig>
<dataSource
name="org_only"
type="JdbcDataSource"
driver="oracle.jdbc.OracleDriver"
url="jdbc:oracle:thin:@test2.abc.com:1521:ORCL"
user="admin"
password="admin"
readOnly="false"
batchSize="100"
/>
<document>


<entity name="full-index" query="
select

NVL(cast(ORCL.ADDRESS_ACCT_ALL.RECORD_ID as varchar2(100)), 'null')
as SOLR_ID,

'ORCL.ADDRESS_ACCT_ALL'
as SOLR_CATEGORY,

NVL(cast(ORCL.ADDRESS_ACCT_ALL.RECORD_ID as varchar2(255)), ' ') as
ADDRESSALLROWID,
NVL(cast(ORCL.ADDRESS_ACCT_ALL.ADDR_TYPE_CD as varchar2(255)), ' ') as
ADDRESSALLADDRTYPECD,
NVL(cast(ORCL.ADDRESS_ACCT_ALL.LONGITUDE as varchar2(255)), ' ') as
ADDRESSALLLONGITUDE,
NVL(cast(ORCL.ADDRESS_ACCT_ALL.LATITUDE as varchar2(255)), ' ') as
ADDRESSALLLATITUDE,
NVL(cast(ORCL.ADDRESS_ACCT_ALL.ADDR_NAME as varchar2(255)), ' ') as
ADDRESSALLADDRNAME,
NVL(cast(ORCL.ADDRESS_ACCT_ALL.CITY as varchar2(255)), ' ') as
ADDRESSALLCITY,
NVL(cast(ORCL.ADDRESS_ACCT_ALL.STATE as varchar2(255)), ' ') as
ADDRESSALLSTATE,
NVL(cast(ORCL.ADDRESS_ACCT_ALL.EMAIL_ADDR as varchar2(255)), ' ') as
ADDRESSALLEMAILADDR

from ORCL.ADDRESS_ACCT_ALL
" >

<field column="SOLR_ID" name="id" />
<field column="SOLR_CATEGORY" name="category" />
<field column="ADDRESSALLROWID" name="ADDRESS_ACCT_ALL.RECORD_ID_abc" />
<field column="ADDRESSALLADDRTYPECD"
name="ADDRESS_ACCT_ALL.ADDR_TYPE_CD_abc" />
<field column="ADDRESSALLLONGITUDE" name="ADDRESS_ACCT_ALL.LONGITUDE_abc" />
<field column="ADDRESSALLLATITUDE" name="ADDRESS_ACCT_ALL.LATITUDE_abc" />
<field column="ADDRESSALLADDRNAME" name="ADDRESS_ACCT_ALL.ADDR_NAME_abc" />
<field column="ADDRESSALLCITY" name="ADDRESS_ACCT_ALL.CITY_abc" />
<field column="ADDRESSALLSTATE" name="ADDRESS_ACCT_ALL.STATE_abc" />
<field column="ADDRESSALLEMAILADDR" name="ADDRESS_ACCT_ALL.EMAIL_ADDR_abc"
/>

</entity>



<!-- Variables -->
<!-- '${dataimporter.last_index_time}' -->
</document>
</dataConfig>






On Fri, Apr 4, 2014 at 11:55 AM, Candygram For Mongo <
candygram.for.mongo@gmail.com> wrote:

> In this case we are indexing an Oracle database.
>
> We do not include the data-config.xml in our distribution.  We store the
> database information in the database.xml file.  I have attached the
> database.xml file.
>
> When we use the default merge policy settings, we get the same results.
>
>
>
> We have not tried to dump the table to a comma separated file.  We think
> that dumping a table of this size to disk will introduce other memory problems
> with big file management. We have not tested that case.
>
>
> On Fri, Apr 4, 2014 at 7:25 AM, Ahmet Arslan <iorixxx@yahoo.com> wrote:
>
>> Hi,
>>
>> Which database are you using? Can you send us data-config.xml?
>>
>> What happens when you use default merge policy settings?
>>
>> What happens when you dump your table to a comma-separated file and feed
>> that file to Solr?
>>
>> Ahmet
>>
>> On Friday, April 4, 2014 5:10 PM, Candygram For Mongo <
>> candygram.for.mongo@gmail.com> wrote:
>>
>> The ramBufferSizeMB was set to 6MB only on the test system to make the
>> system crash sooner.  In production that tag is commented out, which
>> I believe forces the default value to be used.
>>
>>
>>
>>
>> On Thu, Apr 3, 2014 at 5:46 PM, Ahmet Arslan <iorixxx@yahoo.com> wrote:
>>
>> Hi,
>> >
>> >Out of curiosity, why did you set ramBufferSizeMB to 6?
>> >
>> >Ahmet
>> >
>> >
>> >
>> >
>> >
>> >On Friday, April 4, 2014 3:27 AM, Candygram For Mongo <
>> candygram.for.mongo@gmail.com> wrote:
>> >*Main issue: Full Indexing is Causing a Java Heap Out of Memory Exception
>> >
>> >*SOLR/Lucene version: *4.2.1*
>> >
>> >
>> >*JVM version:
>> >
>> >Java(TM) SE Runtime Environment (build 1.7.0_07-b11)
>> >
>> >Java HotSpot(TM) 64-Bit Server VM (build 23.3-b01, mixed mode)
>> >
>> >
>> >
>> >*Indexer startup command:
>> >
>> >set JVMARGS=-XX:MaxPermSize=364m -Xss256K -Xmx6144m -Xms6144m
>> >
>> >
>> >
>> >java %JVMARGS% ^
>> >
>> >-Dcom.sun.management.jmxremote.port=1092 ^
>> >
>> >-Dcom.sun.management.jmxremote.ssl=false ^
>> >
>> >-Dcom.sun.management.jmxremote.authenticate=false ^
>> >
>> >-jar start.jar
>> >
>> >
>> >
>> >*SOLR indexing HTTP parameters request:
>> >
>> >webapp=/solr path=/dataimport
>> >params={clean=false&command=full-import&wt=javabin&version=2}
>> >
>> >
>> >
>> >We are getting a Java heap OOM exception when indexing (updating) 27
>> >million records.  If we increase the Java heap memory settings the
>> >problem goes away but we believe the problem has not been fixed and
>> >that we will eventually get the same OOM exception.  We have other
>> >processes on the server that also require resources so we cannot
>> >continually increase the memory settings to resolve the OOM issue.  We
>> >are trying to find a way to configure the SOLR instance to reduce or
>> >preferably eliminate the possibility of an OOM exception.
>> >
>> >
>> >
>> >We can reproduce the problem on a test machine.  We set the Java heap
>> >memory size to 64MB to accelerate the exception.  If we increase this
>> >setting the same problem occurs, just hours later.  In the test
>> >environment, we are using the following parameters:
>> >
>> >
>> >
>> >JVMARGS=-XX:MaxPermSize=64m -Xss256K -Xmx64m -Xms64m
>> >
>> >
>> >
>> >Normally we use the default solrconfig.xml file with only the following
>> >jar file references added:
>> >
>> >
>> >
>> ><lib path="../../../../default/lib/common.jar" />
>> >
>> ><lib path="../../../../default/lib/webapp.jar" />
>> >
>> ><lib path="../../../../default/lib/commons-pool-1.4.jar" />
>> >
>> >
>> >
>> >Using these values and trying to index 6 million records from the
>> >database, the Java Heap Out of Memory exception is thrown very quickly.
>> >
>> >
>> >
>> >We were able to complete a successful indexing run by further modifying
>> >solrconfig.xml and removing all, or all but one, of the <copyField> tags
>> >from the schema.xml file.
>> >
>> >
>> >
>> >The following solrconfig.xml values were modified:
>> >
>> >
>> >
>> ><ramBufferSizeMB>6</ramBufferSizeMB>
>> >
>> >
>> >
>> ><mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
>> >
>> ><int name="maxMergeAtOnce">2</int>
>> >
>> ><int name="maxMergeAtOnceExplicit">2</int>
>> >
>> ><int name="segmentsPerTier">10</int>
>> >
>> ><int name="maxMergedSegmentMB">150</int>
>> >
>> ></mergePolicy>
>> >
>> >
>> >
>> ><autoCommit>
>> >
>> ><maxDocs>15000</maxDocs>  <!-- This tag was maxTime before this -->
>> >
>> ><openSearcher>false</openSearcher>
>> >
>> ></autoCommit>
>> >
>> >
>> >
>> >Using our customized schema.xml file with two or more <copyField> tags,
>> >the OOM exception is always thrown.  Based on the errors, the problem
>> >occurs when the process is trying to do the merge.  The error is provided
>> >below:
>> >
>> >
>> >
>> >Exception in thread "Lucene Merge Thread #156" org.apache.lucene.index.MergePolicy$MergeException: java.lang.OutOfMemoryError: Java heap space
>> >        at org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:541)
>> >        at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:514)
>> >Caused by: java.lang.OutOfMemoryError: Java heap space
>> >        at org.apache.lucene.codecs.lucene42.Lucene42DocValuesProducer.loadNumeric(Lucene42DocValuesProducer.java:180)
>> >        at org.apache.lucene.codecs.lucene42.Lucene42DocValuesProducer.getNumeric(Lucene42DocValuesProducer.java:146)
>> >        at org.apache.lucene.index.SegmentCoreReaders.getNormValues(SegmentCoreReaders.java:301)
>> >        at org.apache.lucene.index.SegmentReader.getNormValues(SegmentReader.java:259)
>> >        at org.apache.lucene.index.SegmentMerger.mergeNorms(SegmentMerger.java:233)
>> >        at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:137)
>> >        at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:3693)
>> >        at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3296)
>> >        at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:401)
>> >        at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:478)
>> >
>> >Mar 12, 2014 12:17:40 AM org.apache.solr.common.SolrException log
>> >SEVERE: auto commit error...:java.lang.IllegalStateException: this writer hit an OutOfMemoryError; cannot commit
>> >        at org.apache.lucene.index.IndexWriter.startCommit(IndexWriter.java:3971)
>> >        at org.apache.lucene.index.IndexWriter.prepareCommitInternal(IndexWriter.java:2744)
>> >        at org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:2827)
>> >        at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:2807)
>> >        at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:536)
>> >        at org.apache.solr.update.CommitTracker.run(CommitTracker.java:216)
>> >        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>> >        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>> >        at java.util.concurrent.FutureTask.run(FutureTask.java:166)
>> >        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
>> >        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
>> >        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>> >        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>> >        at java.lang.Thread.run(Thread.java:722)
>> >
>> >
>> >
>> >We think but are not 100% sure that the problem is related to the merge.
>> >
>> >
>> >
>> >Normally our schema.xml contains a lot of field specifications (like the
>> >ones seen in the file fragment below):
>> >
>> >
>> >
>> ><copyField source="ADDRESS.RECORD_ID_abc" dest="ADDRESS.RECORD_ID.case_abc" />
>> >
>> ><copyField source="ADDRESS.RECORD_ID_abc" dest="ADDRESS.RECORD_ID.case.soundex_abc" />
>> >
>> ><copyField source="ADDRESS.RECORD_ID_abc" dest="ADDRESS.RECORD_ID.case_nvl_abc" />
>> >
>> >
>> >
>> >In tests using the default schema.xml file and no <copyField> tags,
>> >indexing completed successfully.  6 million records produced a 900 MB
>> >data directory.
>> >
>> >
>> >
>> >When I included just one <copyField> tag, indexing completed
>> >successfully.  6 million records produced a 990 MB data directory
>> >(90 MB bigger).
>> >
>> >
>> >
>> >When I included just two <copyField> tags, indexing crashed with an OOM
>> >exception.
>> >
>> >
>> >
>> >Changing parameters like maxMergedSegmentMB or maxDocs only postponed
>> >the crash.
>> >
>> >
>> >
>> >The net of our test results is as follows:
>> >
>> >
>> >
>> >solrconfig.xml                    | schema.xml                        | result
>> >----------------------------------+-----------------------------------+--------
>> >default plus only jar references  | default (no copyField tags)       | success
>> >default plus only jar references  | modified with one copyField tag   | success
>> >default plus only jar references  | modified with two copyField tags  | crash
>> >additional modified settings      | default (no copyField tags)       | success
>> >additional modified settings      | modified with one copyField tag   | success
>> >additional modified settings      | modified with two copyField tags  | crash
>> >
>> >
>> >
>> >
>> >
>> >Our question is, what can we do to eliminate these OOM exceptions?
>> >
>> >
>>
>
>

