lucene-solr-user mailing list archives

From Candygram For Mongo <candygram.for.mo...@gmail.com>
Subject Re: Full Indexing is Causing a Java Heap Out of Memory Exception
Date Fri, 04 Apr 2014 23:59:07 GMT
Guessing that the attachments won't work, I am pasting one file in each of
four separate emails.

database.xml


<dataConfig>
<dataSource
name="org_only"
type="JdbcDataSource"
driver="oracle.jdbc.OracleDriver"
url="jdbc:oracle:thin:@test.abcdata.com:1521:ORCL"
user="admin"
password="admin"
readOnly="false"
/>
<document>


<entity name="full-index" query="
select

NVL(cast(ORACLE.ADDRESS_ALL.RECORD_ID as varchar2(100)), 'null')
as SOLR_ID,

'ORACLE.ADDRESS_ALL'
as SOLR_CATEGORY,

NVL(cast(ORACLE.ADDRESS_ALL.RECORD_ID as varchar2(255)), ' ') as
ADDRESSALLROWID,
NVL(cast(ORACLE.ADDRESS_ALL.ADDR_TYPE_CD as varchar2(255)), ' ') as
ADDRESSALLADDRTYPECD,
NVL(cast(ORACLE.ADDRESS_ALL.LONGITUDE as varchar2(255)), ' ') as
ADDRESSALLLONGITUDE,
NVL(cast(ORACLE.ADDRESS_ALL.LATITUDE as varchar2(255)), ' ') as
ADDRESSALLLATITUDE,
NVL(cast(ORACLE.ADDRESS_ALL.ADDR_NAME as varchar2(255)), ' ') as
ADDRESSALLADDRNAME,
NVL(cast(ORACLE.ADDRESS_ALL.CITY as varchar2(255)), ' ') as ADDRESSALLCITY,
NVL(cast(ORACLE.ADDRESS_ALL.STATE as varchar2(255)), ' ') as
ADDRESSALLSTATE,
NVL(cast(ORACLE.ADDRESS_ALL.EMAIL_ADDR as varchar2(255)), ' ') as
ADDRESSALLEMAILADDR

from ORACLE.ADDRESS_ALL
" >

<field column="SOLR_ID" name="id" />
<field column="SOLR_CATEGORY" name="category" />
<field column="ADDRESSALLROWID" name="ADDRESS_ALL.RECORD_ID_abc" />
<field column="ADDRESSALLADDRTYPECD" name="ADDRESS_ALL.ADDR_TYPE_CD_abc" />
<field column="ADDRESSALLLONGITUDE" name="ADDRESS_ALL.LONGITUDE_abc" />
<field column="ADDRESSALLLATITUDE" name="ADDRESS_ALL.LATITUDE_abc" />
<field column="ADDRESSALLADDRNAME" name="ADDRESS_ALL.ADDR_NAME_abc" />
<field column="ADDRESSALLCITY" name="ADDRESS_ALL.CITY_abc" />
<field column="ADDRESSALLSTATE" name="ADDRESS_ALL.STATE_abc" />
<field column="ADDRESSALLEMAILADDR" name="ADDRESS_ALL.EMAIL_ADDR_abc" />

</entity>



<!-- Variables -->
<!-- '${dataimporter.last_index_time}' -->
</document>
</dataConfig>
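
One thing worth noting about this config: the JdbcDataSource batchSize
attribute (present in the variant of this file quoted further down) is,
as far as I understand, passed to the JDBC driver as the statement fetch
size, so it bounds how many rows are buffered in the JVM per round trip.
A minimal sketch, reusing the connection details above:

<dataSource
name="org_only"
type="JdbcDataSource"
driver="oracle.jdbc.OracleDriver"
url="jdbc:oracle:thin:@test.abcdata.com:1521:ORCL"
user="admin"
password="admin"
readOnly="false"
batchSize="100"
/>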



On Fri, Apr 4, 2014 at 4:57 PM, Candygram For Mongo <
candygram.for.mongo@gmail.com> wrote:

> Does this user list allow attachments?  I have four files attached
> (database.xml, error.txt, schema.xml, solrconfig.xml).  We just ran the
> process again using the parameters you suggested, but not to a csv file.
>  It errored out quickly.  We are working on the csv file run.
>
> Removed both <autoCommit> and <autoSoftCommit> parts/definitions from
> solrconfig.xml
>
> Disabled tlog by removing
>
>    <updateLog>
>       <str name="dir">${solr.ulog.dir:}</str>
>     </updateLog>
>
> from solrconfig.xml
>
> Used the commit=true parameter: ?commit=true&command=full-import
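>
> For clarity, a sketch of the full request (host and port are
> placeholders for our environment):
>
> curl "http://localhost:8983/solr/dataimport?command=full-import&commit=true&clean=false"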
>
>
> On Fri, Apr 4, 2014 at 3:29 PM, Ahmet Arslan <iorixxx@yahoo.com> wrote:
>
>> Hi,
>>
>> This may not solve your problem, but it is generally recommended to
>> disable auto commit and transaction logs for bulk indexing, and to issue
>> one commit at the very end. Do you have tlogs enabled? I see "commit
>> failed" in the error message; that's why I am suggesting this.
>>
>> And regarding comma separated values: with this approach you focus on
>> just the Solr importing process and separate out the data acquisition
>> phase. It is also a very fast way to load even big CSV files:
>> http://wiki.apache.org/solr/UpdateCSV
>> I have never experienced an OOM during indexing, so I suspect data
>> acquisition plays a role in it.
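>>
>> A sketch along the lines of that wiki page (file name and host are
>> placeholders):
>>
>> curl "http://localhost:8983/solr/update/csv?commit=true" --data-binary @dump.csv -H "Content-type:text/plain; charset=utf-8"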
>>
>> Ahmet
>>
>> On Saturday, April 5, 2014 1:18 AM, Candygram For Mongo <
>> candygram.for.mongo@gmail.com> wrote:
>>
>> We would be happy to try that.  That sounds counterintuitive given the
>> high volume of records we have.  Can you help me understand how that
>> might solve our problem?
>>
>>
>>
>>
>> On Fri, Apr 4, 2014 at 2:34 PM, Ahmet Arslan <iorixxx@yahoo.com> wrote:
>>
>> Hi,
>> >
>> >Can you remove auto commit for the bulk import and commit at the very end?
>> >
>> >Ahmet
>> >
>> >
>> >
>> >
>> >On Saturday, April 5, 2014 12:16 AM, Candygram For Mongo <
>> candygram.for.mongo@gmail.com> wrote:
>> >In case the attached database.xml file didn't show up, I have pasted in
>> >the contents below:
>> >
>> ><dataConfig>
>> ><dataSource
>> >name="org_only"
>> >type="JdbcDataSource"
>> >driver="oracle.jdbc.OracleDriver"
>> >url="jdbc:oracle:thin:@test2.abc.com:1521:ORCL"
>> >user="admin"
>> >password="admin"
>> >readOnly="false"
>> >batchSize="100"
>> >/>
>> ><document>
>> >
>> >
>> ><entity name="full-index" query="
>> >select
>> >
>> >NVL(cast(ORCL.ADDRESS_ACCT_ALL.RECORD_ID as varchar2(100)), 'null')
>> >as SOLR_ID,
>> >
>> >'ORCL.ADDRESS_ACCT_ALL'
>> >as SOLR_CATEGORY,
>> >
>> >NVL(cast(ORCL.ADDRESS_ACCT_ALL.RECORD_ID as varchar2(255)), ' ') as
>> >ADDRESSALLROWID,
>> >NVL(cast(ORCL.ADDRESS_ACCT_ALL.ADDR_TYPE_CD as varchar2(255)), ' ') as
>> >ADDRESSALLADDRTYPECD,
>> >NVL(cast(ORCL.ADDRESS_ACCT_ALL.LONGITUDE as varchar2(255)), ' ') as
>> >ADDRESSALLLONGITUDE,
>> >NVL(cast(ORCL.ADDRESS_ACCT_ALL.LATITUDE as varchar2(255)), ' ') as
>> >ADDRESSALLLATITUDE,
>> >NVL(cast(ORCL.ADDRESS_ACCT_ALL.ADDR_NAME as varchar2(255)), ' ') as
>> >ADDRESSALLADDRNAME,
>> >NVL(cast(ORCL.ADDRESS_ACCT_ALL.CITY as varchar2(255)), ' ') as
>> >ADDRESSALLCITY,
>> >NVL(cast(ORCL.ADDRESS_ACCT_ALL.STATE as varchar2(255)), ' ') as
>> >ADDRESSALLSTATE,
>> >NVL(cast(ORCL.ADDRESS_ACCT_ALL.EMAIL_ADDR as varchar2(255)), ' ') as
>> >ADDRESSALLEMAILADDR
>> >
>> >from ORCL.ADDRESS_ACCT_ALL
>> >" >
>> >
>> ><field column="SOLR_ID" name="id" />
>> ><field column="SOLR_CATEGORY" name="category" />
>> ><field column="ADDRESSALLROWID" name="ADDRESS_ACCT_ALL.RECORD_ID_abc" />
>> ><field column="ADDRESSALLADDRTYPECD"
>> >name="ADDRESS_ACCT_ALL.ADDR_TYPE_CD_abc" />
>> ><field column="ADDRESSALLLONGITUDE"
>> name="ADDRESS_ACCT_ALL.LONGITUDE_abc" />
>> ><field column="ADDRESSALLLATITUDE" name="ADDRESS_ACCT_ALL.LATITUDE_abc"
>> />
>> ><field column="ADDRESSALLADDRNAME" name="ADDRESS_ACCT_ALL.ADDR_NAME_abc"
>> />
>> ><field column="ADDRESSALLCITY" name="ADDRESS_ACCT_ALL.CITY_abc" />
>> ><field column="ADDRESSALLSTATE" name="ADDRESS_ACCT_ALL.STATE_abc" />
>> ><field column="ADDRESSALLEMAILADDR"
>> name="ADDRESS_ACCT_ALL.EMAIL_ADDR_abc"
>> >/>
>> >
>> ></entity>
>> >
>> >
>> >
>> ><!-- Variables -->
>> ><!-- '${dataimporter.last_index_time}' -->
>> ></document>
>> ></dataConfig>
>> >
>> >
>> >
>> >
>> >
>> >
>> >On Fri, Apr 4, 2014 at 11:55 AM, Candygram For Mongo <
>> >candygram.for.mongo@gmail.com> wrote:
>> >
>> >> In this case we are indexing an Oracle database.
>> >>
>> >> We do not include the data-config.xml in our distribution.  We store
>> the
>> >> database information in the database.xml file.  I have attached the
>> >> database.xml file.
>> >>
>> >> When we use the default merge policy settings, we get the same results.
>> >>
>> >>
>> >>
>> >> We have not tried to dump the table to a comma separated file.  We
>> >> think that dumping a table of this size to disk will introduce other
>> >> memory problems with big file management.  We have not tested that case.
>> >>
>> >>
>> >> On Fri, Apr 4, 2014 at 7:25 AM, Ahmet Arslan <iorixxx@yahoo.com>
>> wrote:
>> >>
>> >>> Hi,
>> >>>
>> >>> Which database are you using? Can you send us data-config.xml?
>> >>>
>> >>> What happens when you use default merge policy settings?
>> >>>
>> >>> What happens when you dump your table to a comma separated file and
>> >>> feed that file to Solr?
>> >>>
>> >>> Ahmet
>> >>>
>> >>> On Friday, April 4, 2014 5:10 PM, Candygram For Mongo <
>> >>> candygram.for.mongo@gmail.com> wrote:
>> >>>
>> >>> The ramBufferSizeMB was set to 6MB only on the test system to make the
>> >>> system crash sooner.  In production that tag is commented out, which
>> >>> I believe forces the default value to be used.
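>> >>> (If I remember right, on this Solr line leaving the tag commented out
>> >>> is equivalent to the stock setting:
>> >>>
>> >>> <ramBufferSizeMB>100</ramBufferSizeMB>
>> >>>
>> >>> i.e. a 100 MB indexing buffer; worth double-checking for 4.2.1.)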
>> >>>
>> >>>
>> >>>
>> >>>
>> >>> On Thu, Apr 3, 2014 at 5:46 PM, Ahmet Arslan <iorixxx@yahoo.com>
>> wrote:
>> >>>
>> >>> Hi,
>> >>> >
>> >>> >out of curiosity, why did you set ramBufferSizeMB to 6?
>> >>> >
>> >>> >Ahmet
>> >>> >
>> >>> >
>> >>> >
>> >>> >
>> >>> >
>> >>> >On Friday, April 4, 2014 3:27 AM, Candygram For Mongo <
>> >>> candygram.for.mongo@gmail.com> wrote:
>> >>> >*Main issue: Full Indexing is Causing a Java Heap Out of Memory
>> >>> >Exception
>> >>> >
>> >>> >*SOLR/Lucene version: *4.2.1*
>> >>> >
>> >>> >
>> >>> >*JVM version:
>> >>> >
>> >>> >Java(TM) SE Runtime Environment (build 1.7.0_07-b11)
>> >>> >
>> >>> >Java HotSpot(TM) 64-Bit Server VM (build 23.3-b01, mixed mode)
>> >>> >
>> >>> >
>> >>> >
>> >>> >*Indexer startup command:
>> >>> >
>> >>> >set JVMARGS=-XX:MaxPermSize=364m -Xss256K -Xmx6144m -Xms6144m
>> >>> >
>> >>> >
>> >>> >
>> >>> >java " %JVMARGS% ^
>> >>> >
>> >>> >-Dcom.sun.management.jmxremote.port=1092 ^
>> >>> >
>> >>> >-Dcom.sun.management.jmxremote.ssl=false ^
>> >>> >
>> >>> >-Dcom.sun.management.jmxremote.authenticate=false ^
>> >>> >
>> >>> >-jar start.jar
>> >>> >
>> >>> >
>> >>> >
>> >>> >*SOLR indexing HTTP parameters request:
>> >>> >
>> >>> >webapp=/solr path=/dataimport
>> >>> >params={clean=false&command=full-import&wt=javabin&version=2}
>> >>> >
>> >>> >
>> >>> >
>> >>> >We are getting a Java heap OOM exception when indexing (updating) 27
>> >>> >million records.  If we increase the Java heap memory settings the
>> >>> >problem goes away but we believe the problem has not been fixed and
>> >>> >that we will eventually get the same OOM exception.  We have other
>> >>> >processes on the server that also require resources so we cannot
>> >>> >continually increase the memory settings to resolve the OOM issue.
>> >>> >We are trying to find a way to configure the SOLR instance to reduce
>> >>> >or preferably eliminate the possibility of an OOM exception.
>> >>> >
>> >>> >
>> >>> >
>> >>> >We can reproduce the problem on a test machine.  We set the Java heap
>> >>> >memory size to 64MB to accelerate the exception.  If we increase this
>> >>> >setting, the same problem occurs, just hours later.  In the test
>> >>> >environment, we are using the following parameters:
>> >>> >
>> >>> >
>> >>> >
>> >>> >JVMARGS=-XX:MaxPermSize=64m -Xss256K -Xmx64m -Xms64m
>> >>> >
>> >>> >
>> >>> >
>> >>> >Normally we use the default solrconfig.xml file with only the
>> >>> >following jar file references added:
>> >>> >
>> >>> >
>> >>> >
>> >>> ><lib path="../../../../default/lib/common.jar" />
>> >>> >
>> >>> ><lib path="../../../../default/lib/webapp.jar" />
>> >>> >
>> >>> ><lib path="../../../../default/lib/commons-pool-1.4.jar" />
>> >>> >
>> >>> >
>> >>> >
>> >>> >Using these values and trying to index 6 million records from the
>> >>> database,
>> >>> >the Java Heap Out of Memory exception is thrown very quickly.
>> >>> >
>> >>> >
>> >>> >
>> >>> >We were able to complete a successful indexing by further modifying
>> >>> >the solrconfig.xml and removing all, or all but one, of the <copyfield>
>> >>> >tags from the schema.xml file.
>> >>> >
>> >>> >
>> >>> >
>> >>> >The following solrconfig.xml values were modified:
>> >>> >
>> >>> >
>> >>> >
>> >>> ><ramBufferSizeMB>6</ramBufferSizeMB>
>> >>> >
>> >>> >
>> >>> >
>> >>> ><mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
>> >>> >
>> >>> ><int name="maxMergeAtOnce">2</int>
>> >>> >
>> >>> ><int name="maxMergeAtOnceExplicit">2</int>
>> >>> >
>> >>> ><int name="segmentsPerTier">10</int>
>> >>> >
>> >>> ><int name="maxMergedSegmentMB">150</int>
>> >>> >
>> >>> ></mergePolicy>
>> >>> >
>> >>> >
>> >>> >
>> >>> ><autoCommit>
>> >>> >
>> >>> ><maxDocs>15000</maxDocs>  <!-- This tag was maxTime before this -->
>> >>> >
>> >>> ><openSearcher>false</openSearcher>
>> >>> >
>> >>> ></autoCommit>
>> >>> >
>> >>> >
>> >>> >
>> >>> >Using our customized schema.xml file with two or more <copyfield>
>> >>> >tags, the OOM exception is always thrown.  Based on the errors, the
>> >>> >problem occurs when the process is trying to do the merge.  The error
>> >>> >is provided below:
>> >>> >
>> >>> >
>> >>> >
>> >>> >Exception in thread "Lucene Merge Thread #156"
>> >>> >org.apache.lucene.index.MergePolicy$MergeException: java.lang.OutOfMemoryError: Java heap space
>> >>> >        at org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:541)
>> >>> >        at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:514)
>> >>> >Caused by: java.lang.OutOfMemoryError: Java heap space
>> >>> >        at org.apache.lucene.codecs.lucene42.Lucene42DocValuesProducer.loadNumeric(Lucene42DocValuesProducer.java:180)
>> >>> >        at org.apache.lucene.codecs.lucene42.Lucene42DocValuesProducer.getNumeric(Lucene42DocValuesProducer.java:146)
>> >>> >        at org.apache.lucene.index.SegmentCoreReaders.getNormValues(SegmentCoreReaders.java:301)
>> >>> >        at org.apache.lucene.index.SegmentReader.getNormValues(SegmentReader.java:259)
>> >>> >        at org.apache.lucene.index.SegmentMerger.mergeNorms(SegmentMerger.java:233)
>> >>> >        at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:137)
>> >>> >        at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:3693)
>> >>> >        at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3296)
>> >>> >        at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:401)
>> >>> >        at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:478)
>> >>> >
>> >>> >Mar 12, 2014 12:17:40 AM org.apache.solr.common.SolrException log
>> >>> >SEVERE: auto commit error...:java.lang.IllegalStateException: this writer hit an OutOfMemoryError; cannot commit
>> >>> >        at org.apache.lucene.index.IndexWriter.startCommit(IndexWriter.java:3971)
>> >>> >        at org.apache.lucene.index.IndexWriter.prepareCommitInternal(IndexWriter.java:2744)
>> >>> >        at org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:2827)
>> >>> >        at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:2807)
>> >>> >        at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:536)
>> >>> >        at org.apache.solr.update.CommitTracker.run(CommitTracker.java:216)
>> >>> >        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>> >>> >        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>> >>> >        at java.util.concurrent.FutureTask.run(FutureTask.java:166)
>> >>> >        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
>> >>> >        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
>> >>> >        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>> >>> >        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>> >>> >        at java.lang.Thread.run(Thread.java:722)
>> >>> >
>> >>> >
>> >>> >
>> >>> >We think but are not 100% sure that the problem is related to the
>> >>> >merge.
>> >>> >
>> >>> >
>> >>> >
>> >>> >Normally our schema.xml contains a lot of field specifications (like
>> >>> >the ones seen in the file fragment below):
>> >>> >
>> >>> >
>> >>> >
>> >>> ><copyField source="ADDRESS.RECORD_ID_abc" dest="ADDRESS.RECORD_ID.case_abc" />
>> >>> >
>> >>> ><copyField source="ADDRESS.RECORD_ID_abc" dest="ADDRESS.RECORD_ID.case.soundex_abc" />
>> >>> >
>> >>> ><copyField source="ADDRESS.RECORD_ID_abc" dest="ADDRESS.RECORD_ID.case_nvl_abc" />
>> >>> >
>> >>> >
>> >>> >
>> >>> >In tests using the default schema.xml file and no <copyfield> tags,
>> >>> >indexing completed successfully.  6 million records produced a 900 MB
>> >>> >data directory.
>> >>> >
>> >>> >
>> >>> >
>> >>> >When I included just one <copyfield> tag, indexing completed
>> >>> >successfully.  6 million records produced a 990 MB data directory
>> >>> >(90 MB bigger).
>> >>> >
>> >>> >
>> >>> >
>> >>> >When I included just two <copyfield> tags, the index crashed with an
>> >>> >OOM exception.
>> >>> >
>> >>> >
>> >>> >
>> >>> >Changing parameters like maxMergedSegmentMB or maxDocs only postponed
>> >>> >the crash.
>> >>> >
>> >>> >
>> >>> >
>> >>> >The net of our test results is as follows:
>> >>> >
>> >>> >solrconfig.xml                     schema.xml                         result
>> >>> >---------------------------------  ---------------------------------  -------
>> >>> >default plus only jar references   default (no copyfield tags)        success
>> >>> >default plus only jar references   modified with one copyfield tag    success
>> >>> >default plus only jar references   modified with two copyfield tags   crash
>> >>> >additional modified settings       default (no copyfield tags)        success
>> >>> >additional modified settings       modified with one copyfield tag    success
>> >>> >additional modified settings       modified with two copyfield tags   crash
>> >>> >
>> >>> >
>> >>> >
>> >>> >
>> >>> >
>> >>> >Our question is, what can we do to eliminate these OOM exceptions?
>> >>> >
>> >>> >
>> >>>
>> >>
>> >>
>> >
>> >
>>
>
>
