lucene-solr-user mailing list archives

From Grant Ingersoll <gsing...@apache.org>
Subject Re: DataImportHandler running out of memory
Date Wed, 25 Jun 2008 11:05:19 GMT
I think it's a bit different.  I ran into this exact problem about two
weeks ago on a 13 million record DB.  MySQL's v5 JDBC driver doesn't
honor the fetch size; by default it reads the whole result set into
memory.

See http://www.databasesandlife.com/reading-row-by-row-into-java-from-mysql/
or do a search for "MySQL fetch size".

You actually have to do setFetchSize(Integer.MIN_VALUE) (-1 doesn't  
work) in order to get streaming in MySQL.
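
For illustration, here is a minimal sketch of that setting in plain
JDBC.  The class and method names (MySqlStreaming,
createStreamingStatement, countRows) are made up for this example and
are not part of Solr or the DataImportHandler.  It assumes MySQL
Connector/J, which streams rows only when the statement is
forward-only, read-only and the fetch size is Integer.MIN_VALUE.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical helper class, for illustration only.
final class MySqlStreaming {

    // Creates a statement configured the way MySQL Connector/J expects
    // for row-by-row streaming: forward-only, read-only, and a fetch
    // size of Integer.MIN_VALUE (-1 does not have this effect).
    static Statement createStreamingStatement(Connection conn) throws SQLException {
        Statement stmt = conn.createStatement(
                ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY);
        stmt.setFetchSize(Integer.MIN_VALUE);
        return stmt;
    }

    // Walks the result set one row at a time and returns the row count.
    // With the statement above, the driver should not buffer the whole
    // result in memory.
    static long countRows(Connection conn, String sql) throws SQLException {
        try (Statement stmt = createStreamingStatement(conn);
             ResultSet rs = stmt.executeQuery(sql)) {
            long rows = 0;
            while (rs.next()) {
                rows++;
            }
            return rows;
        }
    }
}
```

With a real connection you would call something like
MySqlStreaming.countRows(conn, "select ...").  Keep in mind that while
a streaming result set is open, the same connection can't run other
statements until the result is fully read or closed.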

-Grant


On Jun 24, 2008, at 10:35 PM, Shalin Shekhar Mangar wrote:

> Setting the batchSize to 10000 would mean that the Jdbc driver will keep
> 10000 rows in memory *for each entity* which uses that data source (if
> correctly implemented by the driver). Not sure how well the Sql Server
> driver implements this. Also keep in mind that Solr also needs memory to
> index documents. You can probably try setting the batch size to a lower
> value.
>
> The regular memory tuning stuff should apply here too -- try disabling
> autoCommit and turning off autowarming and see if it helps.
>
> On Wed, Jun 25, 2008 at 5:53 AM, wojtekpia <wojtek_p@hotmail.com> wrote:
>
>>
>> I'm trying to load ~10 million records into Solr using the
>> DataImportHandler. I'm running out of memory
>> (java.lang.OutOfMemoryError: Java heap space) as soon as I try loading
>> more than about 5 million records.
>>
>> Here's my configuration:
>> I'm connecting to a SQL Server database using the sqljdbc driver. I've
>> given my Solr instance 1.5 GB of memory. I have set the dataSource
>> batchSize to 10000. My SQL query is "select top XXX field1, ... from
>> table1". I have about 40 fields in my Solr schema.
>>
>> I thought the DataImportHandler would stream data from the DB rather
>> than loading it all into memory at once. Is that not the case? Any
>> thoughts on how to get around this (aside from getting a machine with
>> more memory)?
>
>
> -- 
> Regards,
> Shalin Shekhar Mangar.

--------------------------
Grant Ingersoll
http://www.lucidimagination.com

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ







