lucene-solr-user mailing list archives

From sunnyfr <johanna...@gmail.com>
Subject Re: DataImportHandler running out of memory
Date Fri, 31 Oct 2008 10:20:41 GMT

Hi Grant,

How did you finally manage it?
I have the same problem with less data (8.5M). If I set batchSize to -1,
it slows the database down a lot, which is bad for the website, and
requests pile up.
What did you do?

Thanks,


Grant Ingersoll-6 wrote:
> 
> I think it's a bit different.  I ran into this exact problem about two  
> weeks ago on a 13 million record DB.  MySQL doesn't honor the fetch  
> size for its v5 JDBC driver.
> 
> See
> http://www.databasesandlife.com/reading-row-by-row-into-java-from-mysql/ 
>   or do a search for MySQL fetch size.
> 
> You actually have to do setFetchSize(Integer.MIN_VALUE) (-1 doesn't  
> work) in order to get streaming in MySQL.
> 
> -Grant
> 
> 
> On Jun 24, 2008, at 10:35 PM, Shalin Shekhar Mangar wrote:
> 
>> Setting the batchSize to 10000 would mean that the JDBC driver will
>> keep 10000 rows in memory *for each entity* which uses that data
>> source (if correctly implemented by the driver). Not sure how well the
>> SQL Server driver implements this. Also keep in mind that Solr also
>> needs memory to index documents. You can probably try setting the
>> batch size to a lower value.
>>
>> The regular memory tuning stuff should apply here too -- try disabling
>> autoCommit and turning off autowarming and see if it helps.
>>
>> On Wed, Jun 25, 2008 at 5:53 AM, wojtekpia <wojtek_p@hotmail.com> wrote:
>>
>>>
>>> I'm trying to load ~10 million records into Solr using the
>>> DataImportHandler. I'm running out of memory
>>> (java.lang.OutOfMemoryError: Java heap space) as soon as I try
>>> loading more than about 5 million records.
>>>
>>> Here's my configuration:
>>> I'm connecting to a SQL Server database using the sqljdbc driver.
>>> I've given my Solr instance 1.5 GB of memory. I have set the
>>> dataSource batchSize to 10000. My SQL query is
>>> "select top XXX field1, ... from table1". I have about 40 fields in
>>> my Solr schema.
>>>
>>> I thought the DataImportHandler would stream data from the DB rather
>>> than loading it all into memory at once. Is that not the case? Any
>>> thoughts on how to get around this (aside from getting a machine with
>>> more memory)?
>>>
>>> --
>>> View this message in context:
>>> http://www.nabble.com/DataImportHandler-running-out-of-memory-tp18102644p18102644.html
>>> Sent from the Solr - User mailing list archive at Nabble.com.
>>>
>>>
>>
>>
>> -- 
>> Regards,
>> Shalin Shekhar Mangar.
> 
> --------------------------
> Grant Ingersoll
> http://www.lucidimagination.com
> 
> Lucene Helpful Hints:
> http://wiki.apache.org/lucene-java/BasicsOfPerformance
> http://wiki.apache.org/lucene-java/LuceneFAQ
> 
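
For reference, here is a minimal Java sketch of the streaming setup Grant describes in the quoted message above. The class name MySqlStreaming and the helper resolveFetchSize are hypothetical, introduced only for illustration. The sketch assumes a MySQL Connector/J connection, where a forward-only, read-only statement with a fetch size of Integer.MIN_VALUE asks the driver to stream rows instead of buffering the whole result set. Treating -1 as a "stream" marker is an assumption borrowed from the batchSize setting mentioned in this thread.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

final class MySqlStreaming {

    // Hypothetical helper: treat a batch size of -1 as "stream" and map it
    // to Integer.MIN_VALUE, the value Grant says MySQL needs; any other
    // value is passed through unchanged.
    static int resolveFetchSize(int batchSize) {
        return batchSize == -1 ? Integer.MIN_VALUE : batchSize;
    }

    // Creates a forward-only, read-only statement and applies the resolved
    // fetch size. The caller supplies an open connection (assumed to be a
    // MySQL Connector/J connection) and is responsible for closing the
    // returned statement.
    static Statement createStreamingStatement(Connection conn, int batchSize)
            throws SQLException {
        Statement stmt = conn.createStatement(
                ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY);
        stmt.setFetchSize(resolveFetchSize(batchSize));
        return stmt;
    }
}
```

One trade-off worth keeping in mind: with Connector/J, a streaming result set ties up its connection until all rows are read or the result set is closed, so a long import holds the query open on the database side for its full duration.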

-- 
View this message in context: http://www.nabble.com/DataImportHandler-running-out-of-memory-tp18102644p20263146.html
Sent from the Solr - User mailing list archive at Nabble.com.

