lucene-solr-user mailing list archives

From wojtekpia <wojte...@hotmail.com>
Subject Re: DataImportHandler running out of memory
Date Wed, 25 Jun 2008 16:47:24 GMT

I'm trying with batchSize=-1 now. So far it seems to be working, but very
slowly. I will update when it completes or crashes.

Even with a batchSize of 100 I was running out of memory.

I'm running on a 32-bit Windows machine. I've set -Xmx to 1.5 GB; I
believe that's the maximum for my environment.
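
A quick way to confirm what heap ceiling the JVM actually accepted from -Xmx is to ask the runtime directly. The sketch below is illustrative only: the class name HeapCheck and its helper method are made up for this example, and it would need to be run with the same JVM and flags as the Solr instance for the figure to mean anything.

```java
public class HeapCheck {

    // Maximum heap the JVM will attempt to use, in MiB (hypothetical helper).
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("Max heap available to this JVM: " + maxHeapMiB() + " MiB");
    }
}
```

Running it as `java -Xmx1500m HeapCheck` should report a value close to 1500 MiB if the setting was accepted; depending on the JVM, the reported number can be slightly lower.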

The batchSize parameter doesn't seem to control memory use: when I
select top 5,000,000 with a batchSize of 10,000, it works. When I select top
10,000,000 with the same batchSize, it runs out of memory.
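
For context, whether rows are streamed or buffered is decided largely by the JDBC driver. At the JDBC level, a streaming read is usually requested with a forward-only, read-only statement plus a fetch-size hint, which the driver is free to ignore. The following is a minimal sketch under those assumptions, not DataImportHandler's actual code; the class name StreamingReadSketch and the method countRows are hypothetical.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class StreamingReadSketch {

    // Iterates over a query result one row at a time and returns the row count.
    // fetchSize is only a hint: the driver decides how many rows it really buffers.
    static long countRows(Connection conn, String sql, int fetchSize) throws SQLException {
        try (Statement stmt = conn.createStatement(
                ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY)) {
            stmt.setFetchSize(fetchSize);
            long rows = 0;
            try (ResultSet rs = stmt.executeQuery(sql)) {
                while (rs.next()) {
                    rows++; // a real importer would build and index a document here
                }
            }
            return rows;
        }
    }
}
```

If the driver reads the whole result set into memory regardless of the hint, heap use grows with the number of rows selected rather than with the fetch size, which would match the behaviour described above.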

Also, I'm using the SOLR-469 patch posted on 2008-06-11 08:41 AM.


Noble Paul നോബിള്‍ नोब्ळ् wrote:
> 
> DIH streams rows one by one.
> Set fetchSize="-1"; this might help. It may make the indexing a bit
> slower, but memory consumption would be low.
> The memory is consumed by the JDBC driver. Try tuning the -Xmx value for
> the VM.
> --Noble
> 
> On Wed, Jun 25, 2008 at 8:05 AM, Shalin Shekhar Mangar
> <shalinmangar@gmail.com> wrote:
>> Setting the batchSize to 10000 would mean that the JDBC driver will keep
>> 10000 rows in memory *for each entity* which uses that data source (if
>> correctly implemented by the driver). I'm not sure how well the SQL Server
>> driver implements this. Also keep in mind that Solr needs memory to
>> index documents too. You can probably try setting the batch size to a
>> lower value.
>>
>> The regular memory tuning stuff should apply here too -- try disabling
>> autoCommit and turning off autowarming and see if it helps.
>>
>> On Wed, Jun 25, 2008 at 5:53 AM, wojtekpia <wojtek_p@hotmail.com> wrote:
>>
>>>
>>> I'm trying to load ~10 million records into Solr using the
>>> DataImportHandler. I'm running out of memory
>>> (java.lang.OutOfMemoryError: Java heap space) as soon as I try loading
>>> more than about 5 million records.
>>>
>>> Here's my configuration:
>>> I'm connecting to a SQL Server database using the sqljdbc driver. I've
>>> given my Solr instance 1.5 GB of memory. I have set the dataSource
>>> batchSize to 10000. My SQL query is "select top XXX field1, ... from
>>> table1". I have about 40 fields in my Solr schema.
>>>
>>> I thought the DataImportHandler would stream data from the DB rather
>>> than loading it all into memory at once. Is that not the case? Any
>>> thoughts on how to get around this (aside from getting a machine with
>>> more memory)?
>>>
>>> --
>>> View this message in context:
>>> http://www.nabble.com/DataImportHandler-running-out-of-memory-tp18102644p18102644.html
>>> Sent from the Solr - User mailing list archive at Nabble.com.
>>>
>>>
>>
>>
>> --
>> Regards,
>> Shalin Shekhar Mangar.
>>
> 
> 
> 
> -- 
> --Noble Paul
> 
> 

-- 
View this message in context: http://www.nabble.com/DataImportHandler-running-out-of-memory-tp18102644p18115900.html
Sent from the Solr - User mailing list archive at Nabble.com.

