lucene-solr-user mailing list archives

From Yoni Amir <Yoni.A...@niceactimize.com>
Subject RE: out of memory during indexing do to large incoming queue
Date Sun, 02 Jun 2013 18:25:11 GMT
Hi Shawn and Shreejay, thanks for the response.
Here is some more information:
1) The machine is a virtual machine on an ESX server. It has 4 CPUs and 8GB of RAM. I don't remember
which CPU model, but it is something modern enough. It is running Java 7 without any special parameters,
with 4GB of heap allocated for Java (-Xmx).
2) After successful indexing, I have 2.5 Million documents, 117GB index size. This is the
size after it was optimized.
3) I plan to upgrade to 4.3; I just haven't had time. 4.0 beta was what was available at the time
that we had a release deadline.
4) The setup uses master-slave replication, not SolrCloud. The server that I am discussing
is the indexing server, and in these tests there were actually no slaves involved, and virtually
zero searches performed.
5) Attached is my configuration. I tried disabling the warm-up and the opening of searchers,
but it didn't change anything. The commits are done by Solr, using autocommit. The client sends
the updates without a commit command.
6) I want to disable optimization, but when I disabled it, the OOME occurred even faster.
The number of segments reached around a thousand within an hour or so. I don't know if it's
normal or not, but at that point if I restarted Solr it immediately took about 1GB of heap
space just on start-up, instead of the usual 50MB or so.

If I commit less frequently, don't I increase the risk of losing data, e.g., if the power
goes down, etc.?
If I disable optimization, is it necessary to avoid such a large number of segments? Is it
possible?

Thanks again,
Yoni



-----Original Message-----
From: Shawn Heisey [mailto:solr@elyograg.org] 
Sent: Sunday, June 02, 2013 18:05
To: solr-user@lucene.apache.org
Subject: Re: out of memory during indexing do to large incoming queue

On 6/2/2013 8:16 AM, Yoni Amir wrote:
> Hello,
> I am receiving OutOfMemoryError during indexing, and after investigating the heap dump,
> I am still missing some information, and I thought this might be a good place for help.
> 
> I am using Solr 4.0 beta, and I have 5 threads that send update requests to Solr. Each
> request is a bulk of 100 SolrInputDocuments (using solrj), and my goal is to index around
> 2.5 million documents.
> Solr is configured to do a hard-commit every 10 seconds, so initially I thought that
> it can only accumulate in memory 10 seconds worth of updates, but that's not the case. I can
> see in a profiler how it accumulates memory over time, even with 4 to 6 GB of memory. It is
> also configured to optimize with mergeFactor=10.

4.0-BETA came out several months ago.  Even at the time, support for the alpha and beta releases
was limited.  Now it has been superseded by 4.0.0, 4.1.0, 4.2.0, 4.2.1, and 4.3.0, all of
which are full releases.
There is a 4.3.1 release currently in the works.  Please upgrade.

Ten seconds is a very short interval for hard commits, even if you have openSearcher=false.
Frequent hard commits can cause a whole host of problems.  It's better to have an interval
of several minutes, and I wouldn't go less than a minute.  Soft commits can be much more frequent,
but if you are frequently opening new searchers, you'll probably want to disable cache warming.
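
As a rough sketch of the commit advice above, a solrconfig.xml fragment might look something like the following. The element names (autoCommit, autoSoftCommit, openSearcher, updateLog) are standard in Solr 4.x, but the interval values are illustrative assumptions only and have not been tested against this index or workload.

```xml
<!-- solrconfig.xml fragment: a sketch, not a tuned configuration.
     All maxTime values below are assumed for illustration. -->
<updateHandler class="solr.DirectUpdateHandler2">

  <!-- Transaction log: on restart, Solr can replay updates that were
       logged but not yet hard-committed. -->
  <updateLog>
    <str name="dir">${solr.ulog.dir:}</str>
  </updateLog>

  <!-- Hard commit every 5 minutes (assumed value), flushing segments
       to disk without opening a new searcher. -->
  <autoCommit>
    <maxTime>300000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>

  <!-- Soft commit every 15 seconds (assumed value) so new documents
       become visible to searches; omit this block if nothing searches
       the indexing server. -->
  <autoSoftCommit>
    <maxTime>15000</maxTime>
  </autoSoftCommit>

</updateHandler>
```

With the transaction log enabled, a longer hard-commit interval does not necessarily mean a longer window of lost updates, although the log itself keeps growing until the next hard commit, so the interval should not be left unbounded.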

On optimization: don't do it unless you absolutely must.  Most of the time, optimization is
only needed if you delete a lot of documents and you need to get them removed from your index.
If you must optimize to get rid of deleted documents, do it on a very long interval (once
a day, once a week) and pause indexing during optimization.

You haven't said anything about your index size, java heap size, total RAM, etc.  With those
numbers I could offer some guesses about what you need, but I'll warn you that they would
only be guesses - watching a system with real data under load is the only way to get concrete
information.  Here are some basic guidelines on performance problems and RAM information:

http://wiki.apache.org/solr/SolrPerformanceProblems

Thanks,
Shawn


