lucene-solr-user mailing list archives

From "Joshi, Shital" <Shital.Jo...@gs.com>
Subject RE: Solr4 update and query performance question
Date Wed, 14 Aug 2013 20:39:01 GMT
We didn't copy/paste the Solr3 config into Solr4. We started with the Solr4 config and only
updated the new searcher queries and a few other things.

There is no batching while updating/inserting documents in Solr3, is that correct? Committing
1000 documents in Solr3 takes 19 seconds, while in Solr4 it takes about 3-4 minutes. We noticed
in the Solr4 logs that a commit only returns after a new searcher has been created across all
nodes. This is possibly because waitSearcher=true by default in Solr4. This was not the case
with Solr3, where a commit would return without waiting for new searcher creation.
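
To put the two timings side by side, here is a rough back-of-the-envelope calculation. It
uses only the figures quoted above (1000 documents, 19 seconds, 3-4 minutes); the function
name is just for illustration.

```python
def docs_per_second(doc_count, seconds):
    """Average load rate for one update-plus-commit cycle."""
    return doc_count / seconds

solr3 = docs_per_second(1000, 19)           # Solr3: 19 seconds
solr4_best = docs_per_second(1000, 3 * 60)  # Solr4: 3 minutes
solr4_worst = docs_per_second(1000, 4 * 60) # Solr4: 4 minutes

print(f"Solr3: {solr3:.1f} docs/s")
print(f"Solr4: {solr4_worst:.1f} to {solr4_best:.1f} docs/s")
print(f"Slowdown: {solr3 / solr4_best:.1f}x to {solr3 / solr4_worst:.1f}x")
```

So the Solr4 cycle is roughly 9 to 13 times slower than the Solr3 one for the same load.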

To improve performance with Solr4, we first changed from commit=true to commit=false in the
update URL and added an auto hard commit setting (autoCommit) in solrconfig.xml. This reduced
the load time from 3-4 minutes to 1-2 minutes, but that is still not good enough.

Then we changed the maxBufferedAddsPerServer value in the SolrCmdDistributor class from 10 to
1000, deployed this class in the $JETTY_TEMP_FOLDER/solr-webapp/webapp/WEB-INF/classes folder,
and restarted the Solr4 nodes. But we still see a batch size of 10 being used. Did we change
the correct variable/class?
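
As a rough illustration of why the buffer size matters, the sketch below counts how many
forwarded batches 1000 documents would need. It assumes, as a simplification, that all
documents go to a single destination node and that each full buffer becomes one forwarded
request; the function name is hypothetical and this is not the actual SolrCmdDistributor
logic.

```python
import math

def forwarded_batches(doc_count, buffer_size):
    """Batches needed if documents are forwarded once per full buffer (simplified)."""
    return math.ceil(doc_count / buffer_size)

for size in (10, 1000):
    print(f"buffer size {size}: {forwarded_batches(1000, size)} batch(es)")
```

Under those assumptions, a buffer of 10 means 100 round trips for 1000 documents, while a
buffer of 1000 means one.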

Next, we will try using softCommit=true in the update URL and check whether it gives us the
desired performance.
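
For reference, the update URL variants discussed in this message could be built as in the
sketch below. The host, port, collection name, and shard id are placeholders, not our real
values; commit, softCommit, and waitSearcher are the update request parameters, and _shard_
is the custom sharding parameter mentioned earlier in the thread.

```python
from urllib.parse import urlencode

# Placeholder endpoint: host, port, and collection name are assumptions.
BASE_URL = "http://localhost:8983/solr/collection1/update"

def update_url(base_url=BASE_URL, **params):
    """Build an update URL, rendering Python booleans as lowercase true/false."""
    rendered = {
        key: str(value).lower() if isinstance(value, bool) else value
        for key, value in params.items()
    }
    return base_url + "?" + urlencode(rendered)

# Hard commit on every load (what we started with).
hard = update_url(commit=True)
# Hard commit that does not wait for the new searcher before returning.
hard_no_wait = update_url(commit=True, waitSearcher=False)
# No explicit commit; relies on the auto commit setting in solrconfig.xml.
deferred = update_url(commit=False)
# Soft commit, routed to one shard ("shard1" is a made-up shard id).
soft = update_url(softCommit=True, _shard_="shard1")

for url in (hard, hard_no_wait, deferred, soft):
    print(url)
```

Only the query string changes between the variants, so switching between them needs no
change to the CSV payload itself.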

Thanks for looking into this. Appreciate your help. 

-----Original Message-----
From: Erick Erickson [mailto:erickerickson@gmail.com] 
Sent: Tuesday, August 13, 2013 8:12 AM
To: solr-user@lucene.apache.org
Subject: Re: Solr4 update and query performance question

1> That's hard-coded at present. There's anecdotal evidence that there
     are throughput improvements with larger batch sizes, but no action
     yet.
2> Yep, all searchers are also re-opened, caches re-warmed, etc.
3> Odd. I'm assuming your Solr3 was master/slave setup? Seeing the
    queries would help diagnose this. Also, did you try to copy/paste
    the configuration from your Solr3 to Solr4? I'd start with the
    Solr4 and copy/paste only the parts needed from your Solr3 setup.

Best
Erick


On Mon, Aug 12, 2013 at 11:38 AM, Joshi, Shital <Shital.Joshi@gs.com> wrote:

> Hi,
>
> We have SolrCloud (4.4.0) cluster (5 shards and 2 replicas) on 10 boxes
> with about 450 mil documents (~90 mil per shard). We're loading 1000 or
> fewer documents in CSV format every few minutes. In Solr3, with 300 mil
> documents, it used to take 30 seconds to load 1000 documents, while in
> Solr4 it's taking up to 3 minutes to load 1000 documents. We're using
> custom sharding; we include the _shard_=shardid parameter in the update
> command. Upon looking at the Solr4 log files, we found that:
>
> 1. Documents are added in a batch of 10 records. How do we increase
> this batch size from 10 to 1000 documents?
>
> 2. We do a hard commit after loading 1000 documents. Every hard
> commit refreshes the searcher on all nodes. Are all caches also refreshed
> when a hard commit happens? We're planning to change to soft commits and
> do an auto hard commit every 10-15 minutes.
>
> 3. We're not seeing improved query performance compared to Solr3.
> Queries which took 3-5 seconds in Solr3 (300 mil docs) are taking 20
> seconds with Solr4. We think this could be due to frequent hard commits and
> searcher refreshes. Do you think we will see better query performance when
> we change to soft commits and increase the batch size?
>
> Thanks!
>
>
>
