lucene-solr-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: SolrCloud admin UI core/stats showing commit count even without no explicit commit
Date Mon, 27 Aug 2012 13:33:39 GMT
The autocommits are about what I'd expect. 17 hours
== 102 ten minute blocks, which is roughly your
115 autocommits. I'm _guessing_ that the total
commits are a combination of soft and hard. You'll
have 20,400 soft commits in that time frame, so this
works as a rough estimate....
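
The estimate above can be reproduced with a few lines of arithmetic. This is only a sketch: the intervals and counters are the ones quoted in this thread, the 17 hours is an approximate figure, and the variable names are made up for illustration.

```python
# Rough check of the commit counters shown on the Plugin/Stats page.
# Intervals are taken from the solrconfig.xml settings quoted in this thread;
# all names here are illustrative, not Solr identifiers.
HOURS_INDEXING = 17             # approximate, as stated in the original mail
HARD_COMMIT_INTERVAL_S = 600    # autocommit maxTime: 600000 ms
SOFT_COMMIT_INTERVAL_S = 3      # soft autocommit maxTime: 3000 ms

elapsed_s = HOURS_INDEXING * 3600
expected_hard = elapsed_s // HARD_COMMIT_INTERVAL_S
expected_soft = elapsed_s // SOFT_COMMIT_INTERVAL_S

# Counters reported by the admin UI.
reported_hard = 115
reported_soft = 22912
reported_total = 23028

print("expected hard commits:", expected_hard)
print("expected soft commits:", expected_soft)
print("reported hard + soft: ", reported_hard + reported_soft)
print("reported total:       ", reported_total)
```

The reported hard and soft counters add up to within one of the reported total, which fits the guess that "commits" is the sum of the two.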

And SolrJ doesn't do a commit after an add unless
you tell it to.

As for search performance, it's quite hard to tell, but
you have about 133M documents/shard, and two
replicas. You have a relatively small amount of
memory allocated for indexes that size. It's time to
just dig into what you can expect out of your boxes.
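
To put rough numbers on that, here is a small sketch based on the setup described in the original mail (3 shards, 2 cores per machine, Xmx=6G). The bytes-of-heap-per-document figure is only a crude indicator chosen for illustration, not a Solr sizing rule, and it ignores whatever memory the OS has left for caching index files.

```python
# Back-of-the-envelope load per machine, using figures from this thread.
# Names are illustrative; "heap bytes per document" is not an official metric.
TOTAL_DOCS = 400_000_000
NUM_SHARDS = 3
CORES_PER_MACHINE = 2            # each machine hosts two cores
MAX_HEAP_BYTES = 6 * 1024**3     # Xmx=6G

docs_per_shard = TOTAL_DOCS // NUM_SHARDS
docs_per_machine = docs_per_shard * CORES_PER_MACHINE
heap_bytes_per_doc = MAX_HEAP_BYTES / docs_per_machine

print("documents per shard:  ", docs_per_shard)
print("documents per machine:", docs_per_machine)
print("max heap bytes per document: %.1f" % heap_bytes_per_doc)
```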

Here's a blog that outlines a way to understand more
about the capacity of your hardware that might help.
I'd take the SolrCloud bits out for right now, and just
concentrate on the capacity of the machine in your
situation, then add SolrCloud back in to the mix.
http://searchhub.org/dev/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/

It'd be interesting to see what your query rate
was if you stop the indexing process. Mostly I'm
just looking for which factors change performance,
not recommending that you go with that approach.
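
One factor worth separating out is the search client itself. With a fixed number of threads that each wait for a response before sending the next query, throughput is bounded by roughly threads divided by latency. The sketch below assumes that kind of closed-loop client (an assumption, since the client code isn't shown) and uses the figures from the original mail.

```python
# Throughput bound for a closed-loop client: each thread issues one query,
# waits for the response, then issues the next. Assumes the 'search' client
# works this way; the figures come from the original mail.
SEARCH_THREADS = 50
LATENCY_MIN_S = 1.0      # ~1000 ms per search
LATENCY_MAX_S = 1.4      # ~1400 ms per search
OBSERVED_QPS = 40

qps_upper = SEARCH_THREADS / LATENCY_MIN_S   # all queries at the fast end
qps_lower = SEARCH_THREADS / LATENCY_MAX_S   # all queries at the slow end

print("expected QPS range: %.1f to %.1f" % (qps_lower, qps_upper))
print("observed QPS within range:", qps_lower <= OBSERVED_QPS <= qps_upper)
```

If the observed rate sits inside that range, the query rate is simply what the measured latency allows for 50 threads, so the thing to chase is the per-query latency rather than the searches-per-second figure.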

The good news is that you can get virtually whatever
QPS rate you need by simply racking in more replicas
for each shard....
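
As a very rough planning sketch, and assuming QPS scales about linearly with the number of replicas per shard (an idealization that should be measured, not trusted), the replica count for a target rate could be estimated like this. The target of 200 QPS is a hypothetical number picked for the example; the 40 QPS figure is the one reported in the original mail with two copies of each shard.

```python
from math import ceil

def replicas_needed(target_qps, measured_qps, measured_replicas):
    """Estimate replicas per shard for target_qps, assuming linear scaling."""
    qps_per_replica = measured_qps / measured_replicas
    return ceil(target_qps / qps_per_replica)

# 40 QPS was measured with 2 copies of each shard; 200 QPS is hypothetical.
estimate = replicas_needed(target_qps=200, measured_qps=40, measured_replicas=2)
print("replicas per shard for 200 QPS:", estimate)
```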

Best
Erick




On Sat, Aug 25, 2012 at 3:04 AM, Srikanth S <srikanth85@gmail.com> wrote:
> Hi,
>
> I am doing a small test for my company to see if SolrCloud is suitable for
> our indexing needs. The setup is as follows:
>
>    - Solr version 4.0 BETA1
>    - Three physical machines hosting solr servers
>    - Distributed ZooKeeper setup on the same three machines
>    - 2 solr cores on each server: total 6 cores
>    - 3 shards (and hence 1 replica each)
>    - Machine M1 is leader of (shard 1, replica 1) and hosts (shard3,
>    replica2)
>    - M2 is leader of (shard 2, replica 1) and hosts (shard1, replica2)
>    - M3 is leader of (shard 3, replica 1) and hosts (shard2, replica2)
>    - Using sun java 1.6.0_27 with -server and Xms=3G and Xmx=6G
>    - Two separate machines M4, M5 run separate 'create' client and 'search'
>    client respectively
>
> Config:
>
>    - schema.xml: copied from bundled 'example/solr/collection1', removed
>    all 'field' and 'copyFields' entries it came with, and added ~15 fields of
>    my own (mostly strings and a few integers, all indexed, all stored, four of
>    them multivalued)
>    - solrconfig.xml: copied from bundled 'example/solr/collection1', set
>    autocommit duration to 10mins with openSearcher=false and set
>    autoSoftCommit to 3secs.
>
> The documents being committed are fairly small in size, with around 10/15
> attributes, most of them strings and fairly small strings (like person
> names, street names etc).
>
> I've been indexing data (with no searches in between) using a 50 threaded
> 'create' client for the last 17 hours at the end of which I have
> ~400million such documents indexed. For the most part of this time (from
> the logs), I was able to index at around 6000-7000 documents per second (to
> give you some idea of the machine specs/network etc.) and with each
> solrServer.add() request returning in sub 10ms response times. And yes, I
> am using solrj with CloudSolrServer.
>
> Questions:
> 1. When I connect to the admin console of one of the servers, under the
> core's 'Plugin/Stats' page and under 'UPDATEHANDLER' I see:
> commits:23028
> autocommit maxTime:600000ms
> autocommits:115
> soft autocommit maxTime:3000ms
> soft autocommits:22912
> optimizes:0
>
> Two things interest me here:
> a. there are very few auto-commits while there have been a number of
> commits. However, I am not calling any explicit commit anywhere in the
> client codes. *Am I missing something here?* Does the Solrj client
> automatically commit after each add()? This is what is bothering me the
> most, especially in light of less than expected search performance (as
> outlined in question 2).
> b. I see that 'optimizes' is 0 here, whereas the core's main page shows a tick
> mark against 'Optimized'. Which one is right? Further, how much does
> 'optimize' affect the search performance (in the light of the next question
> I am going to ask)
>
> 2. After reaching the 400million mark, I've set the 'create' client to
> index documents at around the rate of ~500 documents/second (using the same
> 50 threads), and going by the log, that seems to be happening. Now, at the
> same time, I've started the 'search' client, which searches for random
> documents using 50 threads. Most of these searches return 1 document each,
> and rarely 4/5 documents, but not more than that. But I notice that the
> search is much slower than what I expected: only around 40 searches go
> through per second and each search takes around 1000-1400ms most of the
> time. The search is performed using 1, 2, 3 or 4 fields (all ANDed) in the
> search query. The question is, am I messing up something (w.r.t. question 1
> above), or does it really take this much time to search on an index of this
> size?
>
> Please do let me know if I need to share any more details. Thanks in
> advance.
>
> Thanks
> Srikanth S
