lucene-solr-user mailing list archives

From Srikanth S <srikant...@gmail.com>
Subject Re: SolrCloud admin UI core/stats showing commit count even without no explicit commit
Date Mon, 27 Aug 2012 15:31:17 GMT
Thanks for your response Erick.

Your explanation seems to make sense for the commit count. But I guess the
UI needs to be fixed.
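
As a rough sanity check of that explanation, the arithmetic can be sketched
in Python. The figures come from the stats quoted further down in this
thread; the variable names are only illustrative, and the idea that the
total 'commits' counter is the sum of hard and soft autocommits is Erick's
guess, not something confirmed here.

```python
# Commit-count arithmetic for 17 hours of indexing.
HOURS = 17
AUTOCOMMIT_MAX_TIME_MS = 600_000      # hard autocommit every 10 minutes
SOFT_AUTOCOMMIT_MAX_TIME_MS = 3_000   # soft autocommit every 3 seconds

elapsed_ms = HOURS * 60 * 60 * 1000

# Upper bounds if a commit fired at every interval.
expected_hard = elapsed_ms // AUTOCOMMIT_MAX_TIME_MS
expected_soft = elapsed_ms // SOFT_AUTOCOMMIT_MAX_TIME_MS

# Values reported under 'UPDATEHANDLER' in the admin UI.
reported_commits = 23028
reported_autocommits = 115
reported_soft_autocommits = 22912

# Assumption (Erick's guess): 'commits' counts hard and soft commits together.
combined = reported_autocommits + reported_soft_autocommits

print(expected_hard, expected_soft)   # 102 20400
print(combined, reported_commits)     # 23027 23028
```

The combined autocommit counters come within one of the reported total,
which fits the guess that no explicit commits are coming from the client.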

Regarding performance, I went through your blog post (nicely written, by
the way, with good links to other interesting blogs). I didn't realize that
everything that is indexed needs to be kept in memory for reasonable
performance. In that case, with 133M documents (each with several indexed
fields) per shard and each server hosting 2 such shards, the memory we
have provided does seem far too small. I think we need to evaluate our
hardware as you pointed out. There is one thing in your blog post I didn't
follow, though: the paragraph that starts with "Now, take say 80% of the
QPS rate above...". I am assuming you meant "Keep adding 1M documents and
see the point where the QPS drops to 80% of the above value". Correct me if
I am wrong.
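
For reference, the per-shard and per-server document counts behind that
concern can be worked out from the numbers in the original post. This is
only a sketch: it assumes documents are spread evenly across the 3 shards,
and the names are illustrative.

```python
# Rough sizing arithmetic from the figures in the original post.
TOTAL_DOCS = 400_000_000     # ~400 million documents indexed
SHARDS = 3
CORES_PER_SERVER = 2         # each machine hosts two shard replicas
MAX_HEAP_GB = 6              # -Xmx=6G
INDEXING_HOURS = 17

# Assumption: documents are distributed evenly across shards.
docs_per_shard = TOTAL_DOCS / SHARDS
docs_per_server = docs_per_shard * CORES_PER_SERVER

# Average ingest rate over the whole run.
docs_per_second = TOTAL_DOCS / (INDEXING_HOURS * 3600)

# Documents each GB of maximum heap has to serve on one server.
docs_per_heap_gb = docs_per_server / MAX_HEAP_GB

print(f"{docs_per_shard / 1e6:.0f}M docs per shard")
print(f"{docs_per_server / 1e6:.0f}M docs per server")
print(f"{docs_per_second:.0f} docs/sec average ingest")
print(f"{docs_per_heap_gb / 1e6:.0f}M docs per GB of max heap")
```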

Wrt the query rate, we were able to run at around 80-90 searches/sec with
indexing off, and 50-60 searches/sec while indexing at an average rate of
500 inserts/sec.
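
These rates can be related to per-search latency with Little's law
(throughput = concurrent requests / mean latency). The sketch below assumes
the 'search' client keeps all 50 threads busy with no pause between
requests, and that the same 50 threads were used for the later runs; the
function names are illustrative.

```python
SEARCH_THREADS = 50  # from the original post; assumed unchanged in later runs

def implied_qps(threads, latency_ms):
    """Throughput a closed-loop client reaches at a given mean latency."""
    return threads / (latency_ms / 1000.0)

def implied_latency_ms(threads, qps):
    """Mean latency implied by an observed throughput."""
    return threads / qps * 1000.0

# Original run: each search took around 1000-1400 ms.
qps_range = (implied_qps(SEARCH_THREADS, 1400),
             implied_qps(SEARCH_THREADS, 1000))

# Later runs: 80-90 searches/sec with indexing off, 50-60 while indexing.
latency_indexing_off = [implied_latency_ms(SEARCH_THREADS, q) for q in (80, 90)]
latency_indexing_on = [implied_latency_ms(SEARCH_THREADS, q) for q in (50, 60)]

print(qps_range)
print(latency_indexing_off)
print(latency_indexing_on)
```

Under these assumptions, 1000-1400 ms per search caps a 50-thread client at
roughly 36-50 searches/sec, which is in line with the ~40 searches/sec seen
in the first run.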

Regarding stacking up replicas to get more QPS, I would have expected
the same, but with so little documentation on SolrCloud design (some of it
conflicting), I was not very sure about that. So, if you can, could you
point me to some places where the architecture of SolrCloud is explained
in more detail? I'd appreciate that greatly.

Thanks again.

On Mon, Aug 27, 2012 at 6:33 AM, Erick Erickson <erickerickson@gmail.com> wrote:

> The autocommits are about what I'd expect. 17 hours
> == 102 ten minute blocks, which is roughly your
> 115 autocommits. I'm _guessing_ that the total
> commits are a combination of soft and hard. You'll
> have 20,400 soft commits in that time frame, so this
> works as a rough estimate....
>
> And SolrJ doesn't do a commit after an add unless
> you tell it to.
>
> As for search performance, it's quite hard to tell. But
> you have about 133M documents/shard, and two
> replicas. You have a relatively small amount of
> memory allocated for indexes that size. It's time to
> just dig into what you can expect out of your boxes.
>
> Here's a blog that outlines a way to understand more
> about the capacity of your hardware that might help.
> I'd take the SolrCloud bits out for right now, and just
> concentrate on the capacity of the machine in your
> situation, then add SolrCloud back in to the mix.
>
> http://searchhub.org/dev/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/
>
> It'd be interesting to see what your query rate
> was if you stop the indexing process. Mostly I'm
> just looking for which factors change performance,
> not recommending that you go with that approach.
>
> The good news is that you can get virtually whatever
> QPS rate you need by simply racking in more replicas
> for each shard....
>
> Best
> Erick
>
> On Sat, Aug 25, 2012 at 3:04 AM, Srikanth S <srikanth85@gmail.com> wrote:
> > Hi,
> >
> > I am doing a small test for my company to see if SolrCloud is suitable
> > for our indexing needs. The setup is as follows:
> >
> >    - Solr version 4.0 BETA1
> >    - Three physical machines hosting solr servers
> >    - Distributed ZooKeeper setup on the same three machines
> >    - 2 solr cores on each server: total 6 cores
> >    - 3 shards (and hence 1 replica each)
> >    - Machine M1 is leader of (shard 1, replica 1) and hosts (shard3,
> >    replica2)
> >    - M2 is leader of (shard 2, replica 1) and hosts (shard1, replica2)
> >    - M3 is leader of (shard 3, replica 1) and hosts (shard2, replica2)
> >    - Using sun java 1.6.0_27 with -server and Xms=3G and Xmx=6G
> >    - Two separate machines, M4 and M5, run the 'create' client and the
> >    'search' client respectively
> >
> > Config:
> >
> >    - schema.xml: copied from bundled 'example/solr/collection1', removed
> >    all 'field' and 'copyFields' entries it came with, and added ~15
> >    fields of my own (mostly strings and a few integers, all indexed, all
> >    stored, four of them multivalued)
> >    - solrconfig.xml: copied from bundled 'example/solr/collection1', set
> >    autocommit duration to 10mins with openSearcher=false and set
> >    autoSoftCommit to 3secs.
> >
> > The documents being committed are fairly small, with around 10-15
> > attributes, most of them fairly short strings (like person
> > names, street names etc).
> >
> > I've been indexing data (with no searches in between) using a 50-threaded
> > 'create' client for the last 17 hours, at the end of which I have
> > ~400 million such documents indexed. For most of this time (from
> > the logs), I was able to index at around 6000-7000 documents per second
> > (to give you some idea of the machine specs/network etc.), with each
> > solrServer.add() request returning in sub-10ms response times. And yes, I
> > am using solrj with CloudSolrServer.
> >
> > Questions:
> > 1. When I connect to the admin console of one of the servers, under the
> > core's 'Plugin/Stats' page and under 'UPDATEHANDLER' I see:
> > commits:23028
> > autocommit maxTime:600000ms
> > autocommits:115
> > soft autocommit maxTime:3000ms
> > soft autocommits:22912
> > optimizes:0
> >
> > Two things interest me here:
> > a. there are very few auto-commits while there have been a large number
> > of commits. However, I am not calling any explicit commit anywhere in the
> > client code. *Am I missing something here?* Does the Solrj client
> > automatically commit after each add()? This is what is bothering me the
> > most, especially in light of the lower-than-expected search performance
> > (as outlined in question 2).
> > b. I see that 'optimizes' is 0 here, whereas the core's main page shows a
> > tick mark against 'Optimized'. Which one is right? Further, how much does
> > 'optimize' affect search performance (in light of the next question
> > I am going to ask)?
> >
> > 2. After reaching the 400 million mark, I've set the 'create' client to
> > index documents at a rate of ~500 documents/second (using the same
> > 50 threads), and going by the log, that seems to be happening. Now, at the
> > same time, I've started the 'search' client, which searches for random
> > documents using 50 threads. Most of these searches return 1 document each,
> > and rarely 4-5 documents, but not more than that. But I notice that the
> > search is much slower than I expected: only around 40 searches go
> > through per second, and each search takes around 1000-1400ms most of the
> > time. The search is performed using 1, 2, 3 or 4 fields (all ANDed) in the
> > search query. The question is, am I messing something up (w.r.t. question 1
> > above), or does it really take this much time to search an index of this
> > size?
> >
> > Please do let me know if I need to share any more details. Thanks in
> > advance.
> >
> > Thanks
> > Srikanth S
>
