lucene-solr-user mailing list archives

From Jonathan Rochkind
Subject Re: Improving Solr performance
Date Mon, 10 Jan 2011 21:08:00 GMT
I see a lot of people using shards to hold "different types of 
documents", and it almost always seems to be a bad solution. Shards are 
intended for distributing a large index over multiple hosts -- that's 
it.  Not for some kind of federated search over multiple schemas, not 
for access control.

Why not put everything in the same index, without shards, and just use 
an 'fq' filter query to limit a given search to the specific kinds of 
documents you'd like to search over?  I think that would achieve your 
goal a lot more simply than shards -- then you use sharding only if and 
when your index grows so large that you need to distribute it over 
multiple hosts, and when you do so you choose a shard key that gives 
more or less equal distribution across shards.
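
A minimal sketch of the 'fq' approach, in Python. The field name 
doc_type, the kind values, and the base URL are all assumptions made up 
for illustration; substitute whatever field in your schema records the 
kind of each document. Only the standard 'q' and 'fq' request 
parameters are used.

```python
from urllib.parse import urlencode


def build_select_url(base_url, user_query, kinds=None, kind_field="doc_type"):
    """Build a Solr /select URL that restricts results to the given kinds.

    kind_field is a hypothetical schema field holding each document's kind.
    If kinds is empty or None, no filter is added and all kinds are searched.
    """
    params = [("q", user_query)]
    if kinds:
        # Quote each value so spaces or special characters stay literal.
        quoted = [
            '"{}"'.format(k.replace("\\", "\\\\").replace('"', '\\"'))
            for k in kinds
        ]
        # One fq clause matching any of the selected kinds.
        params.append(("fq", "{}:({})".format(kind_field, " OR ".join(quoted))))
    return base_url + "/select?" + urlencode(params)


# Hypothetical host and kinds, for illustration only.
url = build_select_url(
    "http://localhost:8983/solr", "title:lucene", ["book", "article"]
)
print(url)
```

The client tool that currently picks which shards to query would 
instead pass the selected kinds to a function like this, so the choice 
of kinds no longer depends on how the index is laid out across hosts.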

Using shards for access control or schema management just leads to 

[Apparently Solr could use some highlighted documentation on what shards 
are really for, since people trying to use them for something else, and 
then inevitably running into problems with that approach, seems to be a 
very common issue on this list.]


On 1/7/2011 6:48 AM, supersoft wrote:
> The reason for this distribution is the kind of document. Despite
> having the same schema structure (and Solr conf), each document belongs to
> 1 of 5 different kinds.
> Each kind corresponds to a specific shard, so the client tool avoids
> searching all the shards when the user selects just one or a few
> kinds. The tool runs a multi-shard query against the relevant
> shards. I guess this is the right approach, but correct me if I am wrong.
> The real problem with this architecture is the correlation between concurrent
> users and response time:
> 1 query: n seconds
> 2 queries: 2*n seconds each query
> 3 queries: 3*n seconds each query
> and so on...
> This is a real headache because a single query has an acceptable
> response time, but when many users are accessing the server,
> performance degrades badly.
