lucene-solr-user mailing list archives

From Matt Kuiper <matt.kui...@issinc.com>
Subject RE: How to make SolrCloud more elastic
Date Thu, 12 Feb 2015 19:09:50 GMT
Otis,

Thanks for your reply.  I see your point about too many shards and search efficiency.  I also
agree that I need to get a better handle on customer requirements and expected loads.  

Initially I figured that with the shard-splitting option I would need to double my Solr nodes
every time I split (as I would want to split every shard within the collection).  In fact,
only the number of shards would double, and I would then have the opportunity to rebalance
the shards over the existing Solr nodes plus however many new nodes make sense at the
time.  This may be preferable to defining many micro shards up front.

The time-based collections may be an option for this project.  I am not familiar with query
routing; can you point me to any documentation on how this might be implemented?

Thanks,
Matt

-----Original Message-----
From: Otis Gospodnetic [mailto:otis.gospodnetic@gmail.com] 
Sent: Wednesday, February 11, 2015 9:13 PM
To: solr-user@lucene.apache.org
Subject: Re: How to make SolrCloud more elastic

Hi Matt,

You could create extra shards up front, but if your queries are fanned out to all of them,
you can run into situations where there are too many concurrent queries per node, causing lots
of context switching and ultimately being less efficient than if you had fewer shards.  So
while this is an approach to take, I'd personally first try to run tests to see how much a
single node can handle in terms of volume, expected query rates, and target latency, and then
use monitoring/alerting/whatever-helps tools to keep an eye on the cluster so that when you
start approaching the target limits you are ready with additional nodes and shard splitting
if needed.

Of course, if your data and queries are such that newer documents are queried more, you
should look into time-based collections... and if your queries can only query a subset of
data you should look into query routing.
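
A minimal sketch of how those two ideas might look from the client side, assuming one
collection per month and documents indexed with a compositeId shard-key prefix. The
collection naming scheme (events_YYYY_MM), the helper functions, and the customerA key are
hypothetical; "collection" and "_route_" are Solr request parameters. The code only builds
the parameters and does not contact a server.

```python
from datetime import date


def monthly_collections(start, end, prefix="events"):
    """List the per-month collection names covering start..end (naming is an assumption)."""
    names = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        names.append(f"{prefix}_{year:04d}_{month:02d}")
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return names


def query_params(q, start, end, route_key=None):
    """Build query parameters that touch only the relevant collections and shards."""
    params = {
        "q": q,
        # Time-based collections: search only the months the query needs.
        "collection": ",".join(monthly_collections(start, end)),
    }
    if route_key is not None:
        # Query routing: restrict the search to the shard(s) holding this key.
        params["_route_"] = f"{route_key}!"
    return params


params = query_params(
    "status:error", date(2014, 12, 15), date(2015, 2, 12), route_key="customerA"
)
print(params)
```

The point of both techniques is the same: a query fans out to fewer shards, which reduces
the per-node concurrency problem described earlier.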

Otis
--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management Solr & Elasticsearch
Support * http://sematext.com/


On Wed, Feb 11, 2015 at 3:32 PM, Matt Kuiper <matt.kuiper@issinc.com> wrote:

> I am starting a new project and one of the requirements is that Solr 
> must scale to handle increasing load (both search performance and index size).
>
> My understanding is that one way to address search performance is by 
> adding more replicas.
>
> I am more concerned about handling a growing index size.  I have 
> already been given some good input on this topic and am considering a 
> shard splitting approach, but am more focused on a rebalancing 
> approach that includes defining many shards up front and then moving 
> these existing shards on to new Solr servers as needed.  Plan to 
> experiment with this approach first.
>
> Before I got too deep, I wondered if anyone has any tips or warnings 
> on these approaches, or has scaled Solr in a different manner.
>
> Thanks,
> Matt
>