lucene-solr-user mailing list archives

From Emir Arnautović <emir.arnauto...@sematext.com>
Subject Re: tipping point for using solrcloud—or not?
Date Mon, 02 Oct 2017 09:02:01 GMT
Hi John,
Your data volume does not require SolrCloud, especially if you isolate the core that your
business runs on from the other cores. You mentioned that the second largest core holds logs
used for analytics. I am not sure what sort of logs those are, but if the logging is write
intensive, you might want to isolate them as well. It is probably better to have two 15GB
instances than one 30GB instance, and to dedicate one of them to your main core. If you do not
expect the size to grow in the near future, you can go with even smaller instances. It may also
be worth investing in instances with SSDs. You may also consider sending logs to a centralised
logging solution (one such is our Logsene: http://sematext.com/logsene).
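To make the two-instance suggestion concrete, here is a rough Python sketch. It estimates how much of each instance's index could stay in the OS page cache, which Lucene leans on for fast reads. The main/logs/third-party sizes come from John's message below. The 15.2 GB "misc" bucket and the JVM heap sizes are assumptions, not figures from the thread. The misc bucket lumps together the several <1g cores and the dev/demo copies so that the total matches the ~30g on disk.

```python
from __future__ import annotations

from dataclasses import dataclass

# Index sizes in GB. main/logs/third_party come from John's rundown; the misc
# bucket is a HYPOTHETICAL remainder so that the total matches the ~30g on disk.
CORE_SIZES_GB = {
    "main": 6.8,
    "logs": 5.0,
    "third_party": 3.0,
    "misc_dev_demo": 15.2,
}


@dataclass
class Instance:
    name: str
    ram_gb: float
    heap_gb: float  # ASSUMPTION: heap sizes are not stated anywhere in the thread
    cores: list[str]


def cache_coverage(inst: Instance, sizes: dict[str, float]) -> float:
    """Fraction of the instance's index data that fits in the RAM left after the heap."""
    index_gb = sum(sizes[c] for c in inst.cores)
    page_cache_gb = max(inst.ram_gb - inst.heap_gb, 0.0)
    if index_gb == 0:
        return 1.0
    return min(page_cache_gb / index_gb, 1.0)


# One 30GB box holding everything vs. two 15GB boxes with the main core isolated.
single = Instance("one-30GB", 30, 8, list(CORE_SIZES_GB))
split_main = Instance("main-15GB", 15, 4, ["main"])
split_rest = Instance("rest-15GB", 15, 4, ["logs", "third_party", "misc_dev_demo"])

if __name__ == "__main__":
    for inst in (single, split_main, split_rest):
        print(f"{inst.name}: {cache_coverage(inst, CORE_SIZES_GB):.0%} of index cacheable")
```

With these assumed numbers, the main core fits entirely in cache on its own 15GB instance. By contrast, only about 73% of the combined data fits on the single 30GB box, and the main core has to compete with everything else for that space.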
When it comes to fault tolerance, you can still have it with the master/slave model by
introducing slaves. That can also be one way to isolate the user-facing cores: users will query
only the slaves, and the only replicated core will be the main core.
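Below is a minimal Python sketch of the query routing this implies. It assumes a client-side router, although a load balancer in front of the slaves would do the same job. The host names are placeholders, and only the main core is treated as replicated. Each slave's ReplicationHandler is assumed to be configured to poll the master for that core.

```python
from __future__ import annotations

from itertools import cycle


class CoreRouter:
    """Hypothetical router: replicated cores are queried round-robin on the
    slaves; every other core, and all updates, go to the master."""

    def __init__(self, master: str, slaves: list[str], replicated_cores: set[str]):
        if not slaves:
            raise ValueError("at least one slave is required")
        self.master = master
        self.replicated_cores = set(replicated_cores)
        self._slave_cycle = cycle(slaves)

    def query_url(self, core: str) -> str:
        if core in self.replicated_cores:
            host = next(self._slave_cycle)  # spread user-facing reads over the slaves
        else:
            host = self.master  # unreplicated cores exist only on the master
        return f"{host}/solr/{core}/select"

    def update_url(self, core: str) -> str:
        # Indexing always targets the master; slaves pull changes via replication.
        return f"{self.master}/solr/{core}/update"


if __name__ == "__main__":
    router = CoreRouter("http://master:8983",
                        ["http://slave1:8983", "http://slave2:8983"], {"main"})
    for core in ("main", "main", "main", "logs"):
        print(router.query_url(core))
```

A real deployment would also need health checks so that a dead slave is skipped. This sketch leaves that out to keep the routing idea visible.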
It is hard to tell more without knowing your ingestion/query rate, query types, NRT requirements…
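Query rate is the main driver of how large a master/slave setup needs to be. As a back-of-the-envelope sketch (not a Solr tool), the number of query slaves can be estimated from a measured per-slave throughput. Both QPS figures and the one-node spare below are assumptions that you would replace with your own load-test numbers.

```python
import math


def slaves_needed(peak_qps: float, qps_per_slave: float, spares: int = 1) -> int:
    """Slaves to absorb peak query traffic, plus spare nodes so that losing one
    slave does not overload the rest. All inputs are user-measured guesses."""
    if peak_qps < 0 or qps_per_slave <= 0 or spares < 0:
        raise ValueError("invalid load figures")
    serving = max(1, math.ceil(peak_qps / qps_per_slave))
    return serving + spares


if __name__ == "__main__":
    # Hypothetical numbers: 120 QPS at peak, ~50 QPS sustained per slave.
    print(slaves_needed(120, 50))
```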

HTH,
Emir

> On 29 Sep 2017, at 17:27, Erick Erickson <erickerickson@gmail.com> wrote:
> 
> SolrCloud. SolrCloud. SolrCloud.
> 
> Well, it actually depends. I recommend people go to SolrCloud when any
> of the following apply:
> 
> - The instant you need to break any collection up into shards because you're
>   running into the constraints of your hardware (you can't just keep adding
>   memory to the JVM forever).
> 
> - You need NRT searching and need multiple replicas for either your traffic
>   rate or HA purposes.
> 
> - You find yourself dealing with lots of administrative complexity for
>   various indexes. You have what sounds like 6-10 cores lying around. You can
>   move them to different machines without going to SolrCloud, but then
>   something has to keep track of where they all are and route requests
>   appropriately. If that gets onerous, SolrCloud will simplify it.
> 
> If none of the above apply, master/slave is just fine. Since you can
> rebuild in a couple of hours, most of the difficulties with M/S when the
> master goes down are manageable. With a master and several slaves, you
> have HA, and a load balancer will see to it that some are used.
> There's no real need to search exclusively on the slaves; I've seen
> situations where the master is used for queries as well as indexing.
> 
> To increase your query rate, you can just add more slaves to the hot
> index, assuming you're content with the latency between indexing and
> being able to search newly indexed documents.
> 
> SolrCloud, of course, comes with the added complexity of ZooKeeper.
> 
> Best,
> Erick
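Erick's rule of thumb can be written down as a small checklist. This Python sketch only restates the three criteria listed above and is not an official sizing tool. The field names are made up for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Situation:
    must_shard: bool             # a collection has outgrown one machine's hardware
    nrt_with_replicas: bool      # NRT search plus several replicas for traffic or HA
    core_tracking_onerous: bool  # hand-tracking which cores live where is painful


def recommend(s: Situation) -> tuple[str, list[str]]:
    """Return 'SolrCloud' if any criterion applies, else 'master/slave'."""
    reasons = []
    if s.must_shard:
        reasons.append("collection needs sharding")
    if s.nrt_with_replicas:
        reasons.append("NRT search with multiple replicas")
    if s.core_tracking_onerous:
        reasons.append("manual core placement/routing is onerous")
    choice = "SolrCloud" if reasons else "master/slave"
    return choice, reasons
```

Any single criterion is enough to tip the answer towards SolrCloud. The returned reasons make it easy to see which one did.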
> 
> 
> 
> On Fri, Sep 29, 2017 at 5:34 AM, John Blythe <johnblythe@gmail.com> wrote:
>> hi all.
>> 
>> complete noob as to solrcloud here. almost-non-noob on solr in general.
>> 
>> we're experiencing growing pains in our data and i'm thinking through moving
>> to solrcloud as a result. i'm hoping to find out if it seems like a good
>> strategy or if we need to get other areas of interest handled first before
>> introducing new complexities.
>> 
>> here's a rundown of things:
>> - we are on a 30g ram aws instance
>> - we have ~30g tucked away in the ../solr/server/ dir
>> - our largest core is 6.8g w/ ~25 segments at any given time. this is also
>> the core that our business directly runs off of, users interact with, etc.
>> - 5g is for a logs type of dataset that analytics can be built off of to
>> help inform the primary core above
>> - 3g are taken up by 3 different third party sources that we use solr to
>> warehouse and have available for query for the sake of linking items in our
>> primary core to these cores for data enrichment
>> - several others take up < 1g each
>> - and then we have dev- and demo- flavors for some of these
>> 
>> we had been operating on a 16gb machine till a few weeks ago (we actually
>> bumped it while at lucene revolution bc i hadn't noticed until the week
>> before how badly we'd outgrown the available cache!). the machine now handles
>> an import or our heavier operations much better and no longer buckles under
>> their weight like it had been doing.
>> 
>> we have no master/slave replica. all of our data is 'replicated' by the
>> fact that it exists in mysql. if solr were to go down it'd be a nice big
>> fire but one we could recover from within a couple hours by simply
>> reimporting.
>> 
>> i'd like to have a more sophisticated set up in place for fault tolerance
>> than that, of course. i'd also like to see our heavy, many-query based
>> operations be speedier and better capable of handling multi-threaded runs
>> at once w/ ease.
>> 
>> is this a matter of getting still more ram on the machine? cpus for faster
>> processing? splitting up the read/write operations between master/slave?
>> going full steam into a solrcloud configuration?
>> 
>> one more note. per discussion at the conference i'm combing through our
>> configs to make sure we trim any fat we can. i also want to get
>> optimization scheduled more regularly to help out w/ segment count and
>> garbage collection/heap pressure. not sure how far those two alone will get us, though.
>> 
>> thanks for any thoughts!
>> 
>> --
>> John Blythe
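For the regular optimization John mentions, here is a Python sketch of an off-peak trigger. It only builds the request URL. `optimize=true` and `maxSegments` are standard parameters of Solr's update handler. The core name, the segment threshold and the off-peak window are hypothetical. Merging down to one segment rewrites essentially the whole index, so the run is I/O heavy and is best kept outside business hours.

```python
from urllib.parse import urlencode


def should_optimize(segment_count: int, hour: int,
                    segment_threshold: int = 20,
                    offpeak_hours: range = range(1, 5)) -> bool:
    """Optimize only when segments have piled up AND we are in the quiet window.
    The threshold and window are HYPOTHETICAL defaults, not Solr settings."""
    return segment_count > segment_threshold and hour in offpeak_hours


def optimize_url(base_url: str, core: str, max_segments: int = 1) -> str:
    """Build the update-handler request that forces a merge down to max_segments."""
    params = {"optimize": "true", "maxSegments": max_segments}
    return f"{base_url}/solr/{core}/update?{urlencode(params)}"


if __name__ == "__main__":
    # ~25 segments is the figure John quotes for the main core; 2 a.m. is assumed.
    if should_optimize(segment_count=25, hour=2):
        print(optimize_url("http://localhost:8983", "main"))
```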

