lucene-solr-user mailing list archives

From Gian Maria Ricci - aka Alkampfer <alkamp...@nablasoft.com>
Subject RE: Use multiple istance simultaneously
Date Sat, 12 Dec 2015 10:38:53 GMT
Thanks a lot for all the clarifications.

Actually, resources are not a big problem; I think the customer can afford 4 GB RAM Red Hat Linux
machines for ZooKeeper. In production the Solr machines will have 64 or 96 GB of RAM, depending
on the size of the index.

My primary concern is maintenance of the structure. With single independent machines the
situation is trivial: we can stop Solr on one of the machines during the night and take a
full backup of the indexes. With a full backup of the indexes, rebuilding a machine from scratch
after a disaster is simple: just spin up a new virtual machine, restore the backup, restart
Solr, and everything is OK.

If for any reason the SolrCloud cluster stops working, restoring everything is somewhat more
complicated. Are there any best practices for backing up SolrCloud so that we can restore
the entire cluster if anything goes wrong?

Thanks a lot for the interesting discussion and for the really useful information you gave
me.

--
Gian Maria Ricci
Cell: +39 320 0136949
    

-----Original Message-----
From: Shawn Heisey [mailto:apache@elyograg.org] 
Sent: Friday, 11 December 2015 17:11
To: solr-user@lucene.apache.org
Subject: Re: Use multiple istance simultaneously

On 12/11/2015 8:19 AM, Gian Maria Ricci - aka Alkampfer wrote:
> Thanks for all of your clarifications. I know that SolrCloud is a
> much better configuration than any other, but it is also
> considerably more complex. I just want to give you the pain
> points I've noticed while gathering all the info I could get on SolrCloud.
> 
> 1) The ZooKeeper documentation says that, for the best experience, you
> should have a dedicated filesystem for persistence and it should
> never swap to disk. I have not found any guidelines on how to size
> a ZooKeeper machine: how much RAM, how much disk? Can I install
> ZooKeeper on the same machines where Solr resides? (I suspect not,
> because the Solr machines are under stress, and if ZooKeeper starts
> swapping it can lead to problems.)

Standalone zookeeper doesn't require much in the way of resources.
Unless the SolrCloud installation is enormous, a machine with 1-2GB of RAM is probably plenty,
if the only thing it is doing is zookeeper and it's not running Windows.  If the SolrCloud
install has a lot of collections, shards, and/or servers, then you might need more, because
the zookeeper database will be larger.

> 2) What about updates? If I need to update my SolrCloud instance
> and the new version requires a new version of ZooKeeper, which is the
> path to follow? Do I first update ZooKeeper, or upgrade Solr on the existing machines?
> Maybe I did not search well, but I did not find a comprehensive
> guideline that tells me how to upgrade my SolrCloud installation in various situations.

If you're following recommendations and using standalone zookeeper, then upgrading it is entirely
separate from upgrading Solr.  It's probably a good idea to upgrade your three (or more) zookeeper
servers first.

Here's a FAQ entry from zookeeper about upgrades:

https://wiki.apache.org/hadoop/ZooKeeper/FAQ#A6

> 3) What are the best practices for running DIH in SolrCloud? I think I can
> trigger DIH imports round-robin on the different servers composing the
> cloud infrastructure, or is there a better way? (I probably need
> to trigger a DIH import every 5 to 10 minutes, but the number of new records is
> really small.)

When checking the status of an import, you must send the status request to the same machine
where you sent the command to start the import.

If you're only ever going to run one DIH at a time, then I don't see any reason to involve
multiple servers.  If you want to run more than one simultaneously, then you might want to
run each one on a different machine.
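
A minimal sketch of the point about status requests, assuming DIH is registered at the conventional /dataimport path (that path is set in solrconfig.xml, so it may differ) and using a made-up host and core name. It only builds the request URLs and sends nothing; the idea is that the start and status requests for one import are always derived from the same node.

```python
from urllib.parse import urlencode


class DihJob:
    """Pins every DIH request for one import to the node that started it."""

    def __init__(self, node_url, core, handler="/dataimport"):
        # node_url, core and handler are illustrative placeholders; the
        # handler path is whatever solrconfig.xml registers for DIH.
        self.base = f"{node_url.rstrip('/')}/{core}{handler}"

    def _url(self, command, **params):
        query = {"command": command, "wt": "json"}
        query.update(params)
        return f"{self.base}?{urlencode(query)}"

    def start_url(self, delta=True):
        # delta-import suits frequent runs that pick up few new records.
        return self._url("delta-import" if delta else "full-import")

    def status_url(self):
        # Same base URL as start_url, so the status query reaches the
        # node that is actually running the import.
        return self._url("status")


job = DihJob("http://solr1.example.com:8983/solr", "mycore")
print(job.start_url())
print(job.status_url())
```

A scheduler that rotates imports across servers would need to keep one such object per running import, rather than sending the status request to whichever server is next in the rotation.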

> 4) Since I believe it is not best practice to install ZooKeeper
> on the same machine as Solr (as a separate process, not the built-in
> ZooKeeper), I need at least three more machines to maintain, monitor, and
> upgrade, and I also need to monitor ZooKeeper, a new appliance that
> needs to be mastered by IT Infrastructure.

The only real reason to avoid zookeeper and Solr on the same machine is performance under
high load, and mostly that comes down to I/O performance, so if you can put zookeeper on a
separate set of disks, you're probably good.  If the query/update load will not be high, then
sharing machines will likely work well, even if the disks are all shared.

> Are there any guidelines on how to automate promoting a slave to
> master in a classic master/slave setup? I did not find anything
> official, and auto-promoting a slave to master could solve my problem.

I don't know of any explicit information explaining how to promote a new master.  Basically
what you have to do is reconfigure the new master's replication (so it stops trying to be
a slave), reconfigure every slave to point to the new master, and reconfigure every client
that makes index updates.  DNS changes *might* be able to automate the slave and update client
reconfig, but the master reconfig requires changing Solr's configuration, which at the very
least will require reloading or restarting that server.  That could be automated, but it's
up to you to write the automation.
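
The promotion steps above can be sketched as an ordered plan. This is only an illustration under stated assumptions: the URLs and the client name are hypothetical, the function executes nothing, and the actual configuration edits are left to your own tooling. The disablepoll command of the replication handler is shown as one possible stopgap to keep the promoted node from pulling from the old master before its configuration is changed; it does not replace that change.

```python
from urllib.parse import urlencode


def replication_url(core_url, command, **params):
    """Build a replication handler URL for a core (nothing is sent)."""
    query = {"command": command}
    query.update(params)
    return f"{core_url.rstrip('/')}/replication?{urlencode(query)}"


def promotion_plan(new_master, slaves, update_clients):
    """Return the ordered steps as (target, action) pairs."""
    plan = [
        # Step 1: the promoted node must stop acting as a slave.
        (new_master, "edit replication config so the node acts as master"),
        (new_master, "reload the core or restart Solr"),
    ]
    # Step 2: every remaining slave must follow the new master.
    for slave in slaves:
        if slave != new_master:
            plan.append((slave, f"point masterUrl at {new_master}/replication"))
    # Step 3: every indexing client must write to the new master.
    for client in update_clients:
        plan.append((client, f"send index updates to {new_master}"))
    return plan


# Hypothetical core URLs and client name, for illustration only.
nodes = ["http://solr-a:8983/solr/core1", "http://solr-b:8983/solr/core1"]
plan = promotion_plan(nodes[0], nodes, ["indexer-app"])
for target, action in plan:
    print(f"{target}: {action}")
print(replication_url(nodes[0], "disablepoll"))
```

Keeping the plan as data makes it easy to review before anything touches a server, and the DNS approach mentioned above would shorten it to the steps for the new master alone.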

Thanks,
Shawn

