lucene-solr-user mailing list archives

From Jack Krupansky <jack.krupan...@gmail.com>
Subject Re: Pro and cons of using Solr Cloud vs standard Master Slave Replica
Date Fri, 15 Jan 2016 16:19:32 GMT
Yeah, and to the original question, there is no master list of features and
how SolrCloud vs. legacy distributed mode compare feature by feature.

And until SolrCloud actually does subsume every single (important) feature
of legacy distributed mode, Solr probably still needs to continue to
support legacy distributed mode, including backup.

The doc does need better coverage of backup and restore at the cluster
level, including configuration files. What's there now is basically the old
single-node replication backup. What exactly is the recommended best
practice for backing up a single shard, let alone all shards? Should
backups be collection-based as well?
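
For illustration only, and not a supported cluster-level procedure: a
minimal Python sketch of the per-shard approach discussed further down
this thread, where the single-node replication handler is asked to back
up one core per shard. It only builds the request URLs and sends nothing.
The base URL, core names, backup directory, and the backup_urls helper
are hypothetical; command=backup with location and name are the
replication handler's backup parameters.

```python
from urllib.parse import urlencode


def backup_urls(base_url, cores, location, snapshot_name):
    """Build one replication-handler backup URL per core.

    Assumption: `cores` lists one core (replica) for each shard of the
    collection. All names passed in are hypothetical examples.
    """
    urls = []
    for core in cores:
        # Parameters of the single-node replication handler backup command.
        params = urlencode({
            "command": "backup",
            "location": location,                 # directory on the Solr node
            "name": f"{snapshot_name}_{core}",    # snapshot name, unique per core
            "wt": "json",
        })
        urls.append(f"{base_url.rstrip('/')}/{core}/replication?{params}")
    return urls
```

Each URL would then be requested on the node hosting that core. This
covers index data only; configuration files kept in ZooKeeper would need
to be saved separately.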


-- Jack Krupansky

On Fri, Jan 15, 2016 at 3:26 AM, Gian Maria Ricci - aka Alkampfer <
alkampfer@nablasoft.com> wrote:

> Yes, I checked that JIRA issue some weeks ago, and it is the reason I
> said that there is still no clear procedure for backing up SolrCloud in
> the current latest version.  I'm glad that the priority is Major, but
> until it is closed in an official release, I have to tell customers that
> there is no easy, supported backup procedure for a SolrCloud
> configuration :(.
>
> --
> Gian Maria Ricci
> Cell: +39 320 0136949
>
>
>
> -----Original Message-----
> From: Erick Erickson [mailto:erickerickson@gmail.com]
> Sent: Thursday, 14 January 2016 16:46
> To: solr-user <solr-user@lucene.apache.org>
> Subject: Re: Pro and cons of using Solr Cloud vs standard Master Slave
> Replica
>
> re: SolrCloud backup/restore:
> https://issues.apache.org/jira/browse/SOLR-5750
>
> not committed yet, but getting attention.
>
>
>
> On Thu, Jan 14, 2016 at 6:19 AM, Gian Maria Ricci - aka Alkampfer <
> alkampfer@nablasoft.com> wrote:
> > Actually, there are situations where a restore is needed. Suppose that
> > someone makes a mistake and deletes all documents from a collection, or
> > deletes a series of documents, etc. I know that this is not likely to
> > happen, but in a mission-critical enterprise system we always need a
> > detailed procedure for disaster recovery.
> >
> > For such a scenario we need to plan for the worst case, where
> > everything is lost.
> >
> > With master/slave it is just a matter of recreating the machines,
> > reconfiguring the core, and restoring a backup, and the job is done.
> > With SolrCloud it is not really clear to me how I can back up / restore
> > data. From what I've found on the internet, I need to back up every
> > shard of the collection and, if we need to restore everything from a
> > backup, we can recreate the collection and then restore all the
> > individual shards. I do not know whether this is a supported
> > scenario / procedure, but theoretically it could work.
> >
> > --
> > Gian Maria Ricci
> > Cell: +39 320 0136949
> >
> >
> >
> > -----Original Message-----
> > From: Alessandro Benedetti [mailto:abenedetti@apache.org]
> > Sent: Thursday, 14 January 2016 10:46
> > To: solr-user@lucene.apache.org
> > Subject: Re: Pro and cons of using Solr Cloud vs standard Master Slave
> > Replica
> >
> > It's true that SolrCloud is adding some complexity.
> > But a few observations:
> >
> >> SolrCloud has some disadvantages and can't beat the easiness and
> >> simpleness of Master Slave Replica. So I can only encourage to keep
> >> Master Slave Replica in future versions.
> >
> >
> > I agree, there can be situations where you have really simple and
> > non-critical systems.
> > Anyway, old-style replication is still used in SolrCloud, so I think it
> > is going to stay for a while (until it is replaced with something else).
> >
> > To answer Gian:
> >
> >> One of the problem I've found is that I've not found a simple way to
> >> backup the content of a collection to restore in situation of disaster
> >> recovery. With simple master / slave scenario we can use the
> >> replication handler to generate backups that can be easily used to
> >> restore content of a core, while with SolrCloud is not clear how can we
> >> obtain a full backup
> >
> >
> > To be fair, disaster recovery is where SolrCloud shines.
> > If you lose random nodes across your collection, you simply need to fix
> > them and spin them up again.
> > The system will automatically restore the content to the last version
> > available (the tlog first, and then the leader, if the tlog is not
> > enough, will help the dead node catch up).
> > If you lose all the replicas of a shard and the on-disk content of all
> > those replicas (index and tlog), SolrCloud can't help you.
> > For these unlikely scenarios a backup is suggested.
> > You could still restore the backup to only one node, and the replicas
> > will catch up.
> >
> >> Probably is just a matter of backupping every shard with standard
> >> replication handler and then restore each shard after recreating the
> >> collection
> >
> >
> > Definitely not, SolrCloud is there to avoid this manual stuff.
> >
> > Cheers
> >
> >
> > On 14 January 2016 at 08:58, Gian Maria Ricci - aka Alkampfer <
> > alkampfer@nablasoft.com> wrote:
> >
> >> I agree that SolrCloud has not only advantages, I really understand
> >> that it offers many more features, but it introduces some complexity.
> >>
> >> One of the problems I've found is that I've not found a simple way to
> >> back up the content of a collection to restore in a disaster recovery
> >> situation. With a simple master/slave scenario we can use the
> >> replication handler to generate backups that can be easily used to
> >> restore the content of a core, while with SolrCloud it is not clear
> >> how we can obtain a full backup.
> >> Probably it is just a matter of backing up every shard with the
> >> standard replication handler and then restoring each shard after
> >> recreating the collection, but I've not found (probably I need to
> >> search better) official documentation on backup / restore procedures
> >> for SolrCloud.
> >>
> >> Thanks.
> >>
> >> --
> >> Gian Maria Ricci
> >> Cell: +39 320 0136949
> >>
> >>
> >> -----Original Message-----
> >> From: Bernd Fehling [mailto:bernd.fehling@uni-bielefeld.de]
> >> Sent: Thursday, 14 January 2016 08:22
> >> To: solr-user@lucene.apache.org
> >> Subject: Re: Pro and cons of using Solr Cloud vs standard Master
> >> Slave Replica
> >>
> >> SolrCloud has some disadvantages and can't beat the easiness and
> >> simpleness of Master Slave Replica. So I can only encourage to keep
> >> Master Slave Replica in future versions.
> >>
> >> Bernd
> >>
> >> On 13.01.2016 at 21:57, Jack Krupansky wrote:
> >> > The "Legacy Scaling and Distribution" section of the Solr Reference
> >> > Guide also gives info related to so-called master-slave mode:
> >> > https://cwiki.apache.org/confluence/display/solr/Legacy+Scaling+and+Distribution
> >> >
> >> > Also, although the old master-slave mode is still technically
> >> > supported in the sense that the code and doc are still there, you
> >> > won't be able to get the same level of community support here on
> >> > the mailing list as you can get for SolrCloud.
> >> >
> >> > Unless you're simply trying to decide whether to leave an old
> >> > legacy system as-is with the old distributed mode, nobody should be
> >> > considering a fresh new distributed Solr deployment with anything
> >> > other than SolrCloud.
> >> >
> >> > (Hmmm... have any of the committers considered deprecating the old
> >> > non-SolrCloud distributed mode features?)
> >>
> >> -1
> >>
> >> >
> >> > -- Jack Krupansky
> >> >
> >> > On Wed, Jan 13, 2016 at 9:02 AM, Shivaji Dutta
> >> > <sdutta@hortonworks.com>
> >> > wrote:
> >> >
> >> >> - SolrCloud uses ZooKeeper to manage HA
> >> >>         - ZooKeeper is a standard for all HA in Apache Hadoop
> >> >> - You have collections which will manage your shards across nodes
> >> >> - The SolrJ client is now fault tolerant with CloudSolrClient
> >> >>
> >> >> This is the future direction of the product.
> >> >>
> >> >>
> >> >>
> >> >> On 1/13/16, 5:58 AM, "Gian Maria Ricci - aka Alkampfer"
> >> >> <alkampfer@nablasoft.com> wrote:
> >> >>
> >> >>> Thanks.
> >> >>>
> >> >>> --
> >> >>> Gian Maria Ricci
> >> >>> Cell: +39 320 0136949
> >> >>>
> >> >>>
> >> >>>
> >> >>> -----Original Message-----
> >> >>> From: Shawn Heisey [mailto:apache@elyograg.org]
> >> >>> Sent: Monday, 11 January 2016 18:28
> >> >>> To: solr-user@lucene.apache.org
> >> >>> Subject: Re: Pro and cons of using Solr Cloud vs standard Master
> >> >>> Slave Replica
> >> >>>
> >> >>> On 1/11/2016 4:28 AM, Gian Maria Ricci - aka Alkampfer wrote:
> >> >>>> a customer needs a comprehensive list of all pros and cons of
> >> >>>> using standard Master Slave replica vs. using Solr Cloud. I'm
> >> >>>> interested especially in query performance considerations,
> >> >>>> because in this specific situation the rate of new documents is
> >> >>>> really slow, but the amount of data is about 50 million
> >> >>>> documents, and the index size on disk for a single core is about
> >> >>>> 30 GB.
> >> >>>
> >> >>> The primary advantage to SolrCloud is that SolrCloud handles most
> >> >>> of the administrative and operational details for you automatically.
> >> >>>
> >> >>> SolrCloud is a little more complicated to set up initially,
> >> >>> because you must worry about Zookeeper as well as Solr, but once
> >> >>> it's properly set up, there is no single point of failure.
> >> >>>
> >> >>>> Such an amount of data should be easily handled by a Master Slave
> >> >>>> replica with a single core replicated on a certain number of
> >> >>>> slaves, but we also need to evaluate the option of SolrCloud,
> >> >>>> especially for fault tolerance.
> >> >>>>
> >> >>>
> >> >>> Once you're beyond initial setup, fault tolerance with SolrCloud is
> >> >>> much easier than master/slave replication.  Switching a slave to
> >> >>> a master is possible, but the procedure is somewhat complicated.
> >> >>> SolrCloud does not *have* masters, it is a true cluster.
> >> >>>
> >> >>> With master/slave replication, the master handles all indexing,
> >> >>> and the finished index segments are copied to the slaves via
> >> >>> HTTP, and the slaves simply need to open them.  SolrCloud does
> >> >>> indexing on all shard replicas, nearly simultaneously.  Usually
> >> >>> this is an advantage, not a disadvantage, but in heavy indexing
> >> >>> situations master/slave replication
> >> >>> *might* show better performance on the slaves.
> >> >>>
> >> >>> Thanks,
> >> >>> Shawn
> >> >>>
> >> >>>
> >> >>
> >> >>
> >> >
> >>
> >>
> >
> >
> > --
> > --------------------------
> >
> > Benedetti Alessandro
> > Visiting card : http://about.me/alessandro_benedetti
> >
> > "Tyger, tyger burning bright
> > In the forests of the night,
> > What immortal hand or eye
> > Could frame thy fearful symmetry?"
> >
> > William Blake - Songs of Experience -1794 England
>
