lucene-solr-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: Replication and soft commits for NRT searches
Date Wed, 14 Oct 2015 15:27:07 GMT
bq: If a timeout between shard leader and replica can
lead to a smaller rf value (because replication has
timed out), is it possible to increase this timeout in the configuration?

Why do you care? If it timed out, then the follower will
no longer be active and will not serve queries. The Cloud view
should show it in "down", "recovery" or the like. Before it
goes back to the "active" state, it will synchronize from
the leader automatically without you having to do anything and
any docs that were indexed to the leader will be faithfully
reflected on the follower _before_ the recovering
follower serves any new queries. So practically it makes no
difference whether there was an update timeout or not.
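
For reference, here is a minimal sketch of how a client could read the achieved replication factor being discussed. It assumes the update was sent with the min_rf parameter and wt=json, and that Solr then reports rf in the responseHeader of the update response. The sample response body and the function names are made up for illustration and are not taken from a real cluster.

```python
import json


def achieved_rf(response_body):
    """Return the 'rf' value from a Solr JSON update response, or None if absent."""
    header = json.loads(response_body).get("responseHeader", {})
    rf = header.get("rf")
    return int(rf) if rf is not None else None


def below_min_rf(response_body, min_rf):
    """True if the achieved rf is missing or lower than the requested minimum."""
    rf = achieved_rf(response_body)
    return rf is None or rf < min_rf


# Hypothetical response body for an update sent with min_rf=2 (assumption).
sample = '{"responseHeader": {"status": 0, "QTime": 12, "rf": 1, "min_rf": 2}}'
print(achieved_rf(sample), below_min_rf(sample, 2))
```

A low rf here only tells the client that the leader did not get an acknowledgement in time. As described above, the follower catches up from the leader before it serves queries again, so the client does not have to repair anything itself.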

This is feeling a lot like an "XY" problem. You're asking detailed
questions about "X" (in this case timeouts, what rf means and the like)
without telling us what the problem you're concerned about is ("Y").

So please back up and tell us what your higher level concern is.
Do you have any evidence of Bad Things Happening?

And do, please, change your commit intervals to not happen after
every doc (autoSoftCommit maxDocs=1). That's a Really Bad Practice in Solr.
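
One way to switch to a time-based soft commit is sketched below. It only builds the JSON command body. It assumes the Config API is available (POST to /solr/<collection>/config) and that updateHandler.autoSoftCommit.maxTime is one of its editable properties. The function name and the 1000 ms value are illustrative, not a recommendation from this thread. The existing maxDocs=1 setting would still have to be removed from solrconfig.xml separately.

```python
import json


def soft_commit_interval_command(max_time_ms):
    """Build a Config API 'set-property' body for a time-based soft commit."""
    if max_time_ms <= 0:
        raise ValueError("max_time_ms must be a positive number of milliseconds")
    command = {
        "set-property": {
            # Assumed editable property name; check the docs for your Solr version.
            "updateHandler.autoSoftCommit.maxTime": max_time_ms,
        }
    }
    return json.dumps(command)


# Illustrative value only: soft commit at most once per second.
print(soft_commit_interval_command(1000))
```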

Best,
Erick

On Tue, Oct 13, 2015 at 11:58 PM, MOIS Martin (MORPHO)
<martin.mois@morpho.com> wrote:
> Hello,
>
> thank you for the detailed answer.
>
> If a timeout between shard leader and replica can lead to a smaller rf value (because
> replication has timed out), is it possible to increase this timeout in the configuration?
>
> Best Regards,
> Martin Mois
>
> Comments inline:
>
> On Mon, Oct 12, 2015 at 1:31 PM, MOIS Martin (MORPHO)
> <martin.mois@morpho.com> wrote:
>> Hello,
>>
>> I am running Solr 5.2.1 in a cluster with 6 nodes. My collections have been created with
>> replicationFactor=2, i.e. each shard has a leader plus one additional replica. Beyond that I am
>> using autoCommit/maxDocs=10000 and autoSoftCommit/maxDocs=1 in order to achieve near-realtime
>> search behavior.
>>
>> As far as I understand from the section "Write Side Fault Tolerance" in the documentation
>> (https://cwiki.apache.org/confluence/display/solr/Read+and+Write+Side+Fault+Tolerance), I
>> cannot enforce that an update gets replicated to all replicas; I can only get the achieved
>> replication factor by requesting the return value rf.
>>
>> My question is now, what exactly does rf=2 mean? Does it only mean that the replica has
>> written the update to its transaction log? Or has the replica also performed the soft commit
>> as configured with autoSoftCommit/maxDocs=1? The answer is important for me: if the update
>> were only written to the transaction log, I could not search for it reliably, as the
>> replica may not yet have added it to the searchable index.
>
> rf=2 means that the update was successfully replicated to and
> acknowledged by two replicas (including the leader). The rf only deals
> with the durability of the update and has no relation to visibility of
> the update to searchers. The auto(soft)commit settings are applied
> asynchronously and do not block an update request.
>
>>
>> My second question is, does rf=1 mean that the update was definitely not successful on
>> the replica, or could it also represent a timeout of the replication request from the shard
>> leader? If it could also represent a timeout, then there would be a small chance that the
>> replication was successful despite the timeout.
>
> Well, rf=1 implies that the update was only applied on the leader's
> index + tlog and either replicas weren't available or returned an
> error or the request timed out. So yes, you are right that it can
> represent a timeout and as such there is a chance that the replication
> was indeed successful despite the timeout.
>
>>
>> Is there a way to retrieve the replication factor for a specific document after the update
>> in order to check if replication was successful in the meantime?
>>
>
> No, there is no way to do that.
>
>> Thanks in advance.
>>
>> Best Regards,
>> Martin Mois
>> #
>> "This e-mail and any attached documents may contain confidential or proprietary information.
>> If you are not the intended recipient, you are notified that any dissemination, copying of
>> this e-mail and any attachments thereto or use of their contents by any means whatsoever is
>> strictly prohibited. If you have received this e-mail in error, please advise the sender immediately
>> and delete this e-mail and all attached documents from your computer system."
>> #
>
>
>
> --
> Regards,
> Shalin Shekhar Mangar.
>
