lucene-dev mailing list archives

From "Mark Miller (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SOLR-2358) Distributing Indexing
Date Sun, 09 Oct 2011 15:13:29 GMT

    [ https://issues.apache.org/jira/browse/SOLR-2358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13123702#comment-13123702 ]

Mark Miller commented on SOLR-2358:
-----------------------------------

Initially, a request will be fully synchronous and will not return success to the client until
every replica has received and ACK'd the request. So if a leader goes down before all replicas
receive and ACK the request, the client will not get an ACK. A new leader will be elected. When
the downed previous leader comes back, it will come up in recovery mode. I expect recovery to
be a difficult part, and we have not fully worked it out yet. To recover, the node will have
to talk to the leader and figure out what it has that it should not, what it doesn't have,
etc. Then the recovering node either receives replays or replaces the entire index. Lots
of details to work out here.
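
A minimal sketch of the synchronous path described above, not the actual patch. The `Replica`
class and `leader_update` function are hypothetical names invented for illustration, and
replicas are modeled as in-memory dicts. The point is only that the client sees an ACK when,
and only when, the leader and every replica have applied the update.

```python
class Replica:
    """Hypothetical in-memory stand-in for a replica; not a Solr class."""

    def __init__(self, name, up=True):
        self.name = name
        self.up = up
        self.docs = {}  # doc id -> document

    def apply(self, doc_id, doc):
        # A downed replica cannot apply or ACK the update.
        if not self.up:
            return False
        self.docs[doc_id] = doc
        return True


def leader_update(leader, replicas, doc_id, doc):
    """Return True (ACK to the client) only if the leader and all replicas applied the update."""
    if not leader.apply(doc_id, doc):
        return False
    # Relay to every replica, then require an ACK from each one.
    acks = [replica.apply(doc_id, doc) for replica in replicas]
    return all(acks)
```

Note that when one replica is down, the others still hold the update even though the client
got no ACK. That is exactly the divergence recovery would have to reconcile.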

You have an interesting problem in that some replica leader candidates may have an update
while others don't, as the leader may have died in the middle of relaying requests. We might
prefer a new leader with the greatest versioned doc? Most client retries in this case will
be fine (globally unique ids are required, so no worry about dupes). Then replicas talk to
the leader and sync up. Or when a new leader is elected, replicas just talk amongst each other
and sync up, or…
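
One way the "greatest versioned doc" idea could look, purely as a sketch. It assumes each
candidate can report the highest update version it has seen; `pick_leader` and
`missing_versions` are hypothetical helpers, and nothing here is a decided design.

```python
def pick_leader(candidates):
    """Pick the candidate that has seen the highest update version.

    candidates: dict mapping node name -> highest version seen (assumed available).
    Ties are broken by node name so the choice is deterministic.
    """
    return max(candidates, key=lambda name: (candidates[name], name))


def missing_versions(leader_versions, replica_versions):
    """Versions the leader has that a replica lacks, i.e. what would need replaying."""
    return sorted(set(leader_versions) - set(replica_versions))
```

For example, with candidates `{"a": 41, "b": 42}` the sketch picks `b`, and a replica holding
versions `[40, 41]` against a leader holding `[40, 41, 42]` would need version `42` replayed.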

If the leader fails right before sending an ACK, the client will likely repeat the request.
In the case of doc adds/updates with the same id, the retry will just replace the previously
successful write, or the client will be able to use optimistic locking to figure out that
either its update or someone else's actually went through already. The client would already
know that its update may have gone through, because the connection would have timed out rather
than returned a failure.
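
A small sketch of why a retry is harmless when ids are unique, and how optimistic locking
could detect a write that already went through. The `Index` class, its `add` method, the
`expected_version` parameter, and `VersionConflict` are all hypothetical names for
illustration, not Solr API.

```python
class VersionConflict(Exception):
    """Raised when the caller's expected version does not match the stored one."""


class Index:
    """Hypothetical in-memory index keyed by unique doc id."""

    def __init__(self):
        self.docs = {}  # doc id -> (version, document)

    def add(self, doc_id, doc, expected_version=None):
        # Version 0 means "no such document yet".
        current_version = self.docs.get(doc_id, (0, None))[0]
        if expected_version is not None and expected_version != current_version:
            raise VersionConflict(
                f"{doc_id}: expected version {expected_version}, found {current_version}"
            )
        # Same id replaces the previous document, so a blind retry cannot create a dupe.
        new_version = current_version + 1
        self.docs[doc_id] = (new_version, doc)
        return new_version
```

A blind retry of the same add simply overwrites the earlier copy. A retry that passes the
version it originally expected gets a conflict instead, which tells the client that its first
attempt (or someone else's update) already landed.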

Eventually, we might consider a mode where the request is ACK'd before it's on all replicas,
in which case you might accept a higher risk of data loss.

bq. indexes diverge because some replicas commit a change while others do not

It's an area we have not fully worked out (though Yonik has likely thought about a lot of
this more than I have so far). Initially, though, Yonik's point was, I think, that you can
usually expect success on all nodes unless the issue is something that would require the node
to come down and then come back in recovery mode. We certainly want to be resilient here
eventually, though. As we work through recovery scenarios, I think this will become clearer.

Long story short, we have been discussing and thinking about these various scenarios, but
largely we are also taking things one issue at a time.

                
> Distributing Indexing
> ---------------------
>
>                 Key: SOLR-2358
>                 URL: https://issues.apache.org/jira/browse/SOLR-2358
>             Project: Solr
>          Issue Type: New Feature
>          Components: SolrCloud, update
>            Reporter: William Mayor
>            Priority: Minor
>             Fix For: 4.0
>
>         Attachments: SOLR-2358.patch
>
>
> The first steps towards creating distributed indexing functionality in Solr

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

       


