lucene-dev mailing list archives

From "Per Steffensen (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (SOLR-3721) Multiple concurrent recoveries of same shard?
Date Thu, 16 Aug 2012 17:55:38 GMT

    [ https://issues.apache.org/jira/browse/SOLR-3721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13436154#comment-13436154 ]

Per Steffensen edited comment on SOLR-3721 at 8/17/12 4:53 AM:
---------------------------------------------------------------

{quote} No recovery will be started if a node cannot talk to zookeeper. {quote}

Well, I knew that. I meant that the two Solrs were disconnected from ZK at the same time,
but of course both got their connections reestablished - after the session timeout (I believe,
and kinda hope, that a session timeout has to have happened before Solr needs to go into
recovery after a ZK connection loss).

When it gets prioritized on my side, I will investigate further what causes the log to claim
that many recoveries are going on for the same shard concurrently.

{quote} When a new node is elected as a leader by ZooKeeper it first tries to do a peer sync
against every other live node. So lets say the first node in your two node situation comes
back and he is behind the other node, but he comes back first and is elected leader. The second
node has the latest updates, but is second in line to be leader and a few updates ahead. The
potential leader will try and peer sync with the other node and get those missing updates
if it's fewer than 100 or fail because the other node is ahead by too much. {quote}
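
Just to check that I read the peer-sync rule right, I picture it roughly like this (a sketch of my own, not the actual peer-sync code in Solr - names and the exact threshold are my assumptions):

{code}
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

// Sketch of the peer-sync decision as I understand it from the description above
// (illustration only - not real Solr code)
class PeerSyncSketch {
  static final int SYNC_WINDOW = 100; // "fewer than 100" from the description

  // myVersions/peerVersions: recent update versions from the transaction logs
  static boolean canPeerSync(Collection<Long> myVersions, Collection<Long> peerVersions) {
    Set<Long> mine = new HashSet<Long>(myVersions);
    int missing = 0;
    for (Long v : peerVersions) {
      if (!mine.contains(v)) missing++;
    }
    // if the peer is ahead by more than the window, peer sync gives up and the
    // node has to fall back to full (file-copy) replication recovery instead
    return missing <= SYNC_WINDOW;
  }
}
{code}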

Well, we shouldn't let this issue (SOLR-3721) become about many other issues, but when the
"behind" node has reconnected and become leader, and the one with the latest updates does not
come back live right away, isn't the new leader (which is behind) allowed to start handling
update requests? If yes, then it is possible that both shards end up with documents/updates
that the other one doesn't have, and it is possible to come up with scenarios where there is
no good algorithm for generating the "correct" merged union of the data in both shards. So
what do we do when the other shard (which used to have a later version than the current
leader) comes live? I believe there is nothing solid to do!

How to avoid that? I was thinking about keeping the latest version for every slice in ZK,
so that a "behind" shard will know whether it has the latest version of a slice, and therefore
whether it is allowed to take the role of leader. Of course the writing of this "latest version"
to ZK and the writing of the corresponding update to the leader's transaction-log would have
to be atomic (like the A in ACID) as much as possible. And it would be nice if the writing of
the update to the replica's transaction-log could also be atomic with the leader write and the
ZK write, in order to increase the chance that a replica is actually allowed to take over the
leader role if the leader dies (or both die and the replica comes back first). But all that is
just an idea off the top of my head.
Do you already have a solution implemented, or one on the drawing board, or how do you/we
prevent such a problem? As far as I understand "the drill" during leader-election/recovery
(whether it is peer-sync or file-copy replication) from the little code-reading I have done
and from what you explain, there is no current solution. But I might be wrong?
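
To be a bit more concrete about the idea, a rough sketch (illustration only - nothing like this exists in Solr as far as I know, and the ZK path, names and error handling are made up):

{code}
import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.Stat;

// Sketch of "keep the latest version per slice in ZK" (made-up paths/names)
class SliceVersionSketch {

  // Leader side: after (ideally atomically with) writing an update to its own
  // transaction-log, record the update version for the slice in ZK.
  static void recordLatestVersion(ZooKeeper zk, String collection, String slice,
                                  long updateVersion) throws Exception {
    String path = "/collections/" + collection + "/latest_version/" + slice;
    // assumes the znode was created when the slice was created
    Stat stat = new Stat();
    zk.getData(path, false, stat);
    // conditional write: fails with BadVersionException if somebody else wrote
    // in between, so two would-be leaders cannot both "win" the same version slot
    zk.setData(path, Long.toString(updateVersion).getBytes("UTF-8"), stat.getVersion());
  }

  // Election side: a shard may only take the leader role for the slice if its own
  // latest transaction-log version is at least the one recorded in ZK.
  static boolean mayBecomeLeader(ZooKeeper zk, String collection, String slice,
                                 long myLatestVersion) throws Exception {
    byte[] data = zk.getData("/collections/" + collection + "/latest_version/" + slice,
                             false, null);
    return myLatestVersion >= Long.parseLong(new String(data, "UTF-8"));
  }
}
{code}

The hard part is of course making the ZK write and the transaction-log write(s) "atomic enough" in practice.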

{quote} The other node, being the next in line to be leader, will now try and peer sync with
the other nodes in the shard {quote}

I guess/hope you mean "...with the other shards (running on different nodes) in the slice".
As I understand Solr terminology, a logical chunk of the "entire data" (a collection in Solr)
is a "slice", and the data in a slice might physically exist in more than one place (in more
shards - if replication is used). Back when I started my interest in Solr I spent a considerable
amount of time understanding Solr terminology - mainly because it is different from what I have
been used to (in my pre-Solr world a "shard" is what you call a "slice") - so now please don't
tell me that I misunderstood :-)
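
Just to be explicit about how I map the terms, my mental model is (based on the description above, not on any official documentation):

{code}
import java.util.List;

// My mental model of the terminology (illustration only)
class CollectionModel {      // the entire logical data set
  List<SliceModel> slices;   // logical partitions of the collection
}
class SliceModel {           // what I would have called a "shard" before Solr
  List<ShardModel> shards;   // physical copies/replicas of the slice's data
}
class ShardModel {           // one core on one node holding a copy of a slice
  String nodeName;
}
{code}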

{quote} That is the current process - we will continually be hardening and improving it I'm
sure. {quote}

I will probably stick around for that. The correctness and robustness of this live-replication
feature are (currently) very important to us.

                
> Multiple concurrent recoveries of same shard?
> ---------------------------------------------
>
>                 Key: SOLR-3721
>                 URL: https://issues.apache.org/jira/browse/SOLR-3721
>             Project: Solr
>          Issue Type: Bug
>          Components: multicore, SolrCloud
>    Affects Versions: 4.0
>         Environment: Using our own Solr release based on Apache revision 1355667 from the
> 4.x branch. Our changes to the Solr version are our solutions to TLT-3178 etc., and should
> have no effect on this issue.
>            Reporter: Per Steffensen
>              Labels: concurrency, multicore, recovery, solrcloud
>             Fix For: 4.0
>
>         Attachments: recovery_in_progress.png, recovery_start_finish.log
>
>
> We run a performance/endurance test on a 7-Solr-instance SolrCloud setup, and eventually
> the Solrs lose their ZK connections and go into recovery. BTW the recovery often never succeeds,
> but we are looking into that. While doing that I noticed that, according to the logs, multiple
> recoveries are in progress at the same time for the same shard. That cannot be intended, and
> I can certainly imagine that it will cause some problems.
> Is it just the logs that are wrong, did I make some mistake, or is this a real bug?
> See the attached grep from the logs, grepping only on "Finished recovery" and "Starting recovery"
> lines.
> {code}
> grep -B 1 "Finished recovery\|Starting recovery" solr9.log solr8.log solr7.log solr6.log solr5.log solr4.log solr3.log solr2.log solr1.log solr0.log > recovery_start_finish.log
> {code}
> It can be hard to get an overview of the log, but I have generated a graph showing (based
> solely on the "Started recovery" and "Finished recovery" log lines) how many recoveries are
> in progress at any time for the different shards. See the attached recovery_in_progress.png.
> The graph is also a little hard to get an overview of (due to the many shards), but it is
> clear that for several shards there are multiple recoveries going on at the same time, and
> that several recoveries never succeed.
> Regards, Per Steffensen

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

