lucene-dev mailing list archives

From "Andrzej Bialecki (JIRA)" <>
Subject [jira] [Created] (SOLR-10745) Reliably create nodeAdded / nodeLost events
Date Thu, 25 May 2017 08:09:04 GMT
Andrzej Bialecki  created SOLR-10745:

             Summary: Reliably create nodeAdded / nodeLost events
                 Key: SOLR-10745
             Project: Solr
          Issue Type: Sub-task
      Security Level: Public (Default Security Level. Issues are Public)
          Components: SolrCloud
            Reporter: Andrzej Bialecki 
            Assignee: Andrzej Bialecki 
             Fix For: master (7.0)

When the Overseer node goes down, a {{nodeLost}} event may not have been generated, depending
on the current phase of trigger execution. Similarly, when a new node is added and the Overseer
goes down before the trigger saves a checkpoint (and before it produces a {{nodeAdded}} event),
this event may never be generated.

The proposed solution is to modify how nodeLost / nodeAdded information is recorded
in the cluster:
* new nodes should do a ZK multi-write to both {{/live_nodes}} and a predefined
location, e.g. {{/autoscaling/nodeAdded/<nodeName>}}. On its first execution in the new
Overseer leader, the trigger would check this location for new znodes, each of which would
indicate that a node has been added, then generate a new event and remove the znode that
corresponds to the event.
* node lost events should also be recorded in a predefined location, e.g. {{/autoscaling/nodeLost/<nodeName>}}.
Writing to this znode would be attempted simultaneously by a few randomly selected nodes to
make sure at least one of them succeeds. On the first run of the new trigger instance (in
the new Overseer leader), event generation would follow the sequence described above.
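
The two bullets above can be illustrated with a rough, self-contained sketch. It does not use the real ZooKeeper client or any Solr code: {{FakeZk}}, {{registerNode}}, {{recordNodeLost}} and {{drainEvents}} are hypothetical names invented for this example, and the in-memory store only imitates the two properties the proposal relies on (an all-or-nothing multi-write, and marker znodes that outlive both the node's session and the old Overseer).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class NodeEventSketch {
    static final String LIVE = "/live_nodes";
    static final String ADDED = "/autoscaling/nodeAdded";
    static final String LOST = "/autoscaling/nodeLost";

    /** Hypothetical in-memory znode store; NOT the ZooKeeper client API. */
    static class FakeZk {
        private final Set<String> znodes = new TreeSet<>();

        /** All-or-nothing create, standing in for a ZK multi-write. */
        synchronized void multiCreate(String... paths) {
            for (String path : paths) {
                if (znodes.contains(path)) {
                    throw new IllegalStateException("znode already exists: " + path);
                }
            }
            for (String path : paths) {
                znodes.add(path);
            }
        }

        /** Returns true only for the first writer of a given path. */
        synchronized boolean createIfAbsent(String path) {
            return znodes.add(path);
        }

        /** Names recorded under the given parent path, in sorted order. */
        synchronized List<String> children(String parent) {
            String prefix = parent + "/";
            List<String> names = new ArrayList<>();
            for (String path : znodes) {
                if (path.startsWith(prefix)) {
                    names.add(path.substring(prefix.length()));
                }
            }
            return names;
        }

        synchronized void delete(String path) {
            znodes.remove(path);
        }

        /** Models session expiry: only the live-node entry disappears. */
        synchronized void expireSession(String nodeName) {
            znodes.remove(LIVE + "/" + nodeName);
        }
    }

    /** A new node announces itself and leaves a nodeAdded marker atomically. */
    static void registerNode(FakeZk zk, String nodeName) {
        zk.multiCreate(LIVE + "/" + nodeName, ADDED + "/" + nodeName);
    }

    /** Several witnesses try to write the same nodeLost marker; one wins. */
    static int recordNodeLost(FakeZk zk, String nodeName, int witnesses) {
        int successes = 0;
        for (int i = 0; i < witnesses; i++) {
            if (zk.createIfAbsent(LOST + "/" + nodeName)) {
                successes++;
            }
        }
        return successes;
    }

    /** First trigger run in a new Overseer: turn markers into events, then remove them. */
    static List<String> drainEvents(FakeZk zk) {
        List<String> events = new ArrayList<>();
        for (String type : new String[] {"nodeAdded", "nodeLost"}) {
            String parent = "/autoscaling/" + type;
            for (String nodeName : zk.children(parent)) {
                events.add(type + ":" + nodeName);
                zk.delete(parent + "/" + nodeName);
            }
        }
        return events;
    }

    public static void main(String[] args) {
        FakeZk zk = new FakeZk();
        registerNode(zk, "node1");
        registerNode(zk, "node2");
        zk.expireSession("node2");       // node2 goes away
        recordNodeLost(zk, "node2", 3);  // three witnesses race to record it
        System.out.println(drainEvents(zk)); // events recovered from markers
        System.out.println(drainEvents(zk)); // markers were consumed, so nothing is replayed
    }
}
```

Because the markers are consumed as events are generated, a second pass over the same locations produces nothing, which is the property that would keep a newly elected Overseer from replaying events. A real implementation would also have to handle a failure between generating the event and deleting the znode, which this sketch ignores.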
