Date: Thu, 25 May 2017 08:09:04 +0000 (UTC)
From: "Andrzej Bialecki (JIRA)"
To: dev@lucene.apache.org
Subject: [jira] [Created] (SOLR-10745) Reliably create nodeAdded / nodeLost events

Andrzej Bialecki created SOLR-10745:
----------------------------------------

             Summary: Reliably create nodeAdded / nodeLost events
                 Key: SOLR-10745
                 URL: https://issues.apache.org/jira/browse/SOLR-10745
             Project: Solr
          Issue Type: Sub-task
      Security Level: Public (Default Security Level. Issues are Public)
          Components: SolrCloud
            Reporter: Andrzej Bialecki
            Assignee: Andrzej Bialecki
             Fix For: master (7.0)

When the Overseer node goes down, a {{nodeLost}} event may not have been generated, depending on the current phase of trigger execution. Similarly, when a new node is added and the Overseer goes down before the trigger saves a checkpoint (and before it produces a {{nodeAdded}} event), that event may never be generated.

The proposed solution is to modify how nodeLost / nodeAdded information is recorded in the cluster:

* new nodes should do a ZK multi-write to both {{/live_nodes}} and a predefined location, e.g. {{/autoscaling/nodeAdded/}}. On the first execution of Trigger.run in the new Overseer leader, the trigger would check this location for new znodes, each of which indicates that a node has been added. It would then generate a new event and remove the znode that corresponds to the event.

* node lost events should also be recorded to a predefined location, e.g. {{/autoscaling/nodeLost/}}. Writing to this znode would be attempted simultaneously by a few randomly selected nodes to make sure at least one of them succeeds. On the first run of the new trigger instance (in the new Overseer leader), event generation would follow the sequence described above.
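
To make the proposed sequence concrete, here is a minimal sketch in Python. It is not Solr code and does not use a real ZooKeeper client: {{FakeZk}} is a hypothetical in-memory stand-in, and the function names ({{register_node}}, {{record_node_lost}}, {{first_trigger_run}}), the default of 3 writers, and the assumption that each marker znode is a child named after the node are all illustrative choices that the issue does not specify.

```python
import random

NODE_ADDED = "/autoscaling/nodeAdded"
NODE_LOST = "/autoscaling/nodeLost"


class FakeZk:
    """Hypothetical in-memory stand-in for ZooKeeper (path -> data)."""

    def __init__(self):
        self.znodes = {}

    def multi_create(self, paths):
        # All-or-nothing, like a ZK multi op: check first, then write.
        existing = [p for p in paths if p in self.znodes]
        if existing:
            raise KeyError(f"already exists: {existing}")
        for p in paths:
            self.znodes[p] = b""

    def create_if_absent(self, path):
        # True only for the writer that actually created the znode.
        if path in self.znodes:
            return False
        self.znodes[path] = b""
        return True

    def children(self, parent):
        prefix = parent.rstrip("/") + "/"
        names = (p[len(prefix):] for p in self.znodes if p.startswith(prefix))
        return sorted(n for n in names if "/" not in n)

    def delete(self, path):
        self.znodes.pop(path, None)


def register_node(zk, node):
    # New node: one multi-write covers /live_nodes and the nodeAdded marker,
    # so the marker cannot be missing once the node is live.
    zk.multi_create([f"/live_nodes/{node}", f"{NODE_ADDED}/{node}"])


def record_node_lost(zk, lost, survivors, writers=3, rng=random):
    # In real ZK the ephemeral live node would vanish on its own.
    zk.delete(f"/live_nodes/{lost}")
    # A few randomly selected survivors all try to write the same marker.
    chosen = rng.sample(survivors, min(writers, len(survivors)))
    return [n for n in chosen if zk.create_if_absent(f"{NODE_LOST}/{lost}")]


def first_trigger_run(zk):
    # First run in the new Overseer leader: turn leftover markers into
    # events, removing each marker once its event has been generated.
    events = []
    for event_type, parent in (("nodeAdded", NODE_ADDED), ("nodeLost", NODE_LOST)):
        for node in zk.children(parent):
            events.append((event_type, node))
            zk.delete(f"{parent}/{node}")
    return events
```

In this sketch, calling {{register_node}} for two nodes and then {{first_trigger_run}} yields one nodeAdded event per node and leaves no markers behind, so a second run produces nothing. A real implementation would also have to handle a crash between generating an event and deleting its marker, which the sketch ignores.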