Date: Fri, 3 May 2019 11:25:00 +0000 (UTC)
From: "Cao Manh Dat (JIRA)"
To: dev@lucene.apache.org
Reply-To: dev@lucene.apache.org
Subject: [jira] [Comment Edited] (SOLR-13445) Preferred replicas on nodes with same system properties as the query master

    [ https://issues.apache.org/jira/browse/SOLR-13445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16832436#comment-16832436 ]

Cao Manh Dat edited comment on SOLR-13445 at 5/3/19 11:24 AM:
--------------------------------------------------------------

I had several private conversations with [~shalinmangar] about how to deal with this issue, and he helped a lot. Thanks [~shalinmangar].

Besides implementing the features mentioned in the description, the attached patch also fixes an issue in {{SolrClientNodeStateProvider}}: it always retried querying metrics from other nodes, even when the query had just succeeded.

[~shalinmangar] can you take a look at the attached patch?


was (Author: caomanhdat):
I had several private conversations with [~shalinmangar] about how to deal with this issue, and he helped a lot. Thanks [~shalinmangar].

Besides implementing the features mentioned in the description, the attached patch also fixes an issue in {{SolrClientNodeStateProvider}}: it always retried querying metrics from other nodes, even when the query had just succeeded.

> Preferred replicas on nodes with same system properties as the query master
> ---------------------------------------------------------------------------
>
>                 Key: SOLR-13445
>                 URL: https://issues.apache.org/jira/browse/SOLR-13445
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public)
>            Reporter: Cao Manh Dat
>            Assignee: Cao Manh Dat
>            Priority: Major
>         Attachments: SOLR-13445.patch
>
>
> Currently, Solr chooses a random replica for each shard to fan out the query request. However, this presents a problem when running Solr in multiple availability zones.
> If one availability zone fails, it affects all Solr nodes, because they keep trying to connect to Solr nodes in the failed availability zone until each request times out. This can lead to a build-up of threads on each Solr node until the node runs out of memory, which results in a cascading failure.
> This issue tries to solve this problem by adding
> * another shardPreference param named {{node.sysprop}}, so that the query is routed to nodes with the same defined system properties as the current one.
> * default shardPreferences for the whole cluster, which will be stored in {{/clusterprops.json}}.
> * a cacher that fetches other nodes' system properties whenever /live_nodes changes.
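
The routing idea behind the {{node.sysprop}} preference described in the issue can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code in the attached patch: the class, the {{Replica}} type, the {{preferSameSysprop}} method, and the {{zone}} property name are all hypothetical. It assumes each node's system properties are already cached in a map keyed by node name, and simply moves replicas on nodes with a matching property value to the front of the candidate list.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

/** Illustrative sketch only; none of these types are Solr classes. */
public class SyspropPreferenceSketch {

    /** Hypothetical stand-in for a replica and the node that hosts it. */
    static final class Replica {
        final String name;
        final String nodeName;

        Replica(String name, String nodeName) {
            this.name = name;
            this.nodeName = nodeName;
        }
    }

    /**
     * Returns a copy of the replica list in which replicas hosted on nodes whose
     * value for the given system property equals localValue come first.
     * List.sort is stable, so the incoming order is kept within each group.
     */
    static List<Replica> preferSameSysprop(List<Replica> replicas,
                                           Map<String, Map<String, String>> nodeSysprops,
                                           String sysprop,
                                           String localValue) {
        List<Replica> ordered = new ArrayList<>(replicas);
        ordered.sort(Comparator.comparing((Replica r) -> {
            Map<String, String> props = nodeSysprops.getOrDefault(r.nodeName, Map.of());
            boolean matches = localValue != null && localValue.equals(props.get(sysprop));
            // Boolean.FALSE sorts before Boolean.TRUE, so matching replicas sort first.
            return !matches;
        }));
        return ordered;
    }

    public static void main(String[] args) {
        // Assumed cache of per-node system properties (hypothetical "zone" property).
        Map<String, Map<String, String>> nodeSysprops = Map.of(
                "nodeA", Map.of("zone", "us-east"),
                "nodeB", Map.of("zone", "us-west"),
                "nodeC", Map.of("zone", "us-west"));
        List<Replica> replicas = List.of(
                new Replica("r1", "nodeA"),
                new Replica("r2", "nodeB"),
                new Replica("r3", "nodeC"));

        // The querying node is assumed to run with zone=us-west.
        for (Replica r : preferSameSysprop(replicas, nodeSysprops, "zone", "us-west")) {
            System.out.println(r.name + " on " + r.nodeName);
        }
        // Prints r2, r3, then r1.
    }
}
```

Replicas in other zones stay in the list as a fallback rather than being filtered out, so a shard with no replica in the local zone can still be queried.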