From: Erick Erickson
Date: Thu, 20 Dec 2018 07:57:49 -0800
Subject: Re: REBALANCELEADERS is not reliable
To: solr-user

You can go here: https://issues.apache.org/jira, create a sign-on, and freely create JIRAs. Please attach the patch as well.

I hadn't really thought very carefully about REBALANCELEADERS and the new replica types, but that does change the use case.

Best,
Erick

On Thu, Dec 20, 2018 at 3:31 AM Vadim Ivanov wrote:
>
> Yes! It works!
> I have tested RebalanceLeaders today with the patch provided by Endika Posadas. (http://lucene.472066.n3.nabble.com/Rebalance-Leaders-Leader-node-deleted-when-rebalancing-leaders-td4417040.html)
> And at last it works as expected on my collection with 5 nodes and about 400 shards.
> The original patch was slightly incompatible with 7.6.0.
> I hope this patch will help others try this feature with 7.6:
> https://drive.google.com/file/d/19z_MPjxItGyghTjXr6zTCVsiSJg1tN20
>
> RebalanceLeaders was not a very useful feature before 7.0 (as all replicas were NRT),
> but the new replica types made it very helpful for keeping big clusters in order...
>
> I wonder why there is no JIRA about this case (or maybe I missed it)?
> Anyone who cares, please help to create a JIRA and improve this feature in the next release.
> --
> Vadim
>
> > -----Original Message-----
> > From: Vadim Ivanov [mailto:vadim.ivanov@spb.ntk-intourist.ru]
> > Sent: Friday, December 07, 2018 6:13 PM
> > To: solr-user@lucene.apache.org
> > Subject: RE: REBALANCELEADERS is not reliable
> >
> > I'm waiting for 7.6 or 7.5.1 and plan to apply the patch from Endika Posadas to it.
> > Then test again and hope it'll help.
> > --
> > Vadim
> >
> > > -----Original Message-----
> > > From: Bernd Fehling [mailto:bernd.fehling@uni-bielefeld.de]
> > > Sent: Friday, December 07, 2018 12:01 PM
> > > To: solr-user@lucene.apache.org
> > > Subject: Re: REBALANCELEADERS is not reliable
> > >
> > > Thanks for looking this up.
> > > It could be a hint where to jump into the code.
> > > I wonder why they rejected a JIRA ticket about this problem?
> > >
> > > Regards, Bernd
> > >
> > > On 06.12.18 at 16:31, Vadim Ivanov wrote:
> > > > In the solr-dev forum I came across this post:
> > > > http://lucene.472066.n3.nabble.com/Rebalance-Leaders-Leader-node-deleted-when-rebalancing-leaders-td4417040.html
> > > > Maybe it will shed some light?
> > > >
> > > >> -----Original Message-----
> > > >> From: Atita Arora [mailto:atitaarora@gmail.com]
> > > >> Sent: Thursday, November 29, 2018 11:03 PM
> > > >> To: solr-user@lucene.apache.org
> > > >> Subject: Re: REBALANCELEADERS is not reliable
> > > >>
> > > >> Indeed, I tried that on 7.4 & 7.5 too, and it did not work for me either,
> > > >> even with the preferredLeader property as recommended in the
> > > >> documentation.
> > > >> I handled it with a little hack, but it certainly didn't work as expected.
> > > >> I can provide more details if there's a ticket.
> > > >>
> > > >> On Thu, Nov 29, 2018 at 8:42 PM Aman Tandon wrote:
> > > >>
> > > >>> ++ correction
> > > >>>
> > > >>> On Fri, Nov 30, 2018, 01:10 Aman Tandon wrote:
> > > >>>
> > > >>>> For me today, I deleted the leader replica of one shard of a two-shard
> > > >>>> collection. Then the other replicas of that shard weren't getting elected
> > > >>>> as leader.
> > > >>>>
> > > >>>> After waiting a long time, I tried setting the preferredLeader property with
> > > >>>> ADDREPLICAPROP on one of the replicas, then tried FORCELEADER, but no luck.
> > > >>>> Then I also tried rebalancing, but it didn't help. Finally I had to recreate
> > > >>>> the whole collection.
> > > >>>>
> > > >>>> Not sure what the issue was, but neither FORCELEADER nor rebalancing
> > > >>>> worked when there was no leader, even though the preferredLeader property
> > > >>>> was set.
> > > >>>>
> > > >>>> On Wed, Nov 28, 2018, 12:54 Bernd Fehling <bernd.fehling@uni-bielefeld.de> wrote:
> > > >>>>
> > > >>>>> Hi Vadim,
> > > >>>>>
> > > >>>>> thanks for confirming.
> > > >>>>> So it seems to be a general problem with Solr 6.x and 7.x, and it might
> > > >>>>> still be there in the most recent versions.
> > > >>>>>
> > > >>>>> But where to start debugging this problem? Is something not
> > > >>>>> correctly stored in ZooKeeper, or is the overseer the problem?
> > > >>>>>
> > > >>>>> I was also reading something about a "leader queue" where possible
> > > >>>>> leaders have to be requeued or something similar.
> > > >>>>>
> > > >>>>> Maybe I should try to get a situation where a "locked" core
> > > >>>>> is on the overseer and then connect the debugger to it and step
> > > >>>>> through it.
> > > >>>>> Peeking and poking around, like in the old Commodore 64 days :-)
> > > >>>>>
> > > >>>>> Regards, Bernd
> > > >>>>>
> > > >>>>>
> > > >>>>> On 27.11.18 at 15:47, Vadim Ivanov wrote:
> > > >>>>>> Hi, Bernd
> > > >>>>>> I have tried REBALANCELEADERS with Solr 6.3 and 7.5.
> > > >>>>>> I had very similar results and the same impression that it's not reliable :(
> > > >>>>>> --
> > > >>>>>> Br, Vadim
> > > >>>>>>
> > > >>>>>>> -----Original Message-----
> > > >>>>>>> From: Bernd Fehling [mailto:bernd.fehling@uni-bielefeld.de]
> > > >>>>>>> Sent: Tuesday, November 27, 2018 5:13 PM
> > > >>>>>>> To: solr-user@lucene.apache.org
> > > >>>>>>> Subject: REBALANCELEADERS is not reliable
> > > >>>>>>>
> > > >>>>>>> Hi list,
> > > >>>>>>>
> > > >>>>>>> unfortunately REBALANCELEADERS is not reliable and the leader
> > > >>>>>>> election has unpredictable results with SolrCloud 6.6.5 and
> > > >>>>>>> ZooKeeper 3.4.10.
> > > >>>>>>> Seen with 5 shards / 3 replicas.
> > > >>>>>>>
> > > >>>>>>> - CLUSTERSTATUS reports all replicas (core_nodes) as state=active.
> > > >>>>>>> - setting the property preferredLeader on other replicas with ADDREPLICAPROP
> > > >>>>>>> - calling REBALANCELEADERS
> > > >>>>>>> - some leaders have changed, some not.
> > > >>>>>>>
> > > >>>>>>> I then tried:
> > > >>>>>>> - removing all preferredLeader properties from replicas which succeeded.
> > > >>>>>>> - trying REBALANCELEADERS again for the rest. No success.
> > > >>>>>>> - shutting down nodes to force the leader to a specific replica left running.
> > > >>>>>>>   No success.
> > > >>>>>>> - calling REBALANCELEADERS responds that the replica is inactive!!!
> > > >>>>>>> - calling CLUSTERSTATUS reports that the replica is active!!!
> > > >>>>>>>
> > > >>>>>>> Also, the replica which doesn't want to become leader is not in the list
> > > >>>>>>> of collections->[collection_name]->leader_elect->shard1..x->election
> > > >>>>>>>
> > > >>>>>>> Where is CLUSTERSTATUS getting its state info from?
> > > >>>>>>>
> > > >>>>>>> Does anyone else have problems with REBALANCELEADERS?
> > > >>>>>>>
> > > >>>>>>> I noticed that the Reference Guide writes "preferredLeader" (with capital "L")
> > > >>>>>>> but the JAVA code has "preferredleader".
> > > >>>>>>>
> > > >>>>>>> Regards, Bernd
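
For anyone retracing the steps in the original post, here is a rough sketch of the three Collections API calls involved (ADDREPLICAPROP, REBALANCELEADERS, CLUSTERSTATUS) plus a small check for replicas that are flagged as preferred leader but did not become leader. It is only an illustration: the base URL, the helper names, and the collection/shard/replica names are made up, and the CLUSTERSTATUS JSON layout assumed here (a per-replica "leader" entry and a lowercase "property.preferredleader" entry) should be verified against the output of your own Solr version.

```python
from urllib.parse import urlencode

# Assumption: a Solr node reachable at this address; adjust for your cluster.
SOLR_BASE = "http://localhost:8983/solr"


def collections_url(action, **params):
    """Build a Collections API URL for the given action and parameters."""
    query = {"action": action, "wt": "json"}
    query.update(params)
    return f"{SOLR_BASE}/admin/collections?{urlencode(query)}"


def set_preferred_leader_url(collection, shard, replica):
    """URL that marks one replica as preferredLeader via ADDREPLICAPROP."""
    return collections_url(
        "ADDREPLICAPROP",
        collection=collection,
        shard=shard,
        replica=replica,
        property="preferredLeader",
        **{"property.value": "true"},
    )


def rebalance_leaders_url(collection):
    """URL that asks Solr to move leadership to the preferredLeader replicas."""
    return collections_url("REBALANCELEADERS", collection=collection)


def cluster_status_url(collection):
    """URL that fetches the cluster state for one collection."""
    return collections_url("CLUSTERSTATUS", collection=collection)


def unmet_preferences(status, collection):
    """List (shard, replica, state) for preferred leaders that are not leader.

    `status` is the parsed JSON of a CLUSTERSTATUS response. The key names
    are an assumption; they are compared case-insensitively because the
    thread notes a "preferredLeader" vs "preferredleader" spelling difference.
    """
    shards = status["cluster"]["collections"][collection]["shards"]
    pending = []
    for shard_name, shard in shards.items():
        for replica_name, replica in shard["replicas"].items():
            props = {k.lower(): str(v).lower() for k, v in replica.items()}
            preferred = props.get("property.preferredleader") == "true"
            is_leader = props.get("leader") == "true"
            if preferred and not is_leader:
                pending.append((shard_name, replica_name, replica.get("state")))
    return pending
```

A possible workflow: request set_preferred_leader_url(...) for each replica that should lead, request rebalance_leaders_url(...), then fetch cluster_status_url(...) and pass the parsed JSON to unmet_preferences(...). Any entry it returns with state "active" corresponds to the situation described above, where a replica is reported as active but still does not take over leadership.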