Date: Sat, 9 Jan 2016 15:44:50 -0800
Subject: Re: Querying only replica's
From: Erick Erickson
To: solr-user

bq: is it best/good to get the CLUSTERSTATUS via the collection API and explicitly send queries to a replica to ensure I don't send queries to the leaders of my collection

In a word, _no_. SolrCloud is vastly different from the old master/slave setup. In SolrCloud, every node (leader and replicas alike) indexes all the docs for its shard and serves queries. The additional burden on the leader is actually very small. There's absolutely no reason to _not_ use the leader to serve queries.

As for sending updates, there would be a _little_ benefit to sending the updates directly to the leader, but _far_ more benefit in using SolrJ. If you use SolrJ (and CloudSolrClient), then the documents are split up on the _client_ and only the docs for a particular shard are automatically sent to the leader for that shard.
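
To make the client-side splitting concrete, here is a rough sketch of the idea. It is not SolrJ code and not Solr's real router: the class name and the helpers shardFor and groupByShard are hypothetical, and the hashCode()-modulo mapping is an assumption chosen for brevity (Solr's own routing works on hash ranges, not a simple modulo). It only shows how a batch can be grouped per shard before anything goes over the wire.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Conceptual sketch only: groups document ids by shard on the client,
// the way a shard-aware client batches updates per shard leader.
// The hash used here is a stand-in, NOT Solr's actual routing function.
class ShardRoutingSketch {

    // Hypothetical helper: map a document id to a shard index in [0, numShards).
    static int shardFor(String docId, int numShards) {
        // floorMod keeps the result non-negative even when hashCode() is negative.
        return Math.floorMod(docId.hashCode(), numShards);
    }

    // Hypothetical helper: split a batch of ids into one sub-batch per shard.
    static Map<Integer, List<String>> groupByShard(List<String> docIds, int numShards) {
        Map<Integer, List<String>> batches = new TreeMap<>();
        for (String id : docIds) {
            batches.computeIfAbsent(shardFor(id, numShards), k -> new ArrayList<>()).add(id);
        }
        return batches;
    }

    public static void main(String[] args) {
        List<String> ids = List.of("doc-1", "doc-2", "doc-3", "doc-4", "doc-5", "doc-6");
        // Each sub-batch would be sent straight to that shard's leader,
        // so no node has to forward documents to another shard.
        groupByShard(ids, 3).forEach((shard, batch) ->
            System.out.println("shard " + shard + " <- " + batch));
    }
}
```

Because each sub-batch goes directly to its own shard, adding shards adds independent indexing paths, which is why this approach can scale with the shard count.
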
Using SolrJ you can essentially scale indexing linearly with the number of shards you have. Just using HTTP does not scale linearly. Your particular app may not care, but in high-throughput situations this can be significant.

So rather than spend time and effort sending updates directly to a leader over HTTP, only to have that leader forward the docs to the correct shard, I recommend investing the time in using SolrJ for updates. Or just ignore the problem and devote your efforts to something that is more valuable.

So in short:

1> Just stick a load balancer in front of _all_ your Solr nodes for queries. Note that there's already an internal load balancer in Solr that routes things around anyway, although putting a load balancer in front of your entire cluster means there's no single point of failure.

2> Depending on your throughput needs, either
2a> use SolrJ to index, or
2b> don't worry about it and send updates through the load balancer as well. There'll be an extra hop if you send updates to a replica, but if that's significant you should be using SolrJ.

As for 5.5, it's not at all clear that there _will_ be a 5.5. 5.4 was just released in early December. There's usually a lag of several months between point releases, and there's some agitation to start the 6.0 release process, so it's up in the air.

On Sat, Jan 9, 2016 at 12:04 PM, Robert Brown wrote:
> Hi,
>
> (btw, when is 5.5 due? I see the docs reference it, but not the download
> page)
>
> Anyway, I index and query Solr over HTTP (no SolrJ, etc.) - is it best/good
> to get the CLUSTERSTATUS via the collection API and explicitly send queries
> to a replica to ensure I don't send queries to the leaders of my collection,
> to improve performance? Likewise with sending updates directly to a
> leader?
>
> My leaders will receive full updates of the entire collection once a day, so
> I would assume if the leader is handling queries too, performance would be
> hit?
>
> Is the CLUSTERSTATUS API the only way to do this btw without SolrJ, etc.? I
> wasn't sure if ZooKeeper would be able to tell me also.
>
> Do I also need to do anything to ensure the leaders are never sent queries
> from the replicas?
>
> Does this all sound sane?
>
> One of my collections is 3 shards, with 2 replicas each (9 total nodes),
> 70m docs in total.
>
> Thanks,
> Rob
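
As a rough illustration of point 1> in the reply above (spread queries over _all_ nodes instead of looking up which ones are replicas), here is a minimal client-side round-robin sketch. It is only a stand-in for a real load balancer and does no health checking or failover. The class name, host names and collection name are made up for the example; the only Solr-specific assumption is the usual /solr/<collection>/select query path.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch: rotate queries across every node, leaders and replicas alike.
// No CLUSTERSTATUS lookup is needed because any node can serve a query.
class RoundRobinQuerySketch {
    private final List<String> nodeBaseUrls;
    private final AtomicInteger counter = new AtomicInteger();

    RoundRobinQuerySketch(List<String> nodeBaseUrls) {
        if (nodeBaseUrls.isEmpty()) {
            throw new IllegalArgumentException("need at least one node");
        }
        this.nodeBaseUrls = List.copyOf(nodeBaseUrls);
    }

    // Pick the next node in rotation; floorMod guards against counter overflow.
    String nextNode() {
        int i = Math.floorMod(counter.getAndIncrement(), nodeBaseUrls.size());
        return nodeBaseUrls.get(i);
    }

    // Build a select URL against whichever node is next in the rotation.
    String queryUrl(String collection, String query) {
        String q = URLEncoder.encode(query, StandardCharsets.UTF_8);
        return nextNode() + "/solr/" + collection + "/select?q=" + q;
    }

    public static void main(String[] args) {
        // Placeholder hosts; a real deployment would list every Solr node.
        RoundRobinQuerySketch lb = new RoundRobinQuerySketch(List.of(
            "http://solr1.example.com:8983",
            "http://solr2.example.com:8983",
            "http://solr3.example.com:8983"));
        for (int n = 0; n < 4; n++) {
            System.out.println(lb.queryUrl("mycollection", "*:*"));
        }
    }
}
```

A dedicated load balancer in front of the cluster does the same rotation and can also drop dead nodes, which this sketch does not attempt.
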