From: Erick Erickson
Date: Wed, 25 Apr 2018 13:37:07 -0700
Subject: Re: Preventing solr cache flush when committing
To: solr-user

Had this typed up yesterday and forgot to send.
"Is there no way to ensure that the top level filter caches are not expunged when some documents are added to the index and have the changes available at the same time?"

No. And it's not something you can do without major architectural changes. When you commit, background merging kicks in, which renumbers the _internal_ Lucene document IDs. These IDs range from 0 to maxDoc and are used as the bits set in each filterCache entry. So if you preserved the filterCache, the bits would be wrong. The queryResultCache is

"If that is the case, then do I need to always have to rely on warmup of caches to get some documents in caches?"

Yes, that's exactly what the "autowarm" feature on the caches is for. The newSearcher event can also be used to hand-craft warmup searches when you know certain things about the index and specifically want to ensure certain warming. Please start with modest autowarm counts, say 20-30; it's very often the case that you don't need much more than that. What those numbers do in the filterCache and queryResultCache is re-execute that many of the associated fq or q clauses, respectively.

"Are there any other approaches then warmup which folks usually do to avoid this; if they want to build a fast searchable product and having some write throughput as well?" and "I can't afford to get my cached flushed".

What evidence do you have for this last statement?

"Currently I do commits via my indexing application (after every batch of documents)"

Please, please, please do _not_ do this. It's especially egregious because you do it after every batch of docs. So rather than flushing your caches every 5 minutes (say), you hammer Solr with commit after commit after commit. Configure your soft commit interval to match your latency requirements and forget about it. Or just configure hard commit with openSearcher set to true. Or perhaps even just specify commitWithin when you send docs to Solr.
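The point about internal document IDs can be illustrated with a toy Python model. It is a sketch, not Solr's implementation: ToySearcher, merge and autowarm are hypothetical names, documents are plain dicts, each fq is a string paired with a Python predicate, and a "merge" is simplified to dropping deleted docs and appending new ones. It shows bits cached by the old searcher pointing at the wrong documents after renumbering, and autowarming repairing that by re-executing the fq against the new searcher.

```python
from typing import Callable, Dict, List, Set

# Hypothetical predicate type: a Python stand-in for an fq clause.
Pred = Callable[[dict], bool]


class ToySearcher:
    """Point-in-time index view; a doc's internal ID is its list position (0..maxDoc-1)."""

    def __init__(self, docs: List[dict]) -> None:
        self.docs = docs
        # Stand-in for the filterCache: fq string -> set of internal doc IDs ("bits").
        self.filter_cache: Dict[str, Set[int]] = {}

    def filter(self, fq: str, pred: Pred) -> Set[int]:
        """Return cached bits for fq, computing them on a cache miss."""
        if fq not in self.filter_cache:
            self.filter_cache[fq] = {i for i, d in enumerate(self.docs) if pred(d)}
        return self.filter_cache[fq]

    def resolve(self, bits: Set[int]) -> List[str]:
        """Map internal IDs back to unique keys as *this* searcher numbers them."""
        return sorted(self.docs[i]["id"] for i in bits if i < len(self.docs))


def merge(docs: List[dict], deleted: Set[str], added: List[dict]) -> List[dict]:
    """Drop deleted docs and append new ones; the survivors shift to new positions."""
    return [d for d in docs if d["id"] not in deleted] + added


def autowarm(old: ToySearcher, new: ToySearcher,
             preds: Dict[str, Pred], autowarm_count: int) -> None:
    """Re-execute up to autowarm_count cached fq clauses against the new searcher.

    Takes entries in insertion order for simplicity; a real LRU cache would
    replay the most recently used ones.
    """
    for fq in list(old.filter_cache)[:autowarm_count]:
        new.filter(fq, preds[fq])


if __name__ == "__main__":
    docs = [{"id": "a", "color": "red"},
            {"id": "b", "color": "blue"},
            {"id": "c", "color": "red"}]
    preds = {"color:red": lambda d: d["color"] == "red"}

    old = ToySearcher(docs)
    red_bits = old.filter("color:red", preds["color:red"])  # {0, 2}

    # Commit + merge: "a" is deleted, "d" is added, "c" moves from ID 2 to ID 1.
    new = ToySearcher(merge(docs, deleted={"a"}, added=[{"id": "d", "color": "blue"}]))
    print("stale bits on new searcher:", new.resolve(red_bits))  # ['b', 'd'], both blue
    autowarm(old, new, preds, autowarm_count=20)
    print("autowarmed bits:", new.resolve(new.filter_cache["color:red"]))  # ['c']
```

The autowarm_count parameter plays the role of the autowarm number discussed above: only that many cached clauses are re-executed, which is why modest values keep warming cheap.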
At a guess, you may have seen warnings about "too many on deck searchers" if your commit interval is shorter than your autowarm time.

I'll bend a little if the client issues a commit only at the very end of the run, there's precisely one client running at a time, and you can _guarantee_ there's only one commit. But it's usually much easier and more reliable to use the Solr config settings.

Perhaps you're not entirely familiar with how openSearcher works, so here's a brief review. This applies to either a hard commit (openSearcher=true) or a soft commit.

1> a commit happens
2> a new searcher is opened and autowarming kicks off
3> incoming searches are served by the _old_ searcher, using all the _old_ caches
4> autowarming completes
5a> incoming requests are routed to the new searcher
5b> the old searcher finishes serving the outstanding requests received before <4> and closes
6> the old caches are flushed

So having high read throughput

On Tue, Apr 24, 2018 at 10:36 AM, Lee Carroll wrote:

> From memory try the following:
> Don't manually commit from client after batch indexing
> set soft commit to be a long time interval. As long as acceptable to run
> stale, say 5 mins or longer if you can.
> set hard commit to be short (seconds ) to keep everything neat and tidy
> regards updates and avoid backing up log files
> set opensearcher=false
>
> I'm pretty sure that works for at least one of our indices. It's worth a go.
>
> Lee C
>
> On 24 April 2018 at 06:56, Papa Pappu wrote:
>
>> Hi,
>> I've written down my query over stack-overflow. Here is the link for that :
>> https://stackoverflow.com/questions/49993681/preventing-solr-cache-flush-when-commiting
>>
>> In short, I am facing troubles maintaining my solr caches when commits
>> happen and the question provides detailed description of the same.
>>
>> Based on my use-case if someone can recommend what settings I should use or
>> practices I should follow it'll be really helpful.
>>
>> Thanks and regards,
>> Dmitri
>>
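The open-searcher sequence in steps 1> through 6> above, and the guess about on-deck searcher warnings, can be made concrete with a toy timeline in Python. This is an illustration under stated assumptions, not Solr's code: Searcher, open_searchers, serving_generation and max_warming are hypothetical names, every new searcher is assumed to take the same fixed autowarm time, the old searcher's shutdown (5b and 6) is not modelled, and nothing caps the number of warming searchers (Solr has a maxWarmingSearchers setting in solrconfig.xml for that).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Searcher:
    generation: int
    opened_at: float  # steps 1-2: the commit that opened it
    ready_at: float   # step 4: autowarming finished


def open_searchers(commit_times: List[float], warm_seconds: float) -> List[Searcher]:
    """Generation 0 is the searcher already serving; each commit opens another."""
    searchers = [Searcher(0, float("-inf"), float("-inf"))]
    for gen, t in enumerate(sorted(commit_times), start=1):
        searchers.append(Searcher(gen, t, t + warm_seconds))
    return searchers


def serving_generation(searchers: List[Searcher], t: float) -> int:
    """Steps 3 and 5a: a request at time t goes to the newest fully warmed searcher."""
    return max(s.generation for s in searchers if s.ready_at <= t)


def max_warming(searchers: List[Searcher]) -> int:
    """Peak number of searchers autowarming at the same moment ("on deck")."""
    events = []
    for s in searchers[1:]:
        events.append((s.opened_at, +1))
        events.append((s.ready_at, -1))
    events.sort()  # at equal times, -1 sorts before +1: a finish is counted before a start
    current = peak = 0
    for _, delta in events:
        current += delta
        peak = max(peak, current)
    return peak


if __name__ == "__main__":
    warm = 10.0  # assumed seconds of autowarming per new searcher
    per_batch = open_searchers([0, 2, 4, 6, 8], warm)  # client commits every 2 s
    interval = open_searchers([0, 60, 120], warm)      # soft commit every 60 s
    print("peak warming, commit per batch:", max_warming(per_batch))  # 5
    print("peak warming, 60 s interval:", max_warming(interval))      # 1
    for t in (5, 10, 13):
        print(f"t={t}s served by generation", serving_generation(per_batch, t))
```

With a commit after every batch, five searchers are warming at once and requests keep landing on older generations (generation 0 at t=5, 1 at t=10, 2 at t=13); with a 60-second interval only one searcher is ever warming.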