MIME-Version: 1.0
In-Reply-To: <19e280e0-7735-1b6e-527f-9105167efdd6@elyograg.org>
References: <4d148be1-7a0b-f26e-ef31-7491fe908414@elyograg.org> <19e280e0-7735-1b6e-527f-9105167efdd6@elyograg.org>
From: Webster Homer
Date: Mon, 12 Feb 2018 12:44:32 -0600
Message-ID:
Subject: Re: solrcloud Auto-commit doesn't seem reliable
To: solr-user@lucene.apache.org
Content-Type: multipart/alternative; boundary="001a114eb460e207ea0565084400"

--001a114eb460e207ea0565084400
Content-Type: text/plain; charset="UTF-8"

Erick,

I am aware of the CDCR buffering problem that causes tlog retention; we always turn buffering off in our CDCR configurations. My post was prompted by finding uncommitted data in collections more than 24 hours after it was loaded. The collections I was looking at are in our development environment, where we do not use CDCR. However, I'm pretty sure I've seen situations in production where commits were also long overdue.

The "autoSoftcommit" was a typo. The soft commit logic seems to be fine; I don't see an issue with data visibility. But if 3 seconds is aggressive, what would be a good value for soft commit? We have a couple of collections that are updated every minute, although most of them are updated much less frequently.

My reason for raising this commit issue is that we see problems with the relevancy of SolrCloud searches with the NRT replica type. Sometimes the results flip, and the best hit varies depending on which replica serviced the search. This is hard to explain to management. Doing an optimize does address the problem for a while. I try to avoid optimizing for the reasons you and Sean list. But if a commit doesn't happen, how would there ever be an index merge that removes the deleted documents?

The problems with deletes and relevancy don't seem to occur when we use TLOG replicas, probably because they don't do their own indexing but get copies from their leader.
We are testing them now; eventually we may abandon the use of NRT replicas for most of our collections.

I am quite concerned about this commit issue. What kinds of things would influence whether a commit occurs? One commonality for our systems is that they are hosted in a Google cloud. We have a number of collections that share configurations, but others that do not. I think commits do happen, but I don't trust that autoCommit is reliable. What can we do to make it reliable?

Most of our collections are reindexed weekly, with partial updates applied daily. That, at least, is what happens in production; our development clouds are not as regular.

Our Solr startup script sets the following values:

-Dsolr.autoCommit.maxDocs=35000
-Dsolr.autoCommit.maxTime=60000
-Dsolr.autoSoftCommit.maxTime=3000

I don't think we reference solr.autoCommit.maxDocs in our solrconfig.xml files.

Here are our settings for autoCommit and autoSoftCommit. We had a lot of issues with missing commits when we didn't set solr.autoCommit.maxTime.

<autoCommit>
  <maxTime>${solr.autoCommit.maxTime:60000}</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>

<autoSoftCommit>
  <maxTime>${solr.autoSoftCommit.maxTime:5000}</maxTime>
</autoSoftCommit>

On Fri, Feb 9, 2018 at 3:49 PM, Shawn Heisey wrote:

> On 2/9/2018 9:29 AM, Webster Homer wrote:
>
>> A little more background. Our production Solrclouds are populated via
>> CDCR. CDCR does not replicate commits; commits to the target clouds
>> happen via autoCommit settings.
>>
>> We see relevancy scores get inconsistent when there are too many deletes,
>> which seems to happen when hard commits don't happen.
>>
>> On Fri, Feb 9, 2018 at 10:25 AM, Webster Homer
>> wrote:
>>
>>> We do have autoSoftcommit set to 3 seconds. It is NOT the visibility of
>>> the records that is my primary concern. What I am concerned about is the
>>> accumulation of uncommitted tlog files and the larger number of deleted
>>> documents.
>>>
>>
> For the deleted documents: Have you ever done an optimize on the
> collection?
> If so, you're going to need to re-do the optimize regularly to
> keep deleted documents from growing out of control. See this issue for a
> very technical discussion about it:
>
> https://issues.apache.org/jira/browse/LUCENE-7976
>
> Deleted documents probably aren't really related to what we've been
> discussing. That shouldn't really be strongly affected by commit settings.
>
> -----
>
> A 3 second autoSoftCommit is VERY aggressive. If your soft commits are
> taking longer than 3 seconds to complete, which is often what happens, then
> that will lead to problems. I wouldn't expect it to cause the kinds of
> problems you describe, though. It would manifest as Solr working too hard,
> logging warnings or errors, and changes taking too long to show up.
>
> Assuming that the config for autoSoftCommit doesn't have the typo that
> Erick mentioned.
>
> ----
>
> I have never used CDCR, so I know very little about it. But I have seen
> reports on this mailing list saying that transaction logs never get deleted
> when CDCR is configured.
>
> Below is a link to a mailing list discussion related to CDCR not deleting
> transaction logs. Looks like for it to work right a buffer needs to be
> disabled, and there may also be problems caused by not having a complete
> zkHost string in the CDCR config:
>
> http://lucene.472066.n3.nabble.com/CDCR-how-to-deal-with-the-transaction-log-files-td4345062.html
>
> Erick also mentioned this.
>
> Thanks,
> Shawn
>

-- 
This message and any attachment are confidential and may be privileged or otherwise protected from disclosure. If you are not the intended recipient, you must not copy this message or attachment or disclose the contents to any other person. If you have received this transmission in error, please notify the sender immediately and delete the message and any attachment from your system.
Merck KGaA, Darmstadt, Germany and any of its subsidiaries do not accept liability for any omissions or errors in this message which may arise as a result of E-Mail-transmission or for damages resulting from any unauthorized changes of the content of this message and any attachment thereto. Merck KGaA, Darmstadt, Germany and any of its subsidiaries do not guarantee that this message is free of viruses and does not accept liability for any damages caused by any virus transmitted therewith. Click http://www.emdgroup.com/disclaimer to access the German, French, Spanish and Portuguese versions of this disclaimer.

--001a114eb460e207ea0565084400--
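
As a rough illustration of how the ${property:default} placeholders in the solrconfig.xml settings from the message resolve against the -D values set by the startup script, here is a minimal Python sketch. It is not Solr's implementation, only a model of the substitution rule (a set property wins, otherwise the default after the colon is used). The function name resolve_placeholders and the startup_props dictionary are assumptions made for this example.

```python
import re

# Matches placeholders such as ${solr.autoCommit.maxTime:60000}.
# Group 1 is the property name, group 2 the optional default.
PLACEHOLDER = re.compile(r"\$\{([^}:]+)(?::([^}]*))?\}")


def resolve_placeholders(text, props):
    """Replace each ${name:default} with props[name], else the default.

    Hypothetical helper for illustration; it only models the
    "property overrides default" rule.
    """
    def substitute(match):
        name, default = match.group(1), match.group(2)
        if name in props:
            return props[name]
        if default is not None:
            return default
        raise KeyError(f"no value or default for {name}")

    return PLACEHOLDER.sub(substitute, text)


# The -D values listed in the message.
startup_props = {
    "solr.autoCommit.maxDocs": "35000",
    "solr.autoCommit.maxTime": "60000",
    "solr.autoSoftCommit.maxTime": "3000",
}

hard = resolve_placeholders("${solr.autoCommit.maxTime:60000}", startup_props)
soft = resolve_placeholders("${solr.autoSoftCommit.maxTime:5000}", startup_props)
print("hard commit maxTime (ms):", hard)
print("soft commit maxTime (ms):", soft)
```

Under these assumptions the soft commit interval comes from the startup script (3000 ms), and the 5000 in the config applies only when the property is not set. Since the message says solr.autoCommit.maxDocs is probably not referenced in solrconfig.xml, that startup value would have no effect in this model.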