From: Emir Arnautović
Subject: Re: SolrClould 6.6 stability challenges
Date: Sat, 4 Nov 2017 10:17:37 +0100
To: solr-user@lucene.apache.org
Message-Id: <21229602-2772-4C34-84A6-0F81600FCE56@sematext.com>

Hi Rick,

Do you see any errors in the logs? Do you have a monitoring tool? Maybe
you can check heap and GC metrics around the time the incident happened.
It is not a large heap, but a major GC could cause a pause long enough to
trigger a snowball effect and leave a node in recovery state.

What indexing rate do you observe? Why do you have max warming searchers
set to 5 (is that what you meant by autowarmingsearchers?) when you commit
every 5 min? Why did you increase it: did you see errors with the default
of 2? Maybe you commit after every bulk?

Do you see similar behaviour when you just do indexing without queries?

Thanks,
Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/

> On 4 Nov 2017, at 05:15, Rick Dig wrote:
>
> hello all,
> we are trying to run solrcloud 6.6 in a production setting.
> here's our config and issue
> 1) 3 nodes, 1 shard, replication factor 3
> 2) all nodes are 16GB RAM, 4 core
> 3) Our production load is about 2000 requests per minute
> 4) index is fairly small, index size is around 400 MB with 300k documents
> 5) autocommit is currently set to 5 minutes (even though ideally we would
> like a smaller interval).
> 6) the jvm runs with 8 gb Xms and Xmx with CMS gc.
> 7) all of this runs perfectly ok when indexing isn't happening. as soon as
> we start "nrt" indexing one of the follower nodes goes down within 10 to 20
> minutes. from this point on the nodes never recover unless we stop
> indexing. the master usually is the last one to fall.
> 8) there are maybe 5 to 7 processes indexing at the same time with document
> batch sizes of 500.
> 9) maxRambuffersizeMB is 100, autowarmingsearchers is 5,
> 10) no cpu and / or oom issues that we can see.
> 11) cpu load does go fairly high 15 to 20 at times.
> any help or pointers appreciated
>
> thanks
> rick
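
The question in the reply about a warming-searcher limit of 5 with a 5-minute commit interval can be made concrete with a rough back-of-the-envelope model. The sketch below is not taken from the thread or from Solr itself. It assumes that every commit opens a new searcher, that commits arrive at a fixed interval, and that each searcher takes a fixed time to warm. The function name and the example numbers (30 s warm-up, commit every 5 s) are hypothetical.

```python
import math


def overlapping_warmups(warm_seconds: float, commit_interval_seconds: float) -> int:
    """Estimate the peak number of searchers warming at the same time.

    Simplified model (an assumption, not Solr's actual scheduling):
    a new searcher opens every `commit_interval_seconds` and each one
    needs `warm_seconds` before it is ready. The peak overlap is then
    ceil(warm_seconds / commit_interval_seconds).
    """
    if commit_interval_seconds <= 0:
        raise ValueError("commit interval must be positive")
    if warm_seconds <= 0:
        return 0
    return math.ceil(warm_seconds / commit_interval_seconds)


# Hypothetical numbers: a 30 s warm-up with the 5-minute autocommit
# from the original post, versus a client committing every 5 s.
autocommit_only = overlapping_warmups(30, 300)
commit_per_batch = overlapping_warmups(30, 5)
```

Under these assumed numbers the 5-minute autocommit alone gives an overlap of 1, while a commit every 5 s gives 6, which would exceed a limit of 5. This is why a raised limit hints at commits being issued more often than the autocommit interval, for example by the indexing clients after each batch.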