Subject: SolrCloud leaders using more disk space
From: Greg Pendlebury <greg.pendlebury@gmail.com>
To: solr-user@lucene.apache.org
Date: Wed, 4 Jun 2014 13:32:33 +1000

Hi all,

We launched our new production instance of SolrCloud last week and since then have noticed a trend in disk usage. The non-leader replicas all seem to be self-optimising their index segments as expected, but the leaders have, on average, around 33% more data on disk. My assumption is that the leaders are not self-optimising (or not to the same extent)... but it is still early days, of course.

If it helps, there are 45 JVMs in the cloud, with 15 shards and 3 replicas per shard. Each non-leader shard sits at between 59GB and 87GB on its SSD, but the leaders are between 84GB and 116GB. We have pretty much constant read and write traffic 24x7, with only 'slow' periods overnight when write traffic drops below 1 document per second and searches are between 1 and 2 per second. Is this light level of traffic still too much for the leaders to self-optimise?
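In case it is useful for comparison, here is a minimal sketch of how per-core index size and segment count can be pulled from each node via the stock core admin STATUS handler (the host names below are placeholders, not our real nodes):

import json
import urllib.request

# Placeholder host names; substitute the real nodes in the cloud.
NODES = ["solr01.example.com:8983", "solr02.example.com:8983"]

for node in NODES:
    # Core admin STATUS reports per-core index details, including
    # sizeInBytes and segmentCount, for every core hosted on the node.
    url = "http://%s/solr/admin/cores?action=STATUS&wt=json" % node
    with urllib.request.urlopen(url) as resp:
        status = json.loads(resp.read().decode("utf-8"))

    for core, details in status["status"].items():
        index = details["index"]
        print("%s %s: %.1f GB in %d segments" % (
            node, core,
            index["sizeInBytes"] / (1024.0 ** 3),
            index["segmentCount"]))

A leader consistently reporting far more segments than its replicas for the same shard would fit the theory that merging is lagging behind on the leaders.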
I'd also be curious to hear what others are doing in terms of operating procedures. Before launch we load tested what would happen if we turned off JVMs and forced recovery events. I know that these things all work; it's just that customers will experience slower search responses whilst they occur. For example, a restore from a leader to a replica under load testing takes around 30 minutes for us, and response times drop from a 200-300ms average to a 1.5s average. The bottleneck appears to be network I/O on the servers. We haven't explored whether this is specific to the servers replicating, or saturation of the infrastructure that all the servers share, because this performance is acceptable for us. Even so, I'm not sure I'd like to force that event to occur unless required... this follows a line of reasoning proposed internally that we should periodically rotate leaders by turning them off briefly. We aren't going to do that unless we have a strong reason though. Does anyone try to manipulate production instances that way?

Vaguely related to this is leader distribution. We have 9 physical servers with 5 JVMs running on each. By virtue of the deployment procedures, the first 3 servers to come online are running 5 leaders each. Is there any merit in 'moving' these around (by reboots)? Our planning up to launch was based on lots of mailing list responses indicating that leaders have no significant performance difference from normal replicas, and all of our testing has agreed with that. The disk size 'issue' (which we aren't worried about... yet; it hasn't been in prod long enough to know for certain) may be the only thing we've seen so far.

Ta,
Greg
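PS: To see where the leaders currently sit without digging through clusterstate.json by hand, a sketch along these lines tallies leaders per node via the Collections API (this assumes a Solr version with the CLUSTERSTATUS action, i.e. 4.8 or later; the host name is a placeholder):

import json
import urllib.request
from collections import Counter

# Any node in the cloud can answer; the host name here is a placeholder.
BASE = "http://solr01.example.com:8983/solr"

url = BASE + "/admin/collections?action=CLUSTERSTATUS&wt=json"
with urllib.request.urlopen(url) as resp:
    cluster = json.loads(resp.read().decode("utf-8"))["cluster"]

leaders_per_node = Counter()
for coll_name, coll in cluster["collections"].items():
    for shard_name, shard in coll["shards"].items():
        for replica in shard["replicas"].values():
            # The replica flagged leader="true" is the current shard leader.
            if replica.get("leader") == "true":
                leaders_per_node[replica["node_name"]] += 1

for node, count in leaders_per_node.most_common():
    print("%s hosts %d leaders" % (node, count))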