From: Erick Erickson
Date: Mon, 12 Dec 2016 13:17:58 -0800
Subject: Re: How to check optimized or disk free status via solrj for a particular collection?
To: solr-user@lucene.apache.org

bq: We are indexing with autocommit at 30 minutes

OK, check the size of your tlogs. What this means is that all the updates
accumulate for 30 minutes in a single tlog. That tlog will be closed when
autocommit happens and a new one opened for the next 30 minutes. The first
tlog won't be purged until the second one is closed. All this is detailed
in the link I provided. If the tlogs are significant in size, this may be
the entire problem.

Best,
Erick

On Mon, Dec 12, 2016 at 12:46 PM, Susheel Kumar wrote:
> One option:
>
> First, you may purge all documents before the full re-index, so that you
> don't need to run optimize, unless you need the data to serve queries at
> the same time.
>
> I think you are running out of space because your 43 million documents
> may consume 30% of total disk space, and when you re-index, total usage
> goes to 60%. If you then run optimize, it may require another 60% of disk
> space, taking the total to 120%, which causes the out-of-disk condition.
>
> The other option is to increase disk space if you want to run optimize
> at the end.
>
> On Mon, Dec 12, 2016 at 3:36 PM, Michael Joyner wrote:
>
>> We are having an issue with running out of space when trying to do a
>> full re-index.
>>
>> We are indexing with autocommit at 30 minutes.
>>
>> We have it set to only optimize at the end of an indexing cycle.
>>
>> On 12/12/2016 02:43 PM, Erick Erickson wrote:
>>
>>> First off, optimize is actually rarely necessary. I wouldn't bother
>>> unless you have measurements to prove that it's desirable.
>>>
>>> I would _certainly_ not call optimize every 10M docs. If you must call
>>> it at all, call it exactly once when indexing is complete. But see
>>> above.
>>>
>>> As far as the commit, I'd just set the autocommit settings in
>>> solrconfig.xml to something "reasonable" and forget it.
>>> I usually use time rather than doc count, as it's a little more
>>> predictable. I often use 60 seconds, but it can be longer. The longer
>>> it is, the bigger your tlog will grow, and if Solr shuts down
>>> forcefully, the longer replaying may take. Here's the whole writeup on
>>> this topic:
>>>
>>> https://lucidworks.com/blog/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/
>>>
>>> Running out of space during indexing with about 30% utilization is
>>> very odd. My guess is that you're trying to take too much control.
>>> Having multiple optimizations going on at once would be a very good
>>> way to run out of disk space.
>>>
>>> And I'm assuming one replica's index per disk, or you're reporting
>>> aggregate index size per disk when you say 30%. Having three replicas
>>> on the same disk, each consuming 30%, is A Bad Thing.
>>>
>>> Best,
>>> Erick
>>>
>>> On Mon, Dec 12, 2016 at 8:36 AM, Michael Joyner wrote:
>>>
>>>> Halp!
>>>>
>>>> I need to reindex over 43 million documents. When optimized, the
>>>> collection is currently < 30% of disk space; we tried it over this
>>>> weekend and it ran out of space during the reindexing.
>>>>
>>>> I'm thinking the best solution for what we are trying to do is to
>>>> call commit/optimize every 10,000,000 documents or so and then wait
>>>> for the optimize to complete.
>>>>
>>>> How to check optimized status via solrj for a particular collection?
>>>>
>>>> Also, is there a way to check free space per shard by collection?
>>>>
>>>> -Mike
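Erick's suggestion of a time-based hard commit maps to a solrconfig.xml fragment like this (illustrative values, not quoted from the thread: 60000 ms matches the 60-second example he mentions, and openSearcher=false is the usual companion setting so the hard commit truncates the tlog without opening a new searcher):

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <!-- Hard commit every 60 seconds: bounds tlog size and replay time. -->
  <autoCommit>
    <maxTime>60000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>
</updateHandler>
```

With this in place there is no need to send explicit commits from the indexing client; visibility of new documents is then governed separately (e.g. by autoSoftCommit), as the linked Lucidworks article explains.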
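Erick's tlog lifecycle above can be sketched as a toy model. This is a deliberate simplification for illustration, not Solr's actual UpdateLog code, and the docs-per-interval numbers are made up; the point is that a closed tlog is retained until the *next* one closes, so on-disk tlog data peaks at roughly two autocommit intervals' worth of updates:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Simplified model of the tlog lifecycle described in the thread:
// updates accumulate in the current tlog; autocommit closes it and opens
// a new one; a closed tlog is purgeable only once a newer one has closed.
public class TlogModel {
    private long currentTlogDocs = 0;               // docs in the open tlog
    private final Deque<Long> closedTlogs = new ArrayDeque<>();

    public void addDoc() {
        currentTlogDocs++;
    }

    // Autocommit: close the current tlog, open a fresh one, and purge all
    // but the most recently closed tlog.
    public void autoCommit() {
        closedTlogs.addLast(currentTlogDocs);
        currentTlogDocs = 0;
        while (closedTlogs.size() > 1) {
            closedTlogs.removeFirst();
        }
    }

    // Total docs held in tlogs on disk (open tlog + retained closed tlog).
    public long docsOnDisk() {
        long total = currentTlogDocs;
        for (long n : closedTlogs) {
            total += n;
        }
        return total;
    }

    public static void main(String[] args) {
        TlogModel m = new TlogModel();
        // 30 minutes of indexing at a hypothetical ~1000 docs/min:
        for (int i = 0; i < 30_000; i++) m.addDoc();
        m.autoCommit();
        // The closed tlog is retained until the next one closes, so during
        // the second interval nearly two intervals of updates sit on disk.
        for (int i = 0; i < 30_000; i++) m.addDoc();
        System.out.println(m.docsOnDisk()); // 60000
    }
}
```

With a 30-minute autocommit, "one interval" can be a lot of data, which is why Erick suggests checking tlog sizes first; shrinking the interval to 60 seconds shrinks the retained window proportionally.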
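Susheel's disk arithmetic can be checked with a quick back-of-the-envelope calculation. This is a sketch under worst-case assumptions (a re-index keeps the old index alive alongside the new one, and an optimize can transiently need a full copy of everything being merged), not a precise model of Lucene's merge behavior:

```java
// Worst-case peak disk usage, as a fraction of total disk, when a full
// re-index runs alongside the old index and an optimize then rewrites it.
public class DiskHeadroom {
    public static double peakFraction(double optimizedIndexFraction) {
        double duringReindex = 2 * optimizedIndexFraction; // old + new index
        double duringOptimize = 2 * duringReindex;         // plus merge copy
        return duringOptimize;
    }

    public static void main(String[] args) {
        // An optimized index at 30% of disk can transiently need ~120%:
        System.out.println(peakFraction(0.30)); // 1.2
    }
}
```

This is why both suggestions in the thread work: purging the old documents first removes one of the 2x factors, and skipping (or deferring) the optimize removes the other.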