From: Erick Erickson
Date: Mon, 12 Dec 2016 13:17:58 -0800
Subject: Re: How to check optimized or disk free status via solrj for a particular collection?
To: solr-user@lucene.apache.org

bq: We are indexing with autocommit at 30 minutes

OK, check the size of your tlogs. What this means is that all the updates
accumulate for 30 minutes in a single tlog. That tlog will be closed when
autocommit happens and a new one opened for the next 30 minutes. The first
tlog won't be purged until the second one is closed. All this is detailed
in the link I provided. If the tlogs are significant in size, this may be
the entire problem.

Best,
Erick

On Mon, Dec 12, 2016 at 12:46 PM, Susheel Kumar wrote:
> One option:
>
> First, you may purge all documents before the full re-index, so that you
> don't need to run optimize, unless you need the data to serve queries at
> the same time.
>
> I think you are running out of space because your 43 million documents
> may consume 30% of total disk space, and when you re-index, total usage
> goes to 60%. If you then run optimize, it may require another 60% of disk
> space, taking the total to 120%, which causes the out-of-disk condition.
>
> The other option is to increase disk space if you want to run optimize
> at the end.
>
> On Mon, Dec 12, 2016 at 3:36 PM, Michael Joyner wrote:
>
>> We are having an issue with running out of space when trying to do a
>> full re-index.
>>
>> We are indexing with autocommit at 30 minutes.
>>
>> We have it set to only optimize at the end of an indexing cycle.
>>
>> On 12/12/2016 02:43 PM, Erick Erickson wrote:
>>
>>> First off, optimize is actually rarely necessary. I wouldn't bother
>>> unless you have measurements to prove that it's desirable.
>>>
>>> I would _certainly_ not call optimize every 10M docs. If you must call
>>> it at all, call it exactly once when indexing is complete. But see
>>> above.
>>>
>>> As far as the commit, I'd just set the autocommit settings in
>>> solrconfig.xml to something "reasonable" and forget it.
>>> I usually use time rather than doc count, as it's a little more
>>> predictable. I often use 60 seconds, but it can be longer. The longer
>>> it is, the bigger your tlog will grow, and if Solr shuts down
>>> forcefully, the longer replaying may take. Here's the whole writeup on
>>> this topic:
>>>
>>> https://lucidworks.com/blog/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/
>>>
>>> Running out of space during indexing with about 30% utilization is
>>> very odd. My guess is that you're trying to take too much control.
>>> Having multiple optimizations going on at once would be a very good
>>> way to run out of disk space.
>>>
>>> And I'm assuming one replica's index per disk, or you're reporting
>>> aggregate index size per disk when you say 30%. Having three replicas
>>> on the same disk, each consuming 30%, is A Bad Thing.
>>>
>>> Best,
>>> Erick
>>>
>>> On Mon, Dec 12, 2016 at 8:36 AM, Michael Joyner wrote:
>>>
>>>> Halp!
>>>>
>>>> I need to reindex over 43 million documents. When optimized, the
>>>> collection is currently < 30% of disk space; we tried it over this
>>>> weekend and it ran out of space during the reindexing.
>>>>
>>>> I'm thinking the best solution for what we are trying to do is to
>>>> call commit/optimize every 10,000,000 documents or so and then wait
>>>> for the optimize to complete.
>>>>
>>>> How to check optimized status via solrj for a particular collection?
>>>>
>>>> Also, is there a way to check free space per shard by collection?
>>>>
>>>> -Mike
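Erick's suggestion of a time-based hard commit maps to a solrconfig.xml fragment like this (illustrative values, not quoted from the thread: 60000 ms matches the 60-second example he mentions, and openSearcher=false is the usual companion setting so the hard commit truncates the tlog without opening a new searcher):

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <!-- Hard commit every 60 seconds: bounds tlog size and replay time. -->
  <autoCommit>
    <maxTime>60000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>
</updateHandler>
```

With this in place there is no need to send explicit commits from the indexing client; visibility of new documents is then governed separately (e.g. by autoSoftCommit), as the linked Lucidworks article explains.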
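Erick's tlog lifecycle above can be sketched as a toy model. This is a deliberate simplification for illustration, not Solr's actual UpdateLog code, and the docs-per-interval numbers are made up; the point is that a closed tlog is retained until the *next* one closes, so on-disk tlog data peaks at roughly two autocommit intervals' worth of updates:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Simplified model of the tlog lifecycle described in the thread:
// updates accumulate in the current tlog; autocommit closes it and opens
// a new one; a closed tlog is purgeable only once a newer one has closed.
public class TlogModel {
    private long currentTlogDocs = 0;               // docs in the open tlog
    private final Deque<Long> closedTlogs = new ArrayDeque<>();

    public void addDoc() {
        currentTlogDocs++;
    }

    // Autocommit: close the current tlog, open a fresh one, and purge all
    // but the most recently closed tlog.
    public void autoCommit() {
        closedTlogs.addLast(currentTlogDocs);
        currentTlogDocs = 0;
        while (closedTlogs.size() > 1) {
            closedTlogs.removeFirst();
        }
    }

    // Total docs held in tlogs on disk (open tlog + retained closed tlog).
    public long docsOnDisk() {
        long total = currentTlogDocs;
        for (long n : closedTlogs) {
            total += n;
        }
        return total;
    }

    public static void main(String[] args) {
        TlogModel m = new TlogModel();
        // 30 minutes of indexing at a hypothetical ~1000 docs/min:
        for (int i = 0; i < 30_000; i++) m.addDoc();
        m.autoCommit();
        // The closed tlog is retained until the next one closes, so during
        // the second interval nearly two intervals of updates sit on disk.
        for (int i = 0; i < 30_000; i++) m.addDoc();
        System.out.println(m.docsOnDisk()); // 60000
    }
}
```

With a 30-minute autocommit, "one interval" can be a lot of data, which is why Erick suggests checking tlog sizes first; shrinking the interval to 60 seconds shrinks the retained window proportionally.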
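Susheel's disk arithmetic can be checked with a quick back-of-the-envelope calculation. This is a sketch under worst-case assumptions (a re-index keeps the old index alive alongside the new one, and an optimize can transiently need a full copy of everything being merged), not a precise model of Lucene's merge behavior:

```java
// Worst-case peak disk usage, as a fraction of total disk, when a full
// re-index runs alongside the old index and an optimize then rewrites it.
public class DiskHeadroom {
    public static double peakFraction(double optimizedIndexFraction) {
        double duringReindex = 2 * optimizedIndexFraction; // old + new index
        double duringOptimize = 2 * duringReindex;         // plus merge copy
        return duringOptimize;
    }

    public static void main(String[] args) {
        // An optimized index at 30% of disk can transiently need ~120%:
        System.out.println(peakFraction(0.30)); // 1.2
    }
}
```

This is why both suggestions in the thread work: purging the old documents first removes one of the 2x factors, and skipping (or deferring) the optimize removes the other.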