lucene-solr-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: How to check optimized or disk free status via solrj for a particular collection?
Date Mon, 12 Dec 2016 21:17:58 GMT
bq: We are indexing with autocommit at 30 minutes

OK, check the size of your tlogs. With autocommit at 30 minutes, all the
updates for that interval accumulate in a single tlog. That tlog will be
closed when autocommit happens and a new one opened for the
next 30 minutes. The first tlog won't be purged until the second one
is closed, so the tlogs can hold up to about an hour's worth of updates
at once. All this is detailed in the link I provided.

If the tlogs are significant in size this may be the entire problem.
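
A minimal sketch for measuring how much disk the tlogs take, assuming the
usual layout where each core's data directory has a tlog subdirectory next
to index. The class name and the default path are hypothetical; pass the
real tlog directory of the replica as the first argument. It uses only the
JDK and has to run on the Solr node itself.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

class TlogSize {

    /** Sums the sizes, in bytes, of all regular files under dir (recursively). */
    static long directorySizeBytes(Path dir) {
        try (Stream<Path> files = Files.walk(dir)) {
            return files.filter(Files::isRegularFile)
                        .mapToLong(TlogSize::sizeOf)
                        .sum();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Files.size wrapped so it can be used inside a stream. */
    private static long sizeOf(Path file) {
        try {
            return Files.size(file);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical default path; replace with the replica's actual tlog directory.
        Path tlogDir = Paths.get(args.length > 0
                ? args[0]
                : "/var/solr/data/mycollection_shard1_replica1/data/tlog");
        long bytes = directorySizeBytes(tlogDir);
        System.out.printf("%s: %d MB%n", tlogDir, bytes / (1024 * 1024));
    }
}
```

Comparing this number against the size of the sibling index directory gives
a quick idea of whether the tlogs are a meaningful share of the disk usage.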

Best,
Erick

On Mon, Dec 12, 2016 at 12:46 PM, Susheel Kumar <susheel2777@gmail.com> wrote:
> One option:
>
> First, you could purge all documents before the full re-index, so that you
> don't need to run optimize, unless you need the data to serve queries at
> the same time.
>
> I think you are running out of space because your 43 million documents may
> be consuming 30% of total disk space, and when you re-index, the total disk
> space usage goes to 60%. If you then run optimize, it may require
> another 60% of disk space, bringing the total to 120%, which causes the
> out-of-disk-space error.
>
> The other option is to increase disk space if you want to run optimize at
> the end.
>
>
> On Mon, Dec 12, 2016 at 3:36 PM, Michael Joyner <michael@newsrx.com> wrote:
>
>> We are having an issue with running out of space when trying to do a full
>> re-index.
>>
>> We are indexing with autocommit at 30 minutes.
>>
>> We have it set to only optimize at the end of an indexing cycle.
>>
>>
>>
>> On 12/12/2016 02:43 PM, Erick Erickson wrote:
>>
>>> First off, optimize is actually rarely necessary. I wouldn't bother
>>> unless you have measurements to prove that it's desirable.
>>>
>>> I would _certainly_ not call optimize every 10M docs. If you must call
>>> it at all call it exactly once when indexing is complete. But see
>>> above.
>>>
>>> As far as the commit, I'd just set the autocommit settings in
>>> solrconfig.xml to something "reasonable" and forget it. I usually use
>>> time rather than doc count as it's a little more predictable. I often
>>> use 60 seconds, but it can be longer. The longer it is, the bigger
>>> your tlog will grow and if Solr shuts down forcefully the longer
>>> replaying may take. Here's the whole writeup on this topic:
>>>
>>> https://lucidworks.com/blog/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/
>>>
>>> Running out of space during indexing with about 30% utilization is
>>> very odd. My guess is that you're trying to take too much control.
>>> Having multiple optimizations going on at once would be a very good
>>> way to run out of disk space.
>>>
>>> And I'm assuming one replica's index per disk or you're reporting
>>> aggregate index size per disk when you say 30%. Having three replicas
>>> on the same disk each consuming 30% is A Bad Thing.
>>>
>>> Best,
>>> Erick
>>>
>>> On Mon, Dec 12, 2016 at 8:36 AM, Michael Joyner <michael@newsrx.com> wrote:
>>>
>>>> Halp!
>>>>
>>>> I need to reindex over 43 million documents. When optimized, the
>>>> collection is currently < 30% of disk space. We tried it over this
>>>> weekend and it ran out of space during the reindexing.
>>>>
>>>> I'm thinking the best solution for what we are trying to do is to call
>>>> commit/optimize every 10,000,000 documents or so and then wait for the
>>>> optimize to complete.
>>>>
>>>> How do I check optimized status via SolrJ for a particular collection?
>>>>
>>>> Also, is there a way to check free space per shard by collection?
>>>>
>>>> -Mike
>>>>
>>>>
>>
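
On the free-space question and the 30% / 60% / 120% estimate quoted above,
here is a rough sketch of the arithmetic. It assumes, as suggested in this
thread, that an optimize may temporarily need about as much extra space as
the index already occupies. It reads the filesystem directly with the JDK on
the Solr node rather than going through SolrJ; the class name, method names,
and default path are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.FileStore;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class DiskHeadroom {

    /** Bytes available to this JVM on the filesystem that holds path. */
    static long usableBytes(Path path) {
        try {
            return Files.getFileStore(path).getUsableSpace();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Approximate percentage of the disk in use (ignores reserved blocks). */
    static int usedPercent(long totalBytes, long usableBytes) {
        return (int) Math.round(100.0 * (totalBytes - usableBytes) / totalBytes);
    }

    /** Peak usage if optimize temporarily doubles what is already used. */
    static int peakPercentDuringOptimize(int usedPercent) {
        return usedPercent * 2;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical default; pass the replica's data directory instead.
        Path dataDir = Paths.get(args.length > 0 ? args[0] : ".");
        FileStore store = Files.getFileStore(dataDir);
        int used = usedPercent(store.getTotalSpace(), store.getUsableSpace());
        int peak = peakPercentDuringOptimize(used);
        System.out.printf("used now: %d%%, estimated peak during optimize: %d%%%n", used, peak);
        if (peak > 100) {
            System.out.println("Not enough headroom to optimize on this disk.");
        }
    }
}
```

With 60% of the disk in use after the re-index, the estimated peak is 120%,
which matches the out-of-space scenario described above.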
