lucene-solr-user mailing list archives

From Rishi Easwaran <rishi.easwa...@aol.com>
Subject Re: Multiple index.timestamp directories using up disk space
Date Mon, 04 May 2015 14:21:24 GMT
Walter,

Unless I am missing something here.. I completely get that when a few segments merge, Solr
requires 2x the space of those segments to accomplish it.
Usually any index has multiple segment files, so this fragmented 2x space consumption is not
an issue, even as merged segments grow bigger.

But what I am talking about is a copy of the whole index, as is, into a new directory. The new
directory has no relation to the older index directory or its segments, so I am not sure what merges
are going on across directories/indexes, or why Solr needs the older index.
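(A quick way to check which copy a core actually considers current is the index.properties file that replication/recovery writes into the core's data directory; it names the active index.<timestamp> directory, and if it is absent the plain index/ directory is in use. A minimal sketch, with hypothetical paths, assuming a default core layout:)

```python
from pathlib import Path

def report_index_dirs(data_dir: Path) -> list[str]:
    """Report the index* directories under a core's data dir, plus which
    one index.properties marks as active (if any)."""
    lines = []
    props = data_dir / "index.properties"
    if props.exists():
        lines.append(props.read_text().strip())
    else:
        lines.append("no index.properties (plain index/ in use)")
    # Stale index.<timestamp> copies left behind by failed recoveries
    # show up here, along with their on-disk sizes.
    for d in sorted(data_dir.glob("index*")):
        size = sum(f.stat().st_size for f in d.rglob("*") if f.is_file())
        lines.append(f"{d.name}  {size / 2**30:.2f} GiB")
    return lines

# "/var/solr/data/mycore/data" is a hypothetical path; adjust for your install.
for line in report_index_dirs(Path("/var/solr/data/mycore/data")):
    print(line)
```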

Thanks,
Rishi.

-----Original Message-----
From: Walter Underwood <wunder@wunderwood.org>
To: solr-user <solr-user@lucene.apache.org>
Sent: Mon, May 4, 2015 9:50 am
Subject: Re: Multiple index.timestamp directories using up disk space


One segment is in-use, being searched. That segment (and others) are merged into
a new segment. After the new segment is ready, searches are directed to the new
copy and the old copies are deleted.

That is why two copies are needed.

If you cannot provide 2X the disk space, you will not have a stable Solr
installation. You should consider a different search engine.

“Optimizing” (forced merges) will not help. It will probably cause failures
more often, because it always merges the largest segment.
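(As a back-of-envelope model of the arithmetic above, nothing Solr-specific: the input segments stay on disk until the merged segment is complete, so peak usage is the whole index plus the merged output, and a forced merge into one segment briefly needs 2x the full index:)

```python
def peak_disk_during_merge(segment_sizes, merge_indices):
    """Peak bytes on disk while merging the given segments.

    The inputs remain on disk until the merged segment is written, so the
    peak is the whole index plus the merged output (inputs counted twice).
    """
    total = sum(segment_sizes)
    merged = sum(segment_sizes[i] for i in merge_indices)
    return total + merged  # old copies and new merged segment coexist

sizes = [10] * 10  # ten 10 GB segments, 100 GB index

# Merging two small segments needs modest headroom...
print(peak_disk_during_merge(sizes, [0, 1]))      # 120 (GB peak)
# ...but a forced merge of everything briefly needs 2x the full index.
print(peak_disk_during_merge(sizes, range(10)))   # 200 (GB peak)
```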

Walter Underwood
wunder@wunderwood.org
http://observer.wunderwood.org/  (my blog)


On May 4, 2015, at 3:53 AM, Rishi Easwaran <rishi.easwaran@aol.com> wrote:

> Thanks for the responses Mark and Ramkumar.
> 
> The question I had was: why does Solr need 2 copies at any given time,
> leading to 2x disk space usage?
> This information does not seem to be published anywhere, which makes HW
> estimation almost impossible for large scale deployment. Even if the copies
> are temporary, this becomes really expensive, especially when using SSD in
> production, when the complex size is over 400TB of indexes, running 1000's
> of solr cloud shards.
> 
> If a solr follower has decided that it needs to replicate from the leader
> and capture a full copy snapshot, why can't it delete the old information
> and replicate from scratch, not requiring more disk space?
> Is the concern data loss (a case when both leader and follower lose data)?
> 
> Thanks,
> Rishi.
> 
> -----Original Message-----
> From: Mark Miller <markrmiller@gmail.com>
> To: solr-user <solr-user@lucene.apache.org>
> Sent: Tue, Apr 28, 2015 10:52 am
> Subject: Re: Multiple index.timestamp directories using up disk space
> 
> If copies of the index are not eventually cleaned up, I'd file a JIRA to
> address the issue. Those directories should be removed over time. At times
> there will have to be a couple around at the same time, and others may
> take a while to clean up.
> 
> - Mark
> 
> On Tue, Apr 28, 2015 at 3:27 AM Ramkumar R. Aiyengar <andyetitmoves@gmail.com> wrote:
> 
>> SolrCloud does need up to twice the amount of disk space as your usual
>> index size during replication. Amongst other things, this ensures you
>> have a full copy of the index at any point. There's no way around this;
>> I would suggest you provision the additional disk space needed.
>> On 20 Apr 2015 23:21, "Rishi Easwaran" <rishi.easwaran@aol.com> wrote:
>> 
>>> Hi All,
>>> 
>>> We are seeing this problem with solr 4.6 and solr 4.10.3.
>>> For some reason, solr cloud tries to recover and creates a new index
>>> directory (ex: index.20150420181214550), while keeping the older index
>>> as is. This creates an issue where the disk space fills up and the
>>> shard never ends up recovering.
>>> Usually this requires manual intervention of bouncing the instance and
>>> wiping the disk clean to allow for a clean recovery.
>>> 
>>> Any ideas on how to prevent solr from creating multiple copies of the
>>> index directory?
>>> 
>>> Thanks,
>>> Rishi.
>> 
> 


 
