lucene-solr-user mailing list archives

From "Jack Krupansky" <j...@basetechnology.com>
Subject Re: Approximately needed RAM for 5000 query/second at a Solr machine?
Date Thu, 11 Apr 2013 15:11:59 GMT
Segments are on a per-field basis... so doesn't it depend on how many fields 
are merged in parallel? I mean, when most people say "index size" they are 
referring to all fields collectively, not individual fields. I'm just 
wondering how the number of processor cores might affect things (more cores 
might make the worst-case scenario worse, since they maximize the amount 
of data processed at a given moment).

But, I suppose in the final analysis, it may all average out. It may not be 
exactly the worst case, but maybe close enough.

And all of this depends on which merge policy you choose. With the default 
tiered merge policy, things shouldn't be as bad as the 3x worst case.

-- Jack Krupansky

-----Original Message----- 
From: Walter Underwood
Sent: Thursday, April 11, 2013 10:40 AM
To: solr-user@lucene.apache.org
Subject: Re: Approximately needed RAM for 5000 query/second at a Solr 
machine?

Here is the situation where merging can require 3X space. It can only happen 
if you force merge, then index with merging turned off, but we had Ultraseek 
customers do that.

* All documents are merged into a single segment.
* Without a merge, all documents are replaced.
* This results in one segment of deleted documents and one of new documents 
(2X).
* A merge takes place, creating a new segment of the same size, thus 3X.

For normal operation, 2X is plenty of room.
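
The three steps above work out like this (a toy sketch in Python; my own 
illustration of the arithmetic, not anything from Lucene itself):

```python
def forced_merge_peak_gb(index_gb):
    """Worst case described above: force-merge to one segment, replace
    every document with merging turned off, then let a merge run."""
    optimized = index_gb                   # one segment holding all documents
    replacements = index_gb                # new docs; old segment is now all deletes
    before_merge = optimized + replacements  # 2X on disk
    merged = index_gb                      # the merge writes a fresh full-size segment
    return before_merge + merged           # old segments freed only afterwards: 3X

print(forced_merge_peak_gb(5))  # 15: a 5 GB index can briefly need 15 GB of disk
```

The point being that the old copies are only reclaimed once the merge 
completes, so all three full-size copies coexist for a moment.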

wunder

On Apr 11, 2013, at 6:46 AM, Michael Ryan wrote:

> I've investigated this in the past. The worst case is 2*indexSize 
> additional disk space (3*indexSize total) during an optimize.
>
> In our system, we use LogByteSizeMergePolicy, and used to have a 
> mergeFactor of 10. We would see the worst case happen when there were 
> exactly 20 segments (or some other multiple of 10, I believe) at the start 
> of the optimize. IIRC, it would merge those 20 segments down to 2 
> segments, and then merge those 2 segments down to 1 segment. 1*indexSize 
> space was used by the original index (because there is still a reader open 
> on it), 1*indexSize space was used by the 2 segments, and 1*indexSize space was 
> used by the 1 segment. This is the worst case because there are two full 
> additional copies of the index on disk. Normally, when the number of 
> segments is not a multiple of the mergeFactor, there will be some part of 
> the index that was not part of both merges (and this part that is excluded 
> usually would be the largest segments).
>
> We worked around this by doing multiple optimize passes, where the first 
> pass merges down to between 2 and 2*mergeFactor-1 segments (based on a 
> great tip from Lance Norskog on the mailing list a couple years ago).
>
> I'm not sure if the current merge policy implementations still have this 
> issue.
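>
> The two-merge worst case and the multi-pass workaround can be sketched as a 
> toy model (Python; my own simplification, assuming every merge stage rewrites 
> the whole index and an open reader pins every copy made since it last 
> reopened):

```python
def peak_disk(merge_stages, reopen_between=False, index=1.0):
    """Peak disk use during an optimize, as a multiple of index size."""
    pinned = [index]          # the original index, referenced by the open reader
    peak = index
    for _ in range(merge_stages):
        pinned.append(index)  # this stage writes a full new copy of the index
        peak = max(peak, sum(pinned))
        if reopen_between:    # reader reopens between passes: old copies reclaimed
            pinned = [pinned[-1]]
        # otherwise nothing is freed until the whole optimize completes
    return peak

# One optimize doing two internal merge stages (20 segments -> 2 -> 1):
print(peak_disk(2))                        # 3.0, i.e. 3x the index size
# Two separate optimize passes (the tip above), reader reopens in between:
print(peak_disk(2, reopen_between=True))   # 2.0, i.e. 2x
```

The gain from the multi-pass trick, in this model, is that the intermediate 
copy gets released before the final merge runs.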
>
> -Michael
>
> -----Original Message-----
> From: Furkan KAMACI [mailto:furkankamaci@gmail.com]
> Sent: Thursday, April 11, 2013 2:44 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Approximately needed RAM for 5000 query/second at a Solr 
> machine?
>
> Hi Walter;
>
> Is there any document or other reference that says the worst case is three 
> times the disk space? Twice or three times makes a real difference when we 
> are talking about GBs of disk space.
>
>
> 2013/4/10 Walter Underwood <wunder@wunderwood.org>
>
>> Correct, except the worst case maximum for disk space is three times.
>> --wunder
>>
>> On Apr 10, 2013, at 6:04 AM, Erick Erickson wrote:
>>
>>> You're mixing up disk and RAM requirements when you talk about
>>> having twice the disk size. Solr does _NOT_ require twice the index
>>> size of RAM to optimize, it requires twice the size on _DISK_.
>>>
>>> In terms of RAM requirements, you need to create an index, run
>>> realistic queries at the installation and measure.
>>>
>>> Best
>>> Erick
>>>
>>> On Tue, Apr 9, 2013 at 10:32 PM, bigjust <bigjust@lambdaphil.es> wrote:
>>>>
>>>>
>>>>
>>>>>> On 4/9/2013 7:03 PM, Furkan KAMACI wrote:
>>>>>>> These are really good metrics for me:
>>>>>>> You say that RAM size should be at least index size, and it is
>>>>>>> better to have a RAM size twice the index size (because of worst
>>>>>>> case scenario).
>>>>>>> On the other hand let's assume that I have a RAM size that is
>>>>>>> bigger than twice of indexes at machine. Can Solr use that extra
>>>>>>> RAM or is it a approximately maximum limit (to have twice size
>>>>>>> of indexes at machine)?
>>>>>> What we have been discussing is the OS cache, which is memory
>>>>>> that is not used by programs.  The OS uses that memory to make
>>>>>> everything run faster.  The OS will instantly give that memory up
>>>>>> if a program requests it.
>>>>>> Solr is a Java program, and Java uses memory a little
>>>>>> differently, so Solr most likely will NOT use more memory when it
>>>>>> is available.
>>>>>> In a "normal" directly executable program, memory can be
>>>>>> allocated at any time, and given back to the system at any time.
>>>>>> With Java, you tell it the maximum amount of memory the program
>>>>>> is ever allowed to use.  Because of how memory is used inside
>>>>>> Java, most long-running Java programs (like Solr) will allocate
>>>>>> up to the configured maximum even if they don't really need that
>>>>>> much memory.
>>>>>> Most Java virtual machines will never give the memory back to the
>>>>>> system even if it is not required.
>>>>>> Thanks, Shawn
>>>>>>
>>>>>>
>>>> Furkan KAMACI <furkankamaci@gmail.com> writes:
>>>>
>>>>> I am sorry but you said:
>>>>>
>>>>> *you need enough free RAM for the OS to cache the maximum amount
>>>>> of disk space all your indexes will ever use*
>>>>>
>>>>> I have made an assumption about the indexes on my machine. Let's assume
>>>>> they total 5 GB, so it is better to have at least 5 GB of RAM? OK,
>>>>> Solr will use RAM up to however much I give the Java process. When we
>>>>> think about the indexes on storage and the OS caching them in RAM, are
>>>>> you talking about having more than 5 GB, or 10 GB, of RAM on my
>>>>> machine?
>>>>>
>>>>> 2013/4/10 Shawn Heisey <solr@elyograg.org>
>>>>>
>>>>
>>>> 10 GB.  Because when Solr shuffles the data around, it could use up
>>>> to twice the size of the index in order to optimize the index on disk.
>>>>
>>>> -- Justin
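>>>>
>>>> Putting the thread's numbers together: the JVM heap, plus enough free 
>>>> RAM for the OS to cache the index at its worst-case on-disk size. A 
>>>> rough sizing sketch (my own arithmetic, not an official formula):

```python
def ram_needed_gb(index_gb, jvm_heap_gb, worst_case_disk_factor=2):
    """Rule of thumb from this thread: free RAM to cache the index even
    at its worst-case on-disk size (2x during an optimize), plus the heap."""
    os_cache = index_gb * worst_case_disk_factor
    return jvm_heap_gb + os_cache

print(ram_needed_gb(5, 4))  # 14: 4 GB heap + 10 GB to cache a 5 GB index at 2x
```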
>>
>> --
>> Walter Underwood
>> wunder@wunderwood.org
>>
>>
>>
>>

--
Walter Underwood
wunder@wunderwood.org



