lucene-solr-user mailing list archives

From "Ian Connor" <>
Subject Re: estimating memory needed for solr instances...
Date Thu, 10 Jul 2008 01:42:30 GMT
I would guess so too, up to a point. After you run out of RAM, indexing
also takes a hit. I have noticed on a 2 GB machine that when the index gets
over 2 GB, my indexing rate went down from 100/s to 40/s. After
reaching 4 GB it was down to 10/s. I am now trying with an 8 GB machine
to see how far I get through my data before it slows down.

On Wed, Jul 9, 2008 at 7:56 PM, Jacob Singh <> wrote:
> My total guess is that indexing is CPU bound, and searching is RAM bound.
> Best,
> Jacob
> Ian Connor wrote:
>> There was a thread a while ago that suggested you just need to factor in
>> the index's total size (Mike Klaas, I think, was the author). It
>> suggested that having that much RAM is enough, and the OS will cache the
>> files as needed to give you the performance boost.
>> If I misread the thread, please chime in - but it seems having enough
>> RAM is the key to performance.
>> On Wed, Jul 9, 2008 at 3:00 AM, Preetam Rao <> wrote:
>>> Hi,
>>> Since we plan to share a 16 GB RAM multi-core box among multiple Solr
>>> instances, we need to estimate how much memory our application
>>> requires.
>>> The index size on disk is 2.4G, with close to 3 million documents. The plan
>>> is to use dismax queries with some fqs.
>>> Since we do not sort the results, the sort will be by score, which eliminates
>>> the option "useFilterForSortedQuery".
>>> Thus, assuming all q's will use the query result cache and all fqs will use
>>> the filter cache, below is what I am thinking.
>>> I would like to know how to relate the index size on disk to its size in
>>> memory.
>>> Would it be safe to assume, given the disk size of 2.4G, that we need RAM
>>> for the whole index, plus 1G for any other overhead, plus the cache size,
>>> which comes to 150MB (calculation below)? That makes it around 4G.
>>> cache size calculation -
>>> --------------------------------
>>> query result cache - size = 50K;
>>> since we paginate the results, each page has 10 items, and we assume each
>>> user will view at most 3 pages per query,
>>> we will set queryResultWindowSize to 30. Assuming this, for 50K queries we
>>> will use up 50000 * 30 bits = about 187KB, assuming results are stored in a bitset.
>>> We use a few common fqs, let's say 200. Assuming each returns around 30K
>>> documents, that adds up to 200 * 30000 bits = 750KB.
>>> If we use a document cache of size 20K, assuming each document is at most
>>> around 5K, it will take up 20000 * 5K = 100MB.
>>> Thus we can increase the cache sizes drastically and they will still use up
>>> only 150MB or less.
>>> Is this reasoning about the caches correct?
>>> Thanks
>>> Preetam
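
For anyone who wants to re-run the numbers, the quoted estimate can be sketched in a few lines of Python. This only reproduces the arithmetic under the assumptions stated in the message (one bit per cached result, 5K per document, decimal units); the thread does not confirm that Solr stores cache entries this way, and the function and parameter names here are made up for illustration.

```python
# Back-of-the-envelope reproduction of the estimate in the quoted message.
# Every default below is an assumption taken from that message, not a
# measurement of real Solr memory usage. Names are hypothetical.

def estimate_cache_bytes(
    query_cache_entries=50_000,   # query result cache size
    window_size=30,               # queryResultWindowSize
    filter_entries=200,           # number of common fqs
    docs_per_filter=30_000,       # documents matched per fq
    doc_cache_entries=20_000,     # document cache size
    doc_size_bytes=5_000,         # assumed maximum size of one document
):
    """Return the estimated size of each cache in bytes."""
    # The message assumes one bit per cached result, so divide by 8.
    query_result = query_cache_entries * window_size // 8
    filter_cache = filter_entries * docs_per_filter // 8
    document = doc_cache_entries * doc_size_bytes
    return {
        "query_result": query_result,
        "filter": filter_cache,
        "document": document,
        "total": query_result + filter_cache + document,
    }


def estimate_total_bytes(index_bytes=2_400_000_000,
                         overhead_bytes=1_000_000_000,
                         cache_budget_bytes=150_000_000):
    """Index size on disk + fixed overhead + cache budget, as in the message."""
    return index_bytes + overhead_bytes + cache_budget_bytes


if __name__ == "__main__":
    for name, size in estimate_cache_bytes().items():
        print(f"{name:>12}: {size / 1e6:8.3f} MB")
    print(f"{'box total':>12}: {estimate_total_bytes() / 1e9:8.2f} GB")
```

With these inputs the document cache dominates, so the total is far more sensitive to the assumed per-document size than to the two bit-based figures.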
