lucene-solr-user mailing list archives

From "Preetam Rao" <blogathan....@gmail.com>
Subject Re: estimating memory needed for solr instances...
Date Thu, 10 Jul 2008 04:39:55 GMT
Thanks for the responses, Ian, Jacob.

While I could not locate the previous thread, this is what I understand:

We can fine-tune the cache parameters and other settings that we directly
control, but for the index files the key is to provide enough RAM and let
the OS do its best to keep the index files in memory.

----------
Preetam

On Thu, Jul 10, 2008 at 7:12 AM, Ian Connor <ian.connor@gmail.com> wrote:

> I would guess so too, up to a point. After you run out of RAM, indexing
> also takes a hit. I have noticed on a 2GB machine that when the index gets
> over 2GB, my indexing rate went down from 100/s to 40/s. After
> reaching 4GB it was down to 10/s. I am now trying an 8GB machine
> to see how far I get through my data before it slows down.
>
> On Wed, Jul 9, 2008 at 7:56 PM, Jacob Singh <jacobsingh@gmail.com> wrote:
> > My total guess is that indexing is CPU bound, and searching is RAM bound.
> >
> > Best,
> > Jacob
> > Ian Connor wrote:
> >> There was a thread a while ago that suggested you just need to factor in
> >> the index's total size (Mike Klaas, I think, was the author). It
> >> suggested that having enough RAM is sufficient, and the OS will cache the
> >> files as needed to give you the performance boost.
> >>
> >> If I misread the thread, please chime in - but it seems having enough
> >> RAM is the key to performance.
> >>
> >> On Wed, Jul 9, 2008 at 3:00 AM, Preetam Rao <blogathan.rao@gmail.com> wrote:
> >>> Hi,
> >>>
> >>> Since we plan to run multiple Solr instances on the same multi-core box
> >>> with 16GB of RAM, we need to estimate how much memory our application
> >>> needs.
> >>>
> >>> The index size on disk is 2.4G, with close to 3 million documents. The
> >>> plan is to use dismax queries with some fqs.
> >>> Since we do not sort the results, the sort will be by score, which
> >>> eliminates the option "useFilterForSortedQuery".
> >>> Thus, assuming all q's will use the query result cache and all fqs will
> >>> use the filter cache, the below is what I am thinking.
> >>>
> >>> I would like to know how to relate the index size on disk to its size
> >>> in memory.
> >>> Would it be safe to assume, given the disk size of 2.4G, that we need
> >>> RAM for the whole index, plus 1G for any other overhead, plus the cache
> >>> size, which comes to 150MB (calculation below)? That makes it around 4G.
> >>>
> >>> cache size calculation -
> >>> --------------------------------
> >>> query result cache - size = 50K entries;
> >>> since we paginate the results, each page has 10 items, and we assume
> >>> each user will see at most 3 pages per query,
> >>> we will set queryResultWindowSize to 30. Assuming this, for 50K
> >>> queries we will use up 50000 * 30 bits = 187K bytes, assuming results
> >>> are stored in a bitset.
> >>>
> >>> We use a few common fqs, let's say 200. Assuming each returns around 30K
> >>> documents, that adds up to 200 * 30000 bits = 750K bytes.
> >>>
> >>> If we use a document cache of size 20K, assuming each document is at
> >>> most around 5K bytes, it will take up 20000 * 5K = 100MB.
> >>>
> >>> Thus we can increase the cache sizes much more drastically and still
> >>> use up only 150MB or less.
> >>>
> >>> Is this reasoning about the caches correct?
> >>>
> >>> Thanks
> >>> Preetam
> >>>
> >>
> >>
> >>
> >
> >
>
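
The cache arithmetic in the original message can be checked with a short
script. This is only a sketch: it takes the message's own assumption of one
bit per cached document at face value (which may not match how Solr actually
stores these caches), and all function and variable names are illustrative,
not Solr settings.

```python
# Back-of-the-envelope memory estimate using the figures from the thread.
# Assumption (from the original message, not verified against Solr
# internals): each cached result or filter entry costs one bit per document.

def bitset_bytes(entries: int, docs_per_entry: int) -> int:
    """Bytes needed for `entries` bitsets of `docs_per_entry` bits each."""
    total_bits = entries * docs_per_entry
    return (total_bits + 7) // 8  # round up to whole bytes

# Query result cache: 50K entries, window of 30 results each.
query_result_cache = bitset_bytes(50_000, 30)

# Filter cache: 200 common fqs, about 30K matching documents each.
filter_cache = bitset_bytes(200, 30_000)

# Document cache: 20K documents at roughly 5,000 bytes each.
document_cache = 20_000 * 5_000

cache_total = query_result_cache + filter_cache + document_cache

# Overall budget from the message: index on disk + 1G overhead + 150MB caches.
index_on_disk = 2_400_000_000
other_overhead = 1_000_000_000
cache_budget = 150_000_000
estimated_total = index_on_disk + other_overhead + cache_budget

print(f"query result cache: {query_result_cache:,} bytes")
print(f"filter cache:       {filter_cache:,} bytes")
print(f"document cache:     {document_cache:,} bytes")
print(f"cache total:        {cache_total:,} bytes")
print(f"estimated total:    {estimated_total / 1e9:.2f} GB")
```

Under these assumptions the document cache dominates the total, and the
computed cache size stays below the 150MB budget used in the message.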
