lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Michael Stoppelman <stop...@gmail.com>
Subject Re: Lucene index sizes and performance
Date Thu, 16 Apr 2009 17:37:00 GMT
On Sat, Jul 7, 2007 at 8:19 PM, Chun Wei Ho <cwho.work@gmail.com> wrote:

> We are currently running a search service with a single Lucene index
> of about 10 GB. We would like to find out:
>
> (a) What is the usual index size of everyone else? How large have
> Lucene index gone in prodution environments, and is there a sort of a
> optimal size that Lucene indexes should be?
>

Same here. I'm interested in this answer too... If you're serving a lot of
traffic and need to highlight docs you need to keep
everything in memory. That's my lesson to share with the world.


>
> (b) With a index size of 10GB, how much memory would you recommend a
> dual 3GHz machine serving searches on it to have. We currently have
> 4GB RAM and are thinking of adding more for faster searches?
>
> Is there a ballpark figure or guide that we can adhere to - so we
> might add more RAM depending on the rate of index growth.
>
>
> (c) We're considered the possibility of splitting our large index into
> several smaller ones based on discussions in previous threads.
>
> Did anyone do so here and how did you manage it - splitting by logical
> category, or splitting by time (so perhaps a index that holds 2 months
> worth of documents might be split into 8 indexes of 1 week each). How
> would the searching application handle/merge results from different
> indexes?
>

Yes, this works. The most important thing if you're serving lots of traffic
is to have a master indexing box (that doesn't allow reading) and distribute
those copies to the slaves that are readonly. Also, when you copy your
indices out to slaves and the memory on them isn't twice as big as your
index you can use rsync w/ the fadvise patch (
http://insights.oetiker.ch/linux/fadvise.html) so the current index isn't
evicted from the disk cache (in linux).

I'm not sure about the combining of results, maybe this will help
http://sujitpal.blogspot.com/2007/08/remote-lucene-indexes.html


>
>
> Regards,
> CW
>
> Just a postscript here to thank mailing list folks who have been
> providing us with guidance on Lucene all this time :)
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message