lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Dino Korah" <>
Subject RE: Multiple Indices vs Single Index
Date Fri, 21 Sep 2007 10:49:29 GMT
In a similar scenario, if we had more than 40 of such grouping, on which the
hits (not lucene hits, but hit for search by users) are not evenly
distributed, would it be good to have a LRU caching mechanism for searcher

Is there any sort of grey are that should be avoided in doing that?

-----Original Message-----
From: Nikhil Chhaochharia [] 
Sent: 21 September 2007 06:37
Subject: Re: Multiple Indices vs Single Index

Thanks Grant and Chris for the replies.

I am looking at a single index because the 40 index system has started
having performance issues at high load.  My daily traffic is increasing at a
steady pace and about 40% of the traffic is concentrated in a 2 hour period
and searches start slowing down just a little bit.

All my indices have exactly the same fields, so I guess fieldcache and
sorting are not an issue.  I do not do any updates, so I just open 40
searchers and store them in a HashMap.  When a query comes, I select the
appropriate searcher and fire the query.  So code maintainance also is not
an issue.

The different IDF statistics are a valid point.  I have not analysed the
difference in the results returned - I sort of assumed the results will be
same in both cases.  I will look at the difference in results.

There is one more point.  Sooner or later, I will have to move to multiple
servers due to increasing traffic.  (I will be handling 30,000+ hits per
hour by the end of the year)  One option is to have full index on all
servers and load-balance them.  Another is to have half the indices on one
server and half of them on the other.  The front-end (separate server) will
then fire the query on the appropriate server.  Any suggestions on which one
would be a better choice ?  All data on all servers will give me redundancy,
the system will be up even if one server goes down.  Also, adding more
servers would be trivial.


----- Original Message ----
From: Chris Hostetter <>
Sent: Friday, 21 September, 2007 4:02:38 AM
Subject: Re: Multiple Indices vs Single Index

: I was wondering if it will be better to just have 1 large index with all 
: the 40 indices combined.  I do not need to do dual-queries and my total 
: index size (if I create a single index) is about 3.4GB.  It will 
: increase to maximum of 5-6 GB.  I am running this on a dedicated machine 
: with 8GB RAM.

off the top of my head, there are 3 main reasons i can think of that would 
motivate one choice over another -- ultimately it's up to you...

1) FieldCache and sorting ... if all 40 sets of of Documents contain 
have consistently named fields, then there won't be much difference 
between 40 indexes and 1 index ... but if each of those 40 sets contain 
documents with radically differnet fields -- and you want to sort on N 
differnet fields for each sets -- then the total FieldCache sizes for each 
of those 40 indexes will be smaller then the FieldCaches for one gian 
index (because every document will get an entry wethe it makes sense or 

2) idf statistics.  if you have common fields you search regardless of 
document set, the 40 index approach will maintain seperate sttistics -- 
this may be important if some terms are very common in only som docsets.  
the word "albino" may be really common in docset A but only one doc in 
docset B has it ... in the 40 index appraoch querying B for (albino 
elephant) will give a lot of weight to albino because it's so rare, but in 
the single index case albino may not be considered as significant because 
of ht unified idf value for all docsets 9even if hte query is constrained 
to docset B) ... again: this only matters if the fields overlap, if every 
docset has a unique set of fields then the idfs will be unique because 
they are by field)

3) management: it's probably a lot simpler to maintain and manage code 
that deals with one index then code that deals with 40 indexes. you milage 
may vary.


To unsubscribe, e-mail:
For additional commands, e-mail:

To unsubscribe, e-mail:
For additional commands, e-mail:

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message