lucene-java-user mailing list archives

From Chris Conrad <ccon...@vasoftware.com>
Subject Optimal index structure
Date Wed, 26 Jan 2005 00:38:09 GMT
I'm currently working on building a search function for my application 
and am looking for guidance on what the optimal way to store the index 
would be.

The application has several different document types with documents 
split into different categories.  Each category has differing numbers 
of documents of each type.  A small category may have as few as 0 to 5 
documents of each type, while a large category might have 50,000 or more 
documents of each type.  There are upwards of 100,000 categories.  The 
search function would never have to search documents from more than one 
category at a time, but should be able to search either a single 
document type or multiple document types together.  I need to be able 
to handle over 1,000,000 searches a day with as many as 50 simultaneous 
searches at peak times.
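
To put those load figures in perspective, here is a rough back-of-the-envelope sketch in Java.  The daily volume and the 50 simultaneous searches come from the numbers above; the 0.2 second average search latency is purely a hypothetical value picked for illustration, and the class and method names are made up.  The peak estimate uses Little's law (concurrency = throughput x latency).

```java
public class SearchLoadEstimate {
    static final long SECONDS_PER_DAY = 24L * 60 * 60;

    // Average rate if the daily volume were spread evenly over 24 hours.
    public static double averagePerSecond(long searchesPerDay) {
        return (double) searchesPerDay / SECONDS_PER_DAY;
    }

    // Little's law: throughput = concurrency / latency.
    // latencySeconds is an assumed figure, not a measurement.
    public static double peakPerSecond(int concurrentSearches, double latencySeconds) {
        return concurrentSearches / latencySeconds;
    }

    public static void main(String[] args) {
        System.out.println("average searches/sec: " + averagePerSecond(1_000_000));
        // 0.2 s is a hypothetical average latency used only for illustration.
        System.out.println("peak searches/sec:    " + peakPerSecond(50, 0.2));
    }
}
```

The daily average works out to roughly 12 searches per second, so the peak concurrency figure is likely the more demanding constraint when comparing index layouts.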

My current thinking is that each category would get its own index.  
Each document would carry a keyword field that indicates its document 
type.  When doing a search, I can either add a filter for a particular 
document type or, if the search is over all document types, leave the 
filter out.  Alternatively, I could put everything in one very large 
index and select category and document type with filters.  Or I could 
have a separate index for each document type within each category and 
use multi-index searchers when necessary.
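
The three layouts can be compared by looking at which index directories a single search would have to open.  The following is only a sketch in plain Java, not Lucene code: the directory naming scheme, the class name, and the type names "typeA" and "typeB" are all hypothetical.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class IndexLayouts {
    // Layout 1: one index per category; document type is chosen by a filter.
    public static List<Path> perCategory(Path root, int categoryId) {
        return Collections.singletonList(root.resolve("category-" + categoryId));
    }

    // Layout 2: one very large index; category and type are both filters.
    public static List<Path> singleIndex(Path root) {
        return Collections.singletonList(root.resolve("all"));
    }

    // Layout 3: one index per (category, type); a search over several
    // types opens several indexes and combines them.
    public static List<Path> perCategoryAndType(Path root, int categoryId,
                                                List<String> types) {
        List<Path> dirs = new ArrayList<>();
        for (String type : types) {
            dirs.add(root.resolve("category-" + categoryId).resolve(type));
        }
        return dirs;
    }

    public static void main(String[] args) {
        Path root = Paths.get("indexes");             // hypothetical root directory
        List<String> types = Arrays.asList("typeA", "typeB"); // hypothetical types
        System.out.println(perCategory(root, 42));
        System.out.println(singleIndex(root));
        System.out.println(perCategoryAndType(root, 42, types));
    }
}
```

With upwards of 100,000 categories, the first layout implies on the order of 100,000 indexes, the second a single index, and the third 100,000 multiplied by the number of document types.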

I'm afraid that the description above is quite convoluted, so let me 
know if further clarification is necessary.

Any advice is welcome.

Thanks



